Iterators

Iteration is a very simple concept, and yet it is a powerful one. It is a concept that I have become so accustomed to that I have on occasion taken it for granted. I had started to assume that it was something that everyone knew and understood. Recent events, however, have proved me wrong. The design and use of iterators are something that are not as simple as I had assumed they were.

The article that follows is a programming article. I will be mostly using the C++ language to put the theory into a practical context, but I also hope that the article will be accessible to those of you that are not familiar with programming.

Before I move on to iterators I need to go back a little bit. There is another, even more basic concept used by programmers, and that is the concept of the Object. Now, the idea of the object may seem obvious now, but for a long time programmers did all of their thinking in the solution domain. That is, they thought about programming mostly in terms of what computers did, and what computers do is mostly the manipulation of numbers. Numbers are moved around in the computers memory, and sometimes maths is done with them. Thinking about programming like that is quite hard. It takes a long time to write software that way, especially when that program is supposed to be modelling real world concepts, like customer accounts, or employees, or windows and buttons on a screen.

So along comes abstraction, and the idea of an object. This isn't even limited to object-oriented programming. It's just the idea that the programmer gets to think in terms of concepts other than the ones that the computer works with directly. It's simple stuff once you know it. With an abstraction you don't need to know how to do something every time you want to do it, you just need to know its name, and its inputs.

For instance, to draw a line you just need to know what colour, and where the end points are. You don't need to know how the video memory is laid out anymore, or how the Bresenham algorithm works, because someone, maybe you, has already done that work. Better yet, this program that you are writing doesn't have to be written all over again when you move on to the new wiz-bang super fast hardware - the only bit that has to be rewritten is that line draw, because the interface to the abstraction doesn't have to change.

And that is where iterators come in. Once you have 'objects' (even if those objects are just numbers) you can have containers of objects. Everybody is familiar with the idea of a list. Well the main thing you do with a list is you look at the things in the list. A list is a collection of objects, and there are lots and lots of different ways of storing lists, or collections in a computers memory. You can store objects in arrays - that is all one after each other in a single block of memory. Or you can store them in an linked list - where each object is stored along with the location in memory of the next object (this location in memory thing is also an object, it is usually called a pointer). You can store objects in all sorts of different ways, and each way has it's own pros and cons.

Now, as I said the main thing you want to do with a list is look at the things in the list. In fact to do pretty much anything useful with a list you have to look at the things in it. The problem is unless you realise that you can create an abstraction that separates the idea of moving around a list from how the list is stored in memory you have to write a different version of that code for every type of list you want to do it to.

That is what an iterator is. An iterator is like a bookmark. It represents a location in a collection of objects. There are basically two things you can do with a bookmark, you read (or write on!) the page it marks the location of, and you can move it to a different page.

The very simplest type of iterator in C++ is a Forward Iterator. A forward iterator lets you get the object at that location in the list, and you can move to the next location in the list. Actually the simplest iterator is the Trivial Iterator, but the simplest useful iterator is the forward iterator. Then there is Bidirectional Iterator which does everything a forward iterator does, plus (wait for it!) it also let's you go backwards in the list. Last, but certainly not least there is the Random Access Iterator that does all the things the forward and bidirectional iterators do, but also lets you move forward and backwards in jumps of multiple locations - often much more efficiently then just saying "go next" or "go back" a whole bunch of times.

It's also worth pointing out that there is also the concept of an Input Iterator, which is like a forward iterator, but read only, and an Output Iterator, which is like a forward iterator, but write only.

The uniform interface the iterators in C++ use is modelled on the syntax that pointers use. Moving forward is done using the increment operator: "++", moving backward is done using the decrement operator: "--", and reading and/or writing to the object at that location, is done with the dereference operators "*" and "->". C++ uses "begin" and "end" markers, so that forward iteration starts with "begin", and increments (++) until the iterator is equal to the "end" marker.

Typically iteration looks something like this c-style pseudocode:

for( iterator = begin; iterator!=end; ++iterator ){ 
    DoSomething( *iterator ); 
}

So if you had an array, in C or C++ this might written:

char alphabet[26+1] = "abcdefghijklmnopqrstuvwxyz"; 
char * begin = &alphabet[0]; 
char * end = begin+26; 
for( char* i = begin; i!=end; ++i ){ 
    DoSomething( *i ); 
}

If you wanted to iterate over the contents of an STL style container, you would use it's "begin" and "end" member functions. The interface that is used for iterators in Java is different - Java doesn't have pointers, so there is nothing after which to model iterators.

And that is really all there is to it. As I said at the start, a basically simple concept, but now one that I hope that you can see the importance of. Oh yes, at the start I also said "Recent events, however, have proved me wrong.", so I'll close by telling you what prompted me to dedicate so many words to trying to communicate these ideas.

The first incident that got me thinking about iterators again was when I discovered that a certain case tool that will remain nameless comes with a code generator that lets you create iterators - except that the iterators it generates use a different interface than the "standard" interface I described. First of all, they don't use the "begin" and "end" model, and second of all, for reasons that escape me, they have to be incremented once to get to the first object in the container. I'd like to go on the record as saying: This is not how C++ iterators should be written. As I have said in this article, iterators in C++ should work like pointers.

The second incident was today. I was working on an animation system, and needed to generate some data from the transitions that were set up for blending between sequences. The animation data comes from a well known game engine, and I discovered the iterator that they had provided for a collection of objects I had to gather the data from would only return the object that is is pointing to if you move the iterator to it's next position. That is, the two most fundamental things that an iterator is supposed to do had been combined into a single operation, and could not be done separately.

How could this happen? How could these two mistakes be made? Should I assume the programmers were stupid? No! Of course not. The problem was that I was assuming that just because I knew about the concept, that didn't mean I could assume that everybody did. After all, if no one tells you these things, how can you know?

Originally posted to kuro5hin on Mar 28, 2003, here: http://www.kuro5hin.org/story/2003/3/27/18291/7840.

If you enjoyed reading this, you may also enjoy reading: "Using typedefs to write maintainable code" and/or "An introduction to C++ Traits".