Java source code analysis: In-depth discussion of the Iterator model

The package contains a series of important collection classes. This article will start from analyzing the source code and conduct in-depth research on the internal structure of a collection class, as well as the inside story of the source code implementation of iterative mode traversing the collection.

Next, we will briefly discuss a root interface Collection, then analyze an abstract class AbstractList and its corresponding Iterator interface, and carefully study the implementation principle of the iterative sub-mode.

The source code version discussed in this article is JDK 1.4.2. Because JDK 1.5 uses a lot of generic code, in order to simplify the problem, we will still discuss the code of version 1.4.

Collection of the root interface of the collection class

The Collection interface is the root type of all collection classes. One of its main interface methods is:

boolean add(Object c)

add()The method will add a new element。 Note that this method will return a boolean, but the return value does not indicate whether the addition is successful or not. If you read the doc carefully, you can see that the Collection stipulates that if a collection refuses to add this element, an exception must be thrown for any reason. The meaning of this return value is whether the content of the set has changed after the add() method is executed (that is, whether the number, position, etc. of the element has changed), which is implemented by the concrete class. That is: if the method errors, an exception will always be thrown; the return value only indicates whether the content of the Collection has changed after the method is executed.

Similar ones:

boolean addAll(Collection c);
boolean remove(Object o);
boolean removeAll(Collection c);
boolean remainAll(Collection c);

The Object[] toArray() method is very simple, converting the set into an array to return. The Object[] toArray(Object[] a) method is a bit complicated. First of all, the returned Object[] is still an array that turns all elements of the set into an array, but the type and parameter a are the same, such as execution:

String[] o = (String[])(new String[0]);

The actual type obtained is String[].

Secondly, if the size of parameter a cannot contain all elements of the collection, the returned new array will be. If the size of parameter a can contain all elements of the set, then a is still returned, but the content of a is filled with the elements of the set. It is particularly important to note that if a has more size than the number of set elements, all the parts behind a are set to null.

The last and most important method is iterator(), which returns an Iterator (iterator) that iterates through all elements of the collection.

Use Iterator mode to implement traversal collections

The Iterator pattern is a standard access method used to traverse collection classes. It abstracts access logic from different types of collection classes, thus avoiding exposing the internal structure of the collection to the client.

For example, if iterator is not used, the way to iterate over an array is to use an index:

for(int i=0; i
To access a linked list (LinkedList) you must use a while loop:

while((e=())!=null) { ... () ... }

Both of the above methods must know the internal structure of the collection in advance. The access code and the collection themselves are tightly coupled, and the access logic cannot be separated from the collection class and the client code. Each collection corresponds to a traversal method, and the client code cannot be reused.

What's even more terrifying is that if you need to replace ArrayList with LinkedList in the future, all the original client code must be rewritten.

To solve the above problems, the Iterator pattern always uses the same logic to traverse the collection:

for(Iterator it = (); (); ) { ... }

The secret is that the client itself does not maintain the "pointer" to traverse the collection. All internal states (such as the current element position, whether there is the next element) are maintained by the Iterator, which is generated by the collection class through factory methods, so it knows how to traverse the entire collection.

The client never deals directly with the collection class. It always controls the Iterator and sends it "forward", "backward", and "take the current element" commands to it, and it can indirectly traverse the entire collection.

First, let’s take a look at the definition of the interface:

public interface Iterator {
boolean hasNext();
Object next();
void remove();
}

Relying on the first two methods can complete the traversal. The typical code is as follows:

for(Iterator it = (); (); ) {
Object o = ();
// Operation on o...
}

In JDK1.5, the above code is syntax simplified:

// Type is a specific type, such as String.
for(Type t : c) {
// Operation on t...
}

The specific type of Iterator returned by each collection class may be different. Array may return ArrayIterator, Set may return SetIterator, Tree may return TreeIterator, but they all implement the Iterator interface. Therefore, the client does not care which Iterator is. It only needs to obtain this Iterator interface. This is the power of object-oriented.
Iterator source code analysis

Let's see how AbstractList creates Iterator. First, AbstractList defines an inner class:

private class Itr implements Iterator {
...
}

The definition of iterator() method is:

public Iterator iterator() {
return new Itr();
}

So the client doesn't know the true type of Iterator it gets through Iterator it = ();

Now we are concerned about how this Itr class declared private implements traversing the AbstractList:

private class Itr implements Iterator {
int cursor = 0;
int lastRet = -1;
int expectedModCount = modCount;
}

The Itr class relies on 3 int variables (and an implicit reference to AbstractList) to implement traversal. Cursor is the location of the element when the next next() call is called. The first call to next() will return the element with index 0. lastRet records the last cursor location, so it is always 1 less than cursor.

The number of elements of the variable cursor and the set determines hasNext():

public boolean hasNext() {
return cursor != size();
}

The method next() returns an element indexed as cursor, and then modify the values of cursor and lastRet:

public Object next() {
checkForComodification();
try {
Object next = get(cursor);
lastRet = cursor++;
return next;
} catch(IndexOutOfBoundsException e) {
checkForComodification();
throw new NoSuchElementException();
}
}

expectedModCount represents the expected modCount value, used to determine whether the set has been modified during the traversal process. AbstractList contains a modCount variable whose initial value is 0. When the set is modified once (the method of add, remove, etc. is called), modCount is added 1. Therefore, if modCount remains unchanged, it means that the content of the collection has not been modified.

When it is initialized, the modCount variable of the collection is recorded with expectedModCount, and then it will detect the value of modCount where necessary:

final void checkForComodification() {
if (modCount != expectedModCount)
throw new ConcurrentModificationException();
}

If the modCount is not equal to the value recorded in expectedModeCount at the beginning, it means that the content of the collection has been modified, and a ConcurrentModificationException will be thrown.

This ConcurrentModificationException is a RuntimeException, don't catch it on the client. If this exception occurs, it means there is a problem with the program code writing, and you should check the code carefully instead of ignoring it in the catch.

However, it is completely no problem to call Iterator's own remove() method to delete the current element, because the values of expectedModCount and modCount will be automatically synchronized in this method:

public void remove() {
...
(lastRet);
...
// After calling the remove() method of the collection, reset the expectedModCount:
expectedModCount = modCount;
...
}

To ensure that the traversal process is completed smoothly, it is necessary to ensure that the content of the collection is not changed during the traversal process (except the remove() method of Iterator). Therefore, the principle of ensuring traversal is to use this collection only in one thread, or synchronize the traversal code in multiple threads.

Finally, let's give a complete example:

Collection c = new ArrayList();
("abc");
("xyz");
for(Iterator it = (); (); ) {
String s = (String)();
(s);
}

If you replace the ArrayList of the first line of code with LinkedList or Vector, the remaining code can be compiled without changing one line, and the function remains unchanged. This is the principle of abstract programming: the dependence on concrete classes is minimal.