Lab 5: Collections, Iterators, and Iterables

Before You Start

First, try your hand at this short, mandatory quiz on polymorphism.

A. Introduction

In this lab we will be giving you a small taste of the Java Standard Library. In particular, we will be looking at Collections, Iterators, and Iterables. Collections encompass many of the data structures that you will be working with this semester. Iterators are objects that control iteration through items in a collection via three methods: a constructor, hasNext (are there more items yet to be iterated over?), and next (return the next item in the iteration). Iterators are provided for every collection class in java.util. More generally, they provide a mechanism for unifying the operations of processing an array in several formats, handling input, and traversing more complicated structures.

By the end of this lab, you should be comfortable with searching through the Java Standard Library documentation and with using Iterators and Iterables and Collections.

B. Collections

Collection classes represent collections of data. Most of the data structures we will study in the rest of this class are used to represent collections. At the most general level, pretty much anything you use to store multiple items at once is going to fufill the requirements to be a collection. The most commonly used collections are sets and lists, but there are many others. The hierarchy of all collection classes is organized using interfaces! As you will see soon, what that means is that sets and lists in Java are represented as interfaces that extend the Collection interface.

The Collection Interface

At the root of the organization is the Collection interface. This interface specifies methods like isEmpty(), add(Object o), and many others. All collection classes implement Collection in one way or another. One important type of collection is the List, which you have seen before. Implementations of List include the ArrayList and the LinkedList. Another important collection is the Set, which you will implement shortly.

All of the collections above are found in the package java.util. A package is a set of (supposedly related) classes and subpackages. As elsewhere in Java, the notation java.util.ArrayList means "The class named ArrayList in the (sub)package named util in the package named java". To make your life easier, you'll want to import them before you can use them as follows:

import java.util.ArrayList;
import java.util.LinkedList;

(otherwise, you will have to write java.util.ArrayList everywhere instead of just ArrayList).

The Set Interface

SetMarbles

A set is a group of items with no duplicates. The Set interface does not require ordering (e.g. HashSet), but sets can be ordered (TreeSet, ConcurrentSkipListSet). Sets support (at least) the following operations:

1. Initialization (with a constructor)
2. Adding an element to the set if it's not already there
3. Removing an element from the set
4. Checking whether a given item is in the set
5. Checking whether the set is empty

While a class implementing the Set interface must be able to support the above operations, the actual implementation and manipulation of the data is up to the class implementing the Set interface.

Exercise: how could you represent a set of nonnegative integers? (hint: use a boolean array!)

It's important to realize that a Set, although a subtype of Collection interface, is itself an interface. Yes, an interface can extend another interface. It is defined like this:

    public interface Set extends Collection {
        ...
    }

IMPORTANT: Note that interfaces extend other interfaces, while classes implement interfaces.

Set is an interface because there are many types of sets that are implemented in a variety of ways. All of these Set subtypes fulfill the requirement of being a Set.

The List Interface

SequenceBunnies

Lists differ from sets in the way that elements have positions within the List, duplicates are allowed, and all lists are ordered. Thus, they support the following operations:

1. Initialization (with a constructor)
2. Adding an element at a given position or at the end of the list
3. Removing an element at a given position or at the end of the list
4. Checking whether a given item is in the list
5. Identifying the value at a given position in the list
6. Checking whether the list is empty

Implementation of a list normally involves storing the elements of the list explicitly. One might use an array whose 0th element is the first list item, and so forth. You have implemented Lists through the IntList and IntDList classes. Similar to Set interface, List is an interface that might be defined like this:

    public interface List extends Collection {
        ...
    }

Collections usage example

Let's take a closer look at the declaration, instantiation, and utilization of Collections, which is a bit syntactically different. For example, to instantiate a list of ints, add the integer 5, then get it back, we might write:

List<Integer> s;
s = new ArrayList<Integer>();
s.add(5);
int x = s.get(0); /* gets the five back */

There are three funny things about the syntax above:

Note that we declared s to be a variable that points to a List object, however the actual object we created was specifically an ArrayList. Recall that List refers to an interface, and so it is not actually possible to create a List object. Instead, we had to choose a concrete class that actually implemented List, in this example the Java ArrayList class. Recall that a concrete class is a non-abstract class that implements all the methods it potentially inherits from any abstract superclasses or interfaces.

Since the LinkedList class is supposed to represent essentially the same abstraction (a numbered list of items), it has almost the same API (Application Programming Interface) as ArrayList. For our purposes today, that means it supports almost the same methods. This makes it easy to change back and forth between an ArrayList and a LinkedList.

For our toy example, since LinkedList also has an add and get method, we could easily change our code to read:

List<Integer> s;
s = new LinkedList<Integer>();
s.add(5);
int x = s.get(0); /* gets the five back */

That is, only the instantiation of s changes.

It is nice that the Java designers gave us many implementations of the same collection, since each of these implementations has its own strengths and weaknesses.

C. Introduction to Iterators

An iterator is an object whose purpose is to traverse the values in a collection of objects (here we mean the abstract notion of a collection, not necessarily a capital C Collection as defined in Java), yielding them one at a time to the iterator's client. The standard Java library defines a standard interface that iterators can implement:

package java.util;

public interface Iterator<Value> {
    /** Returns true iff this iterator has another value to deliver. */
    boolean hasNext();

    /** Returns the next value from this iterator, and advances to
     *  the next.
     *  Precondition: hasNext() must be true.
     */
    Value next();

    /** Removes the last value delivered by next() from the
     *  underlying collection of values being iterated over.
     *  Optional method.  May be called at most once per call to
     *  to next().
     */
    default void remove();
}

While in principle a collection could itself implement the iterator interface, this would result in very strange code. This is because the collection itself is not the iterator! Instead, collections that wish to support iteration typically provide an iterator() method that returns an appropriate Iterator (this is a part of the Iterable interface that's described more in detail below). Consequently, the collection will now have an iterator that is accessible by calling iterator().

For example, if L is a List<String>, you can write

for (Iterator<String> i = L.iterator(); i.hasNext();) {
    String value = i.next();
    System.out.print(value + " ");
}

This code would print out each item in the list, one at a time. An alternate way of writing this code is as follows:

Iterator<String> i = L.iterator();
while (i.hasNext()) {
    String value = i.next();
    System.out.print(value + " ");
}

In the code above, the List class provides an iterator method that returns an object of type Iterator<String>. How do you define an Iterator exactly? In short, implement the Iterator interface defined above. For the object you wish to define as an iterator, declare implements Iterator in its class definition, and provide a next(), a hasNext(), and (optionally) a remove() method. Let's take a look at the documentation for the Iterator interface here and note some key details about the two required methods and the remove() method.

Finally, let's see this in action. Java abstracts types such as LinkedList and ArrayList using the List interface. You can view the source code of the List interface here if interested!

Java then provides an AbstractList class that provides default implementations for many List methods. Looking at the source of AbstractList.iterator(), we see that the iterator() method here returns a new object of type Itr, where Itr is a nested non-static class (nested non-static classes are called inner classes).

Finally, take a look at the source code for the Itr inner class. This will show you exactly how the iterator for things like LinkedList and ArrayList works. Don't worry too much about the exact implementation details, do observe that it implements the Iterator interface and thus has a next() and hasNext() method as required by the interface! It also overrides the remove() method, though this is not necessary because remove() is a default method. If you want to dig even deeper you can consider how Itr keeps track of the current element that calling next() will return.

D. The Iterable interface

Iterating through interfaces using next and hasNext would be tedious to write every time, and thus Java has introduced a special syntax for this Java idiom known as the Iterable interface. If L has a type that implements Iterable<String> (as List<String> does), then the loops above may be written

for (String value: L) {
   System.out.print(value + " ");
}

This is called the enhanced for loop. It is very similar to the for loops that you saw in Python.

The Iterable interface is really simple. A class that implements it must simply provide an iterator method that returns an Iterator.

package java.lang;
public interface Iterable<Value> {
    /** Returns an iterator over my values. */
    Iterator<Value> iterator();
}

An Iterator for FixedSizeList

Here's another example of how Iterable and Iterator are used in practice. Suppose we had a class FixedSizeList that represents a list that stores its values in an integer array. Remember that we ask the FixedSizeList for an Iterator instance through the iterator() method. This then returns a FixedSizeListIterator, which is an inner class we've defined to iterate through our FixedSizeList.

import java.util.Iterator;

public class FixedSizeList implements List<Integer>, Iterable<Integer> {

    // instance variables
    protected int[] values;   // list elements
    int count;                // number of array cells used by list

    // private Iterator class
    private class FixedSizeListIterator implements Iterator<Integer> {

        int nextIndexToReturn = 0;    // index of next item to return by iterator

        public Integer next() {
            int valToReturn = values[nextIndexToReturn];
            nextIndexToReturn++;
            return valToReturn;
        }

        public boolean hasNext() {
            return nextIndexToReturn < count;
        }
    }

    public Iterator<Integer> iterator() {
        return new FixedSizedListIterator();
    }

        ...
}

We could build a FixedSizeList as follows.

public static void main(String [] args) {
    FixedSizeList list = new FixedSizeList(10);
    list.add(5);
    list.add(3);
    // list now contains the integers 5, 3;
    // thus values[0] is 5, values[1] is 3,
    // and count is 2.
        ...
}

Then we can use an iterator.

Iterator<Integer> iter = list.iterator();
int m;
m = iter.next();
// m now contains 5 and nextIndexToReturn contains 1
m = iter.next();
// m now contains 3 and nextIndexToReturn contains 2

At this point, another call to next would be invalid since nextIndexToReturn is past the end of the list values.

The code maintains an important invariant: prior to any call to next, nextIndexToReturn contains the index of the next value in the list to return.

In Summary

An Iterator is an object that provides a next(), hasNext(), and (optionally) remove() method. Iterators are intended to provide a way to step through some sort of data structure one item at a time in some fashion. If you have access to an Iterator to some collection, you can use the next() and hasNext() methods to go through it step by step.

Using next() and hasNext() is tedious, so if a class declares itself as implementing Iterable, you can use the : operator with for loop to iterate instead. To follow the Iterable contract, the class that implements Iterable must provide a method that returns an Iterator that allows iteration through that object.

Food for thought: Why doesn't the List class just implement Iterator itself?

E. Programming Task: EveryOtherWord.java

Fill in the everyOtherWord method of EveryOtherWord.java such that it performs the task described in its comment and passes the test in main. Do not change anything in the file other than the everyOtherWord method, except that you may import the Iterator class (though this is not necessary). You shouldn't need any helper methods.

F. Programming Task: Meta-Iteration through Filters

As we saw in EveryOtherWord, we don't always want to iterate in exactly the way that the available iterator tells us. One approach is to write special purpose methods (e.g. everyOtherWord) that generate a new Iterable that contain all the items we require. This is fine as long as we have the memory and time to spend on building this iterable, which may potentially be quite large and slow to construct.

An alternate approach is to use an intermediary class that will feed us the items we want, one at a time. These are often referred to as filters.

Open up the Filter abstract class in the utils package. As the class implements Iterator and Iterable, it has the next, hasNext, and iterator methods. Recall that implementing Iterable indicates that Filter should be able to provide an iterator object over its items, and since Filter implements Iterator, it is an iterator, and so just returns itself!

You'll see that Filter acts as a middleman between client code and some Iterator. Specifically, if you look at the hasNext() method, you'll see that it keeps calling next() on the underlying Iterator until the keep() method returns true.

Concrete implementations of the Filter class will implement a keep() method that results in the desired behavior. For example, a filter designed to only return every other item would return true for every other call to keep.

Complete the implementations of TrivialFilter, AlternatingFilter, MonotonicFilter and PredicateFilter so that they pass the TestFilters test. You might find the FilterClient class to be useful for manually testing your implementations.

Note that PredicateFilter is a kind of filter that takes in a Predicate object (as defined in the utils package) as the first argument in its constructor.

G. Submission

Make sure to have completed the following

Attempting the quiz is mandatory, so make sure you do! The files can be submitted via the usual commands for committing, tagging, and pushing. Remember that you have a test coming up on (10/3). We suggest starting to take a look at past exams, which can be found under the resources tab of the course website.

H. Optional Assignments

As two optional but highly recommended exercises, there is also an everyFourth and evenNumberFilter method in FilterClient that you should feel free to implement.

These assignments provide extra practice with iterators and iterating.