CS61A Lab 13: Py, Streams, Iterators, and Generators

Week 7A, 2012

Py

Last week we learned about Py in lecture. Today we're going to dig through the source code to try to understand how it works, and try to make some simple modifications to the language. Here we go!

The source code for the Py interpreter is in the ~cs61a/lib/python_modules/py/ directory. Run this command to copy it to a local directory (don't forget the period at the end):

cp -r ~cs61a/lib/python_modules/py/ .

For you first task, go ahead and try to find where the following items are located in the code. If you can, check the answers off with your TA. Note that if there are several locations where the functionality for a given item is defined, try to find them all.

  1. A built-in (primitive) function
  2. An if statement
  3. How to parse a lambda expression

Now that you've gotten used to how the Py interpreter is formatted, it's time to modify it! We're going to add basic Exception handling to the interpreter; nothing fancy like what Python has, but just simple exceptions. They should behave as follows:

        Py> def func():
        ...     x = 3
        ...     raise x
        ... 
        Py> func()
        ERROR: 3
    

The raise statement takes another expression as an argument, and throws a basic exception up the stack with the value of that argument. That exception will get caught by the REPL (read eval print loop) that runs the interpreter in Python. Some suggested steps to implementing this include:

  1. Adding a new keyword to the language
  2. Making sure the language can parse the raise statement correctly
  3. Adding a new statement type
  4. OPTIONAL: Adding a PyException class that is the subclass of any exception a program run in the Py interpreter can throw. If you don't want to do this, you can instead have your raise expression throw a BaseException

Note: If this is taking forever, and you get stuck and don't know what to do next, ask for help! If you want to move on, then at least be sure that you have a general idea of where you would define the new behaviors for the raise expression in the code, if you're having trouble coming up with the exact code to complete this problem.

Streams

This week, we'll be exploring methods of creating sequences of values incrementally, using three different constructs. We first start with streams which we saw today in lecture, and then we will introduce iterators and generators.

Streams are another way to represent a sequence of values and are often useful for representing "infinite" sequences. One basic implementation of a Stream is the following class definition:

class Stream(object):
    """A lazily computed recursive list."""
    def __init__(self, first, compute_rest, empty=False):
        self.first = first
        self._compute_rest = compute_rest
        self.empty = empty
        self._rest = None
        self._computed = False

    @property
    def rest(self):
        assert not self.empty, ’Empty streams have no rest.’
        if not self._computed:
            self._rest = self._compute_rest()
            self._computed = True
        return self._rest

    empty_stream = Stream(None, None, True)

A Stream looks much like an RList in that it has a first item and a "rest", a reference to a Stream of the remaining items. However, unlike an RList, a Stream is "lazy"[1]: it does not immediately compute all of the values in the sequence, instead generally waiting until we ask for an item before actually computing its value. (Furthermore, you might notice that Streams, as implemented here, involve something like memoization: once we've computed a value we remember it and reuse it later whenever asked for it again.)

The power of Streams is that we don't immediately figure out all of the values until we need them; as a result we can "store" an infinite sequence of items because we don't explicitly enumerate all the items in the computer. (Note that the definition used above allows instances of Stream to represent finite sequences, as well. In the exact same way as for infinite sequences, we can also "store" a finite, but very long, sequence of items without using a proportional amount of memory, because we needn't actually explicitly store all the items ahead of time; all that need be stored is a description of how to compute them as needed)

Let's review how to make a simple, abstractly infinite, integer stream:

def make_integer_stream(first=1):
    def compute_rest():
        return make_integer_stream(first+1)
    return Stream(first, compute_rest)

What we need to eventually do is return a Stream, where the first element has been computed, and the second argument is a function that will eventually compute the rest of the Stream. We can define this function recursively, since Streams are defined recursively: the rest of a Stream is a Stream itself (just like the rest of an RList is an RList).

In lecture, we saw how to define for Streams the analogues of various familiar list processing functions; for example, map_stream, filter_stream, and combine_streams. These can make working with Streams much more convenient and readable. Let us implement some more functions of this sort:

Question #0:

Define a function find which takes in as input a stream and a predicate, and returns the first element in the stream satisfying the predicate.

[What if there is no element in the stream satisfying the predicate? Could we add to the specification that this function must throw an exception in that case? Explain the difficulty implementing such a specification, and the conditions under which one can and cannot meet it.]

Question #1:

Define a function cycle which takes in as input an ordinary Python list and returns a stream which just repeats that list over and over (E.g., cycle(['a', 'b', 'c']) is the stream <'a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c', ...>). If the input list is empty, output the empty stream; otherwise, the output stream should be infinite.

Question #2:

Define a function partial_sums, which takes in a stream <a1, a2,a3, ...> and outputs the stream <a1, a1 + a2, a1 + a2 + a3, ...>. [If the input is a finite stream of length n, the output should be a finite stream of length n. If the input is an infinite stream, the output should also be an infinite stream].

Consider p = partial_sums(cycle([1])). What do you think the stream p looks like? How can you use the Python interpreter to check? [You may want to keep in mind some of the conveniences already defined in lecture]

Question #3:

Define a function interleave, which takes in two streams <a1, a2, a3, ...> and <b1, b2, b3, ...> and returns the new stream <a1, b1, a2, b2, a3, b3, ...>. If either of the inputs is finite, the output stream should be finite as well, terminating just at the point where it would be impossible to go on. If both of the inputs are infinite, the output stream should be infinite as well.

Question #4:

What do you think the sequence produced by combine_streams(lambda x, y: 4 * x - y, interleave(cycle([0], p), p) looks like? Assign this stream to oddly_tripled_p and use the Python interpreter to check. Can you find a different way to produce this same sequence? (There are many! Try finding a few, using the various stream operations we've developed in lecture and lab so far)

[Just for fun[2]: Define staggered_double(x) to be combine_streams(add, x, Stream(0, lambda:x)) and define twelve_p to be staggered_double(staggered_double (oddly_tripled_p)). What does twelve_p look like? The answer may surprise you with its cleanliness! This observation is used to fruitfully assign a (somewhat counterintuitive) value to the sum of p in suitable contexts in mathematics]

Question #5:

Recall that our Stream class remembers elements it has previously computed and does not recompute them. This can play in unexpected ways with computations whose values may change over time (so-called "impure" computations).

Suppose one wants to define a random infinite stream of numbers via the recursive definition "A random infinite stream consists of a first random number, followed by a remaining random infinite stream". Consider an attempt to implement this via the code:

from random import random
random_stream = Stream(random(), lambda: random_stream)

What it is unsatisfactory about this? How can one fix it?

Iterators

Recall that, in Python, an iterator is a kind of object that is generally used in for loops like so:

for item in sequence:
    # use item

Which is equivalent to the following loop:

iterator = iter(sequence)
while True:
    try:
        item = next(iterator)
    except StopIteration:
        break # leave the loop
    # use item

An iterator is the object returned from the __iter__ method of a sequence you'd like to iterate over. An iterator implements the __next__ method which, each time it is called, either will return the next item in the sequence or will raise a StopIteration exception to indicate that it has no more values to return.

Generators

In Python, generators are often used as an alternative way to create a sequence of values. A generator is created by calling a function that uses the yield statement at one or more points to produce a value, instead of using the ordinary return statement. Each time the generator is invoked after having produced a value with yield, it will pick up from the line where it last left off, rather than starting over from the beginning. An example is the following function, which defines a generator producing the sequence of positive integers:

def integers():
    x = 1
    while True:
        yield x
        x += 1

The presence of the line yield x tells Python that this function describes a generator. Furthermore, whenever the interpreter reaches that line, it will produce x as a value and stop running the generator; the generator will later resume from that point in the code if/when called again to ask for another value.

To actually use a generator, we can call the function defining it in a for loop in the same position where we might otherwise have used a sequence:

>>> for i in integers():
...     print(i)
... 
1
2
3
4
etc.

If we want a generator to finish producing values, we simply have the function exit. As an example we have the generator integers_to below:

def integers_to(n):
    current = 1
    while current <= n:
        yield current
        current += 1
    # The function stops here

Question #6:

Write a generator fib_yield which produces the elements of the Fibonacci sequence 1, 1, 2, 3, 5, 8, 13, ..., one by one, never stopping.

Lab is over. Here come the footnotes...

[1]: Hence my selection to write this lab. Laziness is my wheelhouse. -SR

[2]: My fun, not yours. -SR