CS61A Summer 2012: Lab 5

Note: If you're working on your own machine, instead of the lab machines, then you'll need to first copy the following files into your current working directory on your class account, and then transfer these files to your laptop/desktop:

Tuples

Question 0. What will Python print? Check your solution using the Python interpreter.

Question 1. For each of the following tuples, give the correct expression to get 7.

Question 2. Write the reverse procedure which operates on tuples. For a description of its behavior, see the docstring given below:

IRlists and Recursion

Recall the implementation of IRlists from lecture:

empty_irlist = ()

def make_irlist(first, rest=empty_irlist):
    return (first, rest)

def irlist_first(r):
    return r[0]

def irlist_rest(r):
    return r[1]

Question 3. It would be convenient to have a procedure tuple_to_rlist, which takes a tuple and converts it to the equivalent irlist. Finish the implementation below. (Hint: An easy solution uses reverse from earlier in the lab).

def tuple_to_irlist(tup):
    """Takes an input tuple, tup, and returns the equivalent representation of
    the sequence using an rlist.
    
    Arguments:
    tup -- A sequence represented as a tuple.

    >>> tuple_to_irlist((1, 2, 3)) == make_irlist(1, make_irlist(2, make_irlist(3)))
    True
    """
    "*** Your code here. ***"

def tuple_to_irlist(tup):
    irlist = empty_irlist
    reversed = reverse(tup)
    for item in reversed:
        irlist = make_irlist(item, irlist)
    return irlist
    
Recursively:
def tuple_to_irlist(tup):
    if tup == ():
        return empty_irlist
    return make_irlist(tup[0], tuple_to_irlist(tup[1:]))

Question 4. Implement the len_irlist function. Write it using recursion, i.e. without loops.

def len_irlist(r):
    """Return the length of recursive list r.
    
    >>> len_irlist(empty_irlist)
    0
    >>> len_irlist(make_irlist(6, make_irlist(2, make_irlist(-12))))
    3
    """
    "*** Your code here. ***"

def len_irlist(r):
    if r == empty_irlist:
        return 0
    else:
        return 1 + len_irlist(irlist_rest(r))

Question 5. Now use a similar technique to write a recursive sum function for IRLists.

def sum_irlist(r):
    """Return the sum of items in the IRList r. Assume that r contains
    only numbers.
    
    >>> sum_irlist(empty_irlist)
    0
    >>> sum_irlist(make_irlist(-1, make_irlist(3, make_irlist(8))))
    10
    """

def sum_irlist(r):
    if r == empty_irlist:
        return 0
    else:
        return irlist_first(r) + sum_irlist(irlist_rest(r))

Question 6. With only a very small modification to the previous function, you can also create a str_concat_irlist function that concatenates strings in an IRList. Recall that you can concatenate strings using the '+' operator:

>>> 'hello' + 'goodbye'
'hellogoodbye'
>>> '' + 'oh!' + '' + 'darling!'
'oh!darling!'

def str_concat_irlist(r):
    """Returns a string that is the concatenation of the strings in the
    IRList r.
    
    >>> str_concat_irlist(empty_irlist)
    ''
    >>> str_concat_irlist(make_irlist('How ', make_irlist('do ', make_irlist('you ', make_irlist('sum ', make_irlist('strings?'))))))
    'How do you sum strings?'
    """
    "*** Your code here. ***"

def str_concat_irlist(r):
    if r == empty_irlist:
        return ''
    else:
        return irlist_first(r) + str_concat_irlist(irlist_rest(r))

Question 7. Now let's write an insert function that inserts an item at a specific index in the IRList. If the index is greater than the current length, you should insert the item at the end of the list. Hint: This will be much easier to implement using recursion, rather than using iteration!

Note: Since IRLists are immutable, we are not actually inserting the item into the original irlist. Instead, we are creating a copy of the original IRList, but with the provided item added at the specified index. The original IRList stays the same.

def insert_irlist(r, item, index):
    """ Returns an IRList matching r but with the given item inserted
    at the specified index.
    If the index is greater than the current length, the item is 
    appended to the end of the list.
    
    Arguments:
    r -- A recursive list.
    item -- the item to be inserted
    index -- the index at which to insert the item
    
    >>> r = make_irlist('I', make_irlist(' love', make_irlist(' recursion')))
    >>> str_concat_irlist(insert_irlist(r, ' using', 2))
    'I love using recursion'
    >>> str_concat_irlist(insert_irlist(r, '!', 100))
    'I love recursion!'
    """
    "*** Your code here. ***"

def insert_irlist(r, item, index):
    if r == empty_irlist:
        return make_irlist(item)
    elif index == 0:
        return make_irlist(item, r)
    else:
        return make_irlist(irlist_first(r), insert_irlist(irlist_rest(r), item, index-1))

Question 8. Let us check to make sure you are using the IRList abstract data type properly. Replace the initial implementation in your code with the one below:

empty_irlist = 'empty_irlist'

def make_irlist(first, rest = empty_irlist):
    return rest, first

def irlist_first(r):
    return r[1]

def irlist_rest(r):
    return r[0]

See what we did there? We simpy changed the order of how the list data is stored in the tuple. We also modified the way an empty IRList is represented by using a specific string instead of the NoneType. This is just one of the (literally) infinite ways we could have implemented IRlists.

You should now re-run our doctests with the new implementation. Does everything still work? If not, fix any code that breaks an abstraction barrier so that your functions will be agnostic to the underlying implementation of IRlists.

Question 9. Finally, since you're on a roll, see if you can implement a filter function for IRLists. Again, recursion is your friend!

def filter_irlist(predicate, r):
    """ Returns an IRList only containing elements in r that satisfy
    predicate.
    
    Arguments:
    predicate -- A function that takes a single argument and returns
                 True or False.
    r -- An IRList.

    >>> from math import sqrt
    >>> r = make_irlist(53, make_irlist(16, make_irlist(625, make_irlist(50, make_irlist(49)))))
    >>> sum_irlist(filter_irlist(lambda x : sqrt(x)%1==0, r))
    690
    >>> s = make_irlist('I', make_irlist(' love', make_irlist(' recursion', make_irlist('!'))))
    >>> str_concat_irlist(filter_irlist(lambda x: len(x) < 7, s))
    'I love!'
    """
    "*** Your code here. ***"

def filter_irlist(predicate, r):
    if r == empty_irlist:
        return r
    elif predicate(irlist_first(r)):
        return make_irlist(irlist_first(r), filter_irlist(predicate, irlist_rest(r)))
    else:
        return filter_irlist(predicate, irlist_rest(r))

Dictionaries

Now, let's talk about dictionaries. Dictionaries are simply an unordered set of key-value pairs. While a dictionary is also a built-in data structure in Python (as dict), for now we're going to use Immutable Dictionaries (IDict). To create an IDict, use the constructor function make_idict:

>>> webster = make_idict(('Hamilton', (21, 'hipster')), ('Eric T.', (20, 'engineer')))

The inner tuples denote the key-value pairs in your dictionary. Each key-value pair is separated by a comma, and for each pair, the key appears to the left of the comma and the pair appears to the right of the comma. You can retrieve values from your dictionary by using the idict_select selector:

>>> idict_select(webster, 'Hamilton')
(21, 'hipster')

>>> idict_select(webster, 'Eric T.')
(20, 'engineer')

To add a new key-value pair to an existing IDict, use the idict_insert function. You can also replace the value of an existing key using idict_insert:

>>> webster = idict_insert(webster, 'Hamilton', (42, 'hammie'))

>>> idict_select(webster, 'Hamilton')
(42, 'hammie')

>>> webster = idict_insert(webster, 'Eric K.', (23, 'initech'))

>>> idict_select(webster, 'Eric K.')
(23, 'initech')

Now that you know how IDicts work, we can move on to approximating the entire works of Shakespeare! We are going to use a bigram language model to iteratively construct a sentence: the word bigram refers to how the model considers two words at a time. First, we start with any word - we will use The as an example. Then we look through all of the texts of Shakespeare, and add every word that comes after the word The to a tuple. We then randomly choose a word from this tuple (say, cat) - this randomly chosen word is the second word in our constructed sentence. Finally, we repeat the process. Our algorithm will terminate when it chooses a period (.) as the randomly-chosen word, ending our constructed sentence.

The object that we will be looking things up in is called a successor table, which we wll implement as an IDict. The keys in this IDict are words, and the values are tuples of successors to those words.

A copy of the framework code is located in ~cs61a/lib/shakespeare_idict.py. You'll want to copy this file to your current working directory by doing:

    cp ~cs61a/lib/shakespeare_idict.py .

(Remember to add the period (".") to the end of the command - this tells cp to copy the shakespeare_idict.py file to the current directory).

Question 10. Here is an incomplete definition of the build_successors_table function. The input is a tuple of words, and the output is a successors table. (By default, the first word is a successor to '.') Complete the definition for build_successors_table:

Note: To 'add' on a new element to a tuple, do the following:

>>> mytuple = (1, 2, 3, 4)
>>> mytuple = mytuple + (5,)
>>> mytuple
(1, 2, 3, 4, 5)


>>> def build_successors_table(tokens):
    table = make_idict()
    prev = '.'
    for word in tokens:
        if idict_select(table, prev):
        "**FILL THIS IN**"


        else:
        "**FILL THIS IN**"


        prev = word
    return table

>>> text = 'The', 'cat', 'and', 'the', 'dog', 'both', 'ate', 'and', 'ran', 'outside', '.'
>>> table = build_successors_table(text)
>>> for wd in idict_keys(table):
...     print("Words that come after " + wd + ": " + str(idict_select(table, wd)))
...
Words that come after .: ('The',)
Words that come after The: ('cat',)
Words that come after cat: ('and',)
Words that come after the: ('dog',)
Words that come after dog: ('both',)
Words that come after both: ('ate',)
Words that come after ate: ('and',)
Words that come after and: ('the', 'ran')
Words that come after ran: ('outside',)
Words that come after outside: ('.',)

def build_successors_table(tokens):
    """Return a dictionary: keys are words; values are tuples of successors."""
    table = make_idict()
    prev = '.'
    for word in tokens:
        if idict_select(table, prev):
            words = idict_select(table, prev)
            table = idict_insert(table, prev, words + (word,))
        else:
            table = idict_insert(table, prev, (word,))
        prev = word
    return table

Finally, we will implement the sentence-constructing algorithm introduced earlier, using a bigram language model. Suppose we are given a starting word. We can look up this word in our table to find its list of successors, and then randomly select a word from this list to be the next word in our constructed sentence. Then, we just repeat until we reach some ending punctuation.

Note: To randomly select from a list, first make sure you import the Python random library with import random, and then use the expression random.choice(my_list). For instance:

>>> import random
>>> mytuple = ("there", "are", "places", "I", "remember")
>>> random.choice(mytuple)
places
>>> random.choice(mytuple)
there

Here is a definition of construct_sent to get you started.

Question 11. Complete the construct_sent definition.

>>> def construct_sent(word, table):
    import random
    result = ''
    while word not in ('.', '!', '?'): 
    "**FILL THIS IN**"


    return result + word

def construct_sent(word, table):
    """Prints a random sentence starting with word, sampling from table"""
    import random
    result = ' '
    while word not in ['.', '!', '?']:
        successor_wds = idict_select(table, word)
        next_wd = random.choice(successor_wds)
        result = result + ' ' + next_wd
        word = next_wd
    return result + word

Great! Now all that's left is to run our functions with some actual code. The following snippet will return a list containing the words in all of the works of Shakespeare:

>>> def shakespeare_tokens(path = 'shakespeare.txt', url = 'http://inst.eecs.berkeley.edu/~cs61a/fa11/shakespeare.txt'):
        """Return the words of Shakespeare's plays as a list"""
        import os
    from urllib.request import urlopen
        if os.path.exists(path):
                return open('shakespeare.txt', encoding='ascii').read().split()
        else:
                shakespeare = urlopen(url)
                return shakespeare.read().decode(encoding='ascii').split()

Warning: Do not try to print the return result of this function!

Next, we probably want an easy way to refer to our list of tokens and our successors table. Let us make the following assignments:

>>> tokens = shakespeare_tokens()

>>> table = build_successors_table(tokens)

Finally, let us define an easy-to-call utility function:

>>> def generate_sent(successors_table, startword='The'):
    return construct_sent(startword, successors_table)

Question 12. Finally, use the function generate_sent, which uses construct_sent, to generate a few Shakespeare sentences of your own! Try to come up with a few funny ones:

>>> generate_sent(table)
" The plebeians have done us must be news-cramm'd "

>>> generate_sent(table)
" The ravish'd thee , with the mercy of beauty "

>>> generate_sent(table)
" The bird of Tunis , or two white and plucker down with better ; that's God's sake "

CS61A Lab 5

Week 3, Summer 2012

Tuples, Immutable Recursive Lists, Immutable Dictionaries

Tuples

IRlists and Recursion

Dictionaries