Lab 5: Rooted Trees, Linked Lists, Dictionaries

Submission
Rooted Trees

Terminology
Implementation
Question 1: (optional)
Question 2

Linked Lists

Implementation
Question 3

Dictionaries

Question 4

Extra Questions
More Rooted Trees

Question 5

More Linked Lists

Question 6
Question 7

Dictionaries + Shakespeare

Question 8
Question 9
Putting it all together

Submission

This lab is due at 11:59pm on 10/08/2014.

Please start at the beginning of the lab and work your way through, working and talking over Python's behavior in the conceptual questions with your classmates nearby. These questions are designed to help you experiment with the concepts on the Python interpreter. They are good tests for your understanding.

When you are done with lab, submit Questions 2, 3, and 4 (provided in the starter file lab05.py) to receive credit for this lab. The rest (5, 6, 7, 8, 9) are extra problems that are considered extra practice - they can be found in the the lab05_extra.py file. It is recommended that you complete these problems on your own time.

By the end of this lab, you should have submitted the lab05 assignment using the command submit lab05. You can check if we received your submission by running glookup -t. Click here to see more complete submission instructions.

Rooted Trees

Trees are a way we have of representing a hierarchy of information. For example, a file directory can be thought of as a tree structure: there is a root folder that contains several other folders — home, bin, user, etc. — and within each of these there exists a similar hierarchy.

The name "tree" comes from the branching structure, like real trees in nature except that they're drawn with the root at the top and the leaves at the bottom.

Terminology

node: a point in the tree. In these pictures, each node includes a datum (value at each node).
root: the node at the top. Every tree has one root node.
children: the children of a node are the nodes directly beneath the node.
leaf: a node that has no children.
depth: the distance from a node to the root (in terms of the number of levels). The depth of the root is 0, and the depth of its children is 1.

Implementation

For this lab, we will be using trees according to the following specification.

A tree consists of a root and a list of children. Each of these children is itself a tree. A leaf is represented as a tree whose list of children is an empty list.

Our implementation of trees can be found in lab05.py, though since it is an ADT, the implementation is not important. The interface for our trees consists of the following functions:

Constructors:
- rooted(value, branches): Creates a rooted tree with the given value and list of branches.
- leaf(value): Creates a leaf with the given value (i.e. a rooted tree with no children).
Selectors
- root(tree): Returns the value at the root of the tree.
- branches(tree): Returns the children for the given tree.

Therefore the tree generated by

t = rooted(1, [leaf(2),
               rooted(3, [leaf(4), leaf(5)]),
               rooted(6, [leaf(7)])])

would look like this:

It may be easier to visualize this translation by formatting the code like this:

t = rooted(1,
            [leaf(2),
             rooted(3,
                [leaf(4),
                leaf(5)]),
             rooted(6,
                [leaf(7)])])

To extract the 3 from this tree, we would do this:

root(branches(t)[1])

Here's the function print_tree, which prints out a tree in a human-readable form. The exact form follows the pattern illustrated above, where the root is unindented, and each of its children is indented one level further.

def print_tree(t, indent=0):
    """Return a string representation of this tree in which
    each node is indented by two spaces times its depth from
    the root.

    >>> print_tree(t)
    1
      2
      3
        4
        5
      6
        7
    """
    print('  ' * indent + str(root(t)))
    for child in branches(t):
        print_tree(child, indent + 1)

Question 1: (optional)

Define the function countdown_tree so that it returns the tree below. Make sure to use our tree ADT!

The doctest below shows the print_tree representation.

def countdown_tree():
    """Return a tree that has the following structure. 

    >>> print_tree(countdown_tree())
    10
      9
        8
      7
        6
          5
    """
    "*** YOUR CODE HERE ***"
    return rooted(10, [rooted(9, [leaf(8)]),
                    rooted(7, [rooted(6, [leaf(5)])])])

Question 2

Define the function size_of_tree, which takes in a tree as an argument and returns the number of entries in the tree.

def size_of_tree(t):
    """Return the number of entries in the tree.

    >>> print_tree(t)
    1
      2
      3
        4
        5
      6
        7
    >>> size_of_tree(t)
    7
    """
    "*** YOUR CODE HERE ***"
    return 1 + sum([size_of_tree(t) for t in branches(t)])

# Alternate solution
def size_of_tree(t):
    children_sum = 0
    for child in branches(t):
        children_sum += size_of_tree(child)
    return 1 + children_sum

Linked Lists

Linked lists are recursive data structures that represent a sequence of elements.

A linked list is a pair that contains only two elements: the first element in the sequence and the rest of the sequence (which is a linked list).

Implementation

Constructors:
- link(value, rest): Construct a linked list with first element value and the next linked list rest.
- empty: The empty linked list.
Selectors
- first(s): Returns the first element in the given linked list s.
- rest(s): Returns the rest of the linked list s.
Other
- is_link(s): Returns True if s is a linked list.
- print_linked_list(s): Prints out the linked list s.
- 'empty' represents the empty list

If a linked list s is constructed from a first element a and a linked list b, then first(s) returns a, which is an element of the sequence. rest(s) returns b, which is a linked list.

Question 3

Write a function that takes in a linked list lst and a function term which is applied to each number in lst and returns the sum.

def sum_linked_list(lst, fn):
    """ Applies a function FN to each number in LST and returns the sum
    of the resulting values

    >>> square = lambda x: x*x
    >>> double = lambda y: 2*y
    >>> lst1 = link(1, link(2, link(3, link(4, empty))))    
    >>> sum_linked_list(lst1, square)
    30
    >>> lst2 = link(3, link(5, link(4, link(10, empty))))
    >>> sum_linked_list(lst2, double)
    44
    """
    "*** YOUR CODE HERE ***"
    if lst == empty:
        return 0
    return fn(first(lst)) + sum_linked_list(rest(lst), fn)

# Iterative Solution
def sum_linked_list(lst, fn):
    sum = 0
    while lst != empty:
        sum += fn(first(lst))
        lst = rest(lst)
    return sum

Dictionaries

Dictionaries are unordered sets of key-value pairs. To create a dictionary, use the following syntax:

>>> singers = { 'Iggy Azalea': 'Fancy', 'Beyonce': 'Flawless', 'Adam Levine': 'Maps'}

The curly braces denote the key-value pairs in your dictionary. Each key-value pair is separated by a comma. For each pair, the key appears to the left of the colon and the value appears to the right of the colon. You can retrieve values from your dictionary by 'indexing' using the key:

>>> singers['Beyonce']
'Flawless'
>>> singers['Iggy Azalea']
'Fancy'

You can modify an entry for an existing key in the dictionary using the following syntax. Adding a new key follows identical syntax!

>>> singers['Beyonce'] = 'Survivor'
>>> singers['Beyonce']
'Survivor'
>>> singers['Nicki Minaj'] = 'Anaconda' # new entry!
>>> singers['Nicki Minaj']
'Anaconda'

You can also check for membership of keys!

>>> 'Adam Levine' in singers
True

Question 4

Implement the function counter which takes in a string of words, and returns a dictionary where each key is a word in the message, and each value is the number of times that word is present in the original string.

def counter(message):
    """ Returns a dictionary of each word in message mapped 
    to the number of times it appears in the input string.
    >>> x = counter('to be or not to be')
    >>> x['to']
    2
    >>> x['be']
    2
    >>> x['not']
    1
    >>> y = counter('run forrest run')
    >>> y['run']
    2
    >>> y['forrest']
    1
    """
    word_list = message.split()
    "*** YOUR CODE HERE ***"
    result_dict = {}
    for word in word_list:
        if word in result_dict:
            result_dict[word] += 1
        else:
            result_dict[word] = 1
    return result_dict

Extra Questions

The following questions are for extra practice — they can be found in the the lab05_extra.py file. It is recommended that you complete these problems on your own time.

More Rooted Trees

Question 5

Define the function height, which takes in a tree as an argument and returns the depth of the deepest node in the tree. (This is also the height of the tree.)

def height(t):
    """Return the depth of the deepest node in the tree. 

    >>> height(leaf(1))
    0
    >>> height(rooted(1, [leaf(2), leaf(3)]))
    1
    >>> print_tree(t)
    1
      2
      3
        4
        5
      6
        7
    >>> height(t)
    2
    """
    "*** YOUR CODE HERE ***"
    if branches(t) == []:
        return 0
    deepest = 0
    for child in branches(t):
        deepest = max(deepest, height(child))
    return deepest + 1

Dictionaries + Shakespeare

We will use dictionaries to approximate the entire works of Shakespeare! We're going to use a bigram language model. Here's the idea: We start with some word — we'll use "The" as an example. Then we look through all of the texts of Shakespeare and for every instance of "The" we record the word that follows "The" and add it to a list, known as the successors of "The". Now suppose we've done this for every word Shakespeare has used, ever.

Let's go back to "The". Now, we randomly choose a word from this list, say "cat". Then we look up the successors of "cat" and randomly choose a word from that list, and we continue this process. This eventually will terminate in a period (".") and we will have generated a Shakespearean sentence!

The object that we'll be looking things up in is called a "successor table", although really it's just a dictionary. The keys in this dictionary are words, and the values are lists of successors to those words.

Question 8

Here's an incomplete definition of the build_successors_table function. The input is a list of words (corresponding to a Shakespearean text), and the output is a successors table. (By default, the first word is a successor to "."). See the example below:

def build_successors_table(tokens):
    """Return a dictionary: keys are words; values are lists of
    successors.

    >>> text = ['We', 'came', 'to', 'investigate', ',', 'catch', 'bad', 'guys', 'and', 'to', 'eat', 'pie', '.']
    >>> table = build_successors_table(text)
    >>> sorted(table)
    [',', '.', 'We', 'and', 'bad', 'came', 'catch', 'eat', 'guys', 'investigate', 'pie', 'to']
    >>> table['to']
    ['investigate', 'eat']
    >>> table['pie']
    ['.']
    >>> table['.']
    ['We']
    """
    table = {}
    prev = '.'
    for word in tokens:
        if prev not in table:
            "*** YOUR CODE HERE ***"
            table[prev] = []
        "*** YOUR CODE HERE ***"
        table[prev].append(word)        prev = word
    return table

Question 9

Let's generate some sentences! Suppose we're given a starting word. We can look up this word in our table to find its list of successors, and then randomly select a word from this list to be the next word in the sentence. Then we just repeat until we reach some ending punctuation.

Hint: to randomly select from a list, first make sure you import the Python random library with import random and then use the expression random.choice(my_list))

This might not be a bad time to play around with adding strings together as well. Let's fill in the construct_sent function!

def construct_sent(word, table):
    """Prints a random sentence starting with word, sampling from
    table.
    """
    import random
    result = ' '
    while word not in ['.', '!', '?']:
        "*** YOUR CODE HERE ***"
        result += word + ' '
        word = random.choice(table[word])
    return result + word

Putting it all together

Great! Now all that's left is to run our functions with some actual code. The following snippet included in the skeleton code will return a list containing the words in all of the works of Shakespeare.

Warning: do NOT try to print the return result of this function.

def shakespeare_tokens(path='shakespeare.txt', url='http://goo.gl/SztLfX'):
    """Return the words of Shakespeare's plays as a list."""
    import os
    from urllib.request import urlopen
    if os.path.exists(path):
        return open('shakespeare.txt', encoding='ascii').read().split()
    else:
        shakespeare = urlopen(url)
        return shakespeare.read().decode(encoding='ascii').split()

Next, we probably want an easy way to refer to our list of tokens and our successors table. Let's make the following assignments:

tokens = shakespeare_tokens()
table = build_successors_table(tokens)

Finally, let's define an easy to call utility function:

>>> def sent():
...     return construct_sent('The', table)
>>> sent()
" The plebeians have done us must be news-cramm'd "

>>> sent()
" The ravish'd thee , with the mercy of beauty "

>>> sent()
" The bird of Tunis , or two white and plucker down with better ; that's God's sake "

Notice that all the sentences start with the word "The". With a few modications, we can make our sentences start with a random word. The following random_sent function (defined in your starter file) will do the trick:

def random_sent():
    import random
    return construct_sent(random.choice(table['.']), table)

Go ahead and load your file into Python (be sure to use the -i flag). You can now call the random_sent function to generate random Shakespearean sentences!

>>> random_sent()
' Long live by thy name , then , Dost thou more angel , good Master Deep-vow , And tak'st more ado but following her , my sight Of speaking false !'

>>> random_sent()
' Yes , why blame him , as is as I shall find a case , That plays at the public weal or the ghost .'

Lab 5: Rooted Trees, Linked Lists, Dictionaries

Table of Contents

Submission

Rooted Trees

Terminology

Implementation

Question 1: (optional)

Question 2

Linked Lists

Implementation

Question 3

Dictionaries

Question 4

Extra Questions

More Rooted Trees

Question 5

More Linked Lists

Question 6

Question 7

Dictionaries + Shakespeare

Question 8

Question 9

Putting it all together