Project 2b: Basic web programming

Dan Garcia, 2006-02-20
Efficiency is intelligent laziness.
-- Anonymous

Pay no attention to that man behind the curtain.
-- Wizard of Oz

Background

Figure 1. The Turkish Chess Player, 1783. Public Domain.

In 1769 a wonderful hoax was concocted -- a mechanical robot called The Turk, which could play chess at the level of a master, toured Europe and fooled almost everyone... it even beat Napoleon in a game! What made it so wonderful was that it was built like a magician's box, made to look like an actual nuts-and-bolts machine, while in fact it hid a human master beneath the table. The deception and "fake engineering" were works of brilliance. The hoax exploited one of the central tenets of Computer Science and Engineering: that the implementation details are hidden behind the user interface by a curtain of abstraction.

The Challenge

By now, you've probably implemented the Python-powered unit converter from Project 2a and wondered why we couldn't just ask Google to do the conversion for us. Well, that's a good question. They did allow it in the fall of 2005, but in the spring of 2006 they updated their Terms of Service to forbid offline searches. That's just as well; we've got something even better set up.

In this project, you'll write a Python program to do web fetching -- take input from a user, use it to query a web site, then take the web page that is "returned" and format it nicely for the user. The act of fetching from the web will be hidden behind the abstraction layer, so your program will seem as if it has the resources of the entire world (wide web) encoded within it. It's a pretty nifty trick.

All of the interaction with the user should be in plain text (i.e., the user should see no HTML tags). Remember, the principal idea is to create the illusion, in a friendly text-only way, that your Python program is extremely smart. Think of your Python-powered unit converter. Wouldn't it be neat if it were simply a shell to a website that did all the work? That's the point of this project and of the quotes above.

Sample online data collections

Here are some examples of online information you might want to query based on user input. Don't choose unit conversion because we want you to have exposure to other types of data.

Try to think about information you'd actually care about yourself! What web pages do you regularly visit? Sometimes programs are written for others, sometimes they're written as an exercise to gain language fluency, and sometimes they're written for your own personal use. We're giving you the flexibility to choose the information you wish and whose retrieval you would like to automate.

Example

Here's a sample interaction of a program that tells you the weather for cities in the United States. Note that your program shouldn't look like ours, because you will choose a data service that is of interest to you. We'll say it again -- do not choose weather as your data collection -- find something else that interests you. It only needs to adhere to the final checklist below.

computer% python project2b.py
Welcome to WeatherGuru v1.0 by Dan Garcia!
Tell me a city and state and I'll tell you its weather!

[City, State or (q)uit]: berkeley, ca
Temperature : 50.5 F / 10.3 C
Forecast    : Clear

[City, State or (q)uit]: San Francisco, CA
Temperature : 52.5 F / 11.4 C
Forecast    : Clear

[City, State or (q)uit]: barrow, alaska
Temperature : -6 F / -21 C
Forecast    : Light Snow Showers Blowing Snow Showers

[City, State or (q)uit]: Mount Doom, Mordor
I can't seem to locate that city, please try again

[City, State or (q)uit]: q
Thanks for visiting us. May the sun shine on your shoulders.
computer%
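The sample session follows a simple pattern: greet the user, then repeatedly prompt, quit on q, look the query up, and print either the answer or an apology. Here is one way to sketch that loop. It assumes a hypothetical lookup function of your own that returns a list of plain-text lines, or None when nothing is found; the banner and prompt strings are placeholders. Reading and printing are passed in as functions so the loop can be tested without anyone typing.

```python
def run_session(read_line, lookup, show):
    """Prompt/answer loop modeled on the sample session.

    read_line(prompt) -> str  : returns one line typed by the user
    lookup(query)             : hypothetical; returns a list of plain-text
                                lines, or None if nothing was found
    show(text)                : displays one line of plain text
    """
    show("Welcome! Ask me something and I'll look it up.")  # placeholder banner
    while True:
        query = read_line("[Query or (q)uit]: ").strip()
        if query.lower() == "q":
            break
        answer = lookup(query)
        if answer is None:
            show("I can't seem to find that, please try again")
        else:
            for text in answer:
                show(text)
        show("")  # blank line between answers, as in the sample session
    show("Thanks for visiting us.")
```

In the real program, read_line would be raw_input (input in Python 3) and show a one-line function that prints its argument. Keeping the lookup separate from the loop also means you can test the web-fetching part on its own.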

Hints

Here are some general tips to get you rolling.

Automating a request for a web resource -- the urllib module

As documented on www.python.org, the urllib module can open arbitrary resources by Uniform Resource Locator, or URL. By now you've probably typed a thousand or more URLs into your browser's address bar. http://www.python.org/ is one example, but URLs don't have to start with http://! They can also start with https, ftp, mailto, file, and several others. Here's a short code snippet that opens a URL, reads all of its contents into a string (we'll call it html), and closes it:

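import urllib  ## needed once, at the top of your program

url = urllib.urlopen("http://www.python.org/")  ## open the resource
html = url.read()                                ## read the whole page into a string
## Do something with html: parse it and format it in a pretty way
url.close()                                      ## release the connection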

It's pretty easy. You'll get an ugly error message if the URL doesn't exist -- if you'd like to be able to handle this in a clean way, feel free to read ahead about exceptions. The problem with this example, however, is that it doesn't take any input from the user! Here's how that's done:
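If you do want to handle a missing page cleanly, one option is to wrap the open in a try/except and return None on failure. This is a minimal sketch, not the only way to do it. It imports urlopen from urllib.request when running under Python 3 and falls back to the urllib.urlopen used in this handout under Python 2. It also assumes the page is UTF-8 encoded when Python 3 hands back bytes. The helper name fetch is hypothetical.

```python
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib import urlopen          # Python 2, as used in this handout


def fetch(url):
    """Return the contents of url as a string, or None if it can't be opened."""
    try:
        page = urlopen(url)
    except IOError:  # IOError in Python 2; URLError/HTTPError subclass it in Python 3
        return None
    try:
        data = page.read()
    finally:
        page.close()  # always release the connection
    if not isinstance(data, str):  # Python 3 returns bytes
        data = data.decode("utf-8", "replace")  # assumption: the page is UTF-8
    return data
```

A caller can then test the result, for example printing "I can't seem to locate that, please try again" when fetch returns None, instead of letting the program crash with a traceback.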

How to pass information to URLs

Let's say you would like to be able to search a dictionary for the word python. One common technique is to pass the query at the end of the URL:

http://www.hostname.com/path?query

A more general technique is to pass parameter-value pairs (just like key-value pairs in dictionaries!), as such:

http://www.hostname.com/path?parameter1=value1&parameter2=value2

...but note that no spaces are allowed in either the values or the parameters. If you need to specify one with a space (e.g., "San Francisco"), replace each space with the three characters %20: a percent sign followed by 20, the hexadecimal ASCII code for a space. So, back to our python query. Here's how we pass it to dictionary.com:

http://dictionary.reference.com/search?q=python

which returns the HTML from the search page on the word python. If the receiver is a browser, it knows how to render this HTML into a pretty page. If it's your Python program and the goal is to produce nice, clean text that is free of HTML tags, you've got to parse it!
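Putting the URL rules above together, here is a small sketch that builds a query URL from parameter-value pairs and replaces spaces with %20. The function names are hypothetical, and only spaces are handled; a real program should also escape characters such as &, = and %, which urllib's quote function (urllib.parse.quote in Python 3) takes care of. The pairs are given as a list of tuples rather than a dictionary so that their order in the URL is predictable.

```python
def encode_value(text):
    """Replace each space with %20 (spaces only; other special characters are left alone)."""
    return text.replace(" ", "%20")


def build_url(base, params):
    """Append (parameter, value) pairs to base as ?p1=v1&p2=v2."""
    pairs = []
    for name, value in params:
        pairs.append(encode_value(name) + "=" + encode_value(value))
    return base + "?" + "&".join(pairs)

# build_url("http://dictionary.reference.com/search", [("q", "python")])
#   returns "http://dictionary.reference.com/search?q=python"
# build_url("http://www.hostname.com/path", [("city", "San Francisco")])
#   returns "http://www.hostname.com/path?city=San%20Francisco"
```

The parameter name city in the second example is made up; each site decides its own parameter names, which you can discover by searching on the site in a browser and looking at the URL it produces.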

How to parse the html string returned by url.read()

String parsing is one of Python's strengths! As you know, you can find out what the string module supports by asking the on-line help:

import string ## This only needs to be done once per program
help(string)

Some of the most useful tools are the split, find, and replace functions, along with slicing.
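A common approach is to find a piece of text that always appears just before and just after the data you want, slice out what lies between, and then strip any remaining tags. The sketch below does exactly that using only find, replace, split, and slicing, called as methods on the string. The marker strings in the usage comment (such as <div id="temp">) are hypothetical; use your browser's "View Source" on the real page to choose markers that actually surround your data.

```python
def extract_between(html, start_marker, end_marker):
    """Return the text after start_marker and before the next end_marker, or None."""
    start = html.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)           # skip past the marker itself
    end = html.find(end_marker, start)   # search only after the start marker
    if end == -1:
        return None
    return html[start:end]


def strip_tags(html):
    """Replace each <...> tag with a space, then collapse runs of whitespace."""
    pieces = []
    pos = 0
    while True:
        lt = html.find("<", pos)
        if lt == -1:
            pieces.append(html[pos:])    # no more tags: keep the rest
            break
        pieces.append(html[pos:lt])      # text before the tag
        pieces.append(" ")               # keep adjacent cells from running together
        gt = html.find(">", lt)
        if gt == -1:                     # unterminated tag: drop the rest
            break
        pos = gt + 1
    text = "".join(pieces).replace("&nbsp;", " ")
    return " ".join(text.split())

# page = '<div id="temp">Temperature: <b>50.5</b> F</div>'
# strip_tags(extract_between(page, '<div id="temp">', '</div>'))
#   returns "Temperature: 50.5 F"
```

This is deliberately simple: it does not understand comments, scripts, or entities other than &nbsp;, which is usually fine when you only need a few values from a page whose layout you have inspected.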

Last-minute thoughts

Testing

Test each function in your program with enough different inputs to show that it works. When testing a function, print out the input, the expected result, and the actual result.
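A tiny helper can make that printing routine. The check function below is hypothetical: it calls a function on one input, prints the input, the expected result, and the actual result, and reports PASS or FAIL. The f_to_c converter is also just an illustrative function to test, checked against the readings in the sample session (50.5 F is about 10.3 C, 52.5 F is about 11.4 C).

```python
def check(func, arg, expected):
    """Print input, expected, and actual results; return True if they match."""
    actual = func(arg)
    print("input:    %r" % (arg,))
    print("expected: %r" % (expected,))
    print("actual:   %r" % (actual,))
    print("PASS" if actual == expected else "FAIL")
    print("")
    return actual == expected


def f_to_c(f):
    """Hypothetical function under test: Fahrenheit to Celsius, one decimal place."""
    return round((f - 32) * 5.0 / 9, 1)  # 5.0 avoids integer division in Python 2


if __name__ == "__main__":
    check(f_to_c, 50.5, 10.3)  # Berkeley reading from the sample session
    check(f_to_c, 52.5, 11.4)  # San Francisco reading
```

For functions that fetch live web pages, the actual result can change from day to day, so it often helps to test the parsing functions on a saved copy of a page instead.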

Extra for Experts

There are a couple of cool features you can add to make your program sing:

Checklist

These are the requirements to meet for project completion.