Efficiency is intelligent laziness.
-- Anonymous
Pay no attention to that man behind the curtain.
-- Wizard of Oz
Background
In 1769 a wonderful hoax was concocted -- a mechanical robot called The Turk that could play chess at the level of a master. It toured Europe and fooled almost everyone...it even beat Napoleon in a game! What made it so wonderful was that it was constructed like a magician's box to look like an actual nuts-and-bolts machine, but in fact hid a human master beneath the table. The deception and "fake engineering" were a work of brilliance. The hoax exploited one of the central tenets of Computer Science and Engineering: that the implementation details are hidden behind the user interface by a curtain of abstraction.
The Challenge
By now, you've probably implemented the Python-powered unit converter from Project 2a and wondered why we couldn't just ask Google to do the conversion for us. Well, that's a good question. They did allow it in the fall of 2005. In the spring of 2006 they updated their Terms of Service to forbid offline searches. That's just as well; we've got something even better set up.
In this project, you'll write a Python program to do web fetching -- take input from a user and use it to query a web site, then take the resulting web page that is "returned", and format it nicely back to the user. The act of calling the web will be hidden behind the abstraction layer, so your program will seem as if it has the resources of the entire world (wide web) encoded within it. It's a pretty nifty trick.
All of the interaction with the user should be in plaintext (i.e., the user should see no HTML tags). Remember, the principal idea is to create the illusion, in a friendly text-only way, that your Python program is extremely smart. Think of your Python-powered unit converter. Wouldn't it be neat if it were simply a shell to a website that did all the work? That's the point of this project and of the quotes above.
Sample online data collections
Here are some examples of online information you might want to query based on user input. Don't choose unit conversion because we want you to have exposure to other types of data.
- Dictionaries, Thesauruses, Encyclopedias
- Price of a certain stock
- Anagrams
- Number of fish visible on an online fishbowl webcam
- Traffic information for a certain road
- Your idea here! (just not weather, see below)
Try to think about information you'd actually care about yourself! What web pages do you regularly visit? Sometimes programs are written for others, sometimes they're written as an exercise to gain language fluency, and sometimes they're written for your own personal use. We're giving you the flexibility to choose the information you wish and whose retrieval you would like to automate.
Example
Here's a sample interaction of a program that tells you the weather for cities in the United States. Note that your program shouldn't look like ours, because you will choose a data service that is of interest to you. We'll say it again -- do not choose weather as your data collection -- find something else that interests you. It only needs to adhere to the final checklist below.
computer% python project2b.py
Welcome to WeatherGuru v1.0 by Dan Garcia!
Tell me a city and state and I'll tell you its weather!
City, State or (q)uit]: berkeley, ca
Temperature : 50.5 F / 10.3 C
Forecast : Clear
City, State or (q)uit]: San Francisco, CA
Temperature : 52.5 F / 11.4 C
Forecast : Clear
City, State or (q)uit]: barrow, alaska
Temperature : -6 F / -21 C
Forecast : Light Snow Showers Blowing Snow Showers
City, State or (q)uit]: Mount Doom, Mordor
I can't seem to locate that city, please try again
City, State or (q)uit]: q
Thanks for visiting us. May the sun shine on your shoulders.
computer%
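The prompt-and-answer loop in a session like this one could be structured roughly as follows. This is only a sketch, not the code behind WeatherGuru: `run_session`, `lookup`, and the `fake_lookup` stand-in are hypothetical names, and the input and output functions are passed in as parameters so the loop can be tried out without a keyboard or a network connection.

```python
def run_session(lookup, read_line=input, write_line=print):
    """Prompt repeatedly until the user quits.

    lookup     -- function taking the user's text and returning an answer
                  string, or None when nothing was found (hypothetical)
    read_line  -- function that shows a prompt and returns one line of input
    write_line -- function that displays one line of output
    Returns the number of queries that were answered.
    """
    answered = 0
    while True:
        text = read_line("Query or (q)uit: ").strip()
        if text.lower() in ("q", "quit"):
            write_line("Thanks for visiting us.")
            return answered
        result = lookup(text)
        if result is None:
            write_line("I can't seem to locate that, please try again")
        else:
            write_line(result)
            answered += 1


def fake_lookup(text):
    """Stand-in for a real web query, used only to try out the loop."""
    known = {"python": "python: a large snake"}
    return known.get(text.lower())
```

Calling `run_session(fake_lookup)` starts an interactive session; a real program would pass a function that queries its chosen web site instead.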
Hints
Here are some general tips to get you rolling.
Automating a request for a web resource -- the urllib module
As documented on www.python.org, the urllib module can open arbitrary resources by Uniform Resource Locator, or URL. At this point, you've probably typed a thousand or more URLs into the address bar of your browser. http://www.python.org/ is one example, but URLs don't have to start with http://! They can also start with https, ftp, mailto, file, and several others. Here's a short code snippet to open a URL, read all of the contents into a string (which we'll call html) and close it:
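import urllib.request  ## Python 3; in Python 2 the call was urllib.urlopen
url = urllib.request.urlopen("http://www.python.org/")
html = url.read().decode("utf-8", "replace")  ## read() returns bytes, so decode them into a string
## Do something with html, parse it and format it in a pretty way
url.close()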
It's pretty easy. You'll get an ugly error message if the URL doesn't exist -- if you'd like to be able to handle this in a clean way, feel free to read ahead about exceptions. The problem with this example, however, is that it doesn't take any input from the user! The next section shows how that's done.
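For those who do read ahead about exceptions, here is one way the ugly error message might be avoided. It is a minimal sketch that assumes Python 3, where the opening function lives in `urllib.request`; the function name `fetch_page` is made up for illustration.

```python
import urllib.error
import urllib.request


def fetch_page(address):
    """Return the text found at address, or None if it cannot be opened.

    address -- a complete URL string, e.g. "http://www.python.org/"
    """
    try:
        response = urllib.request.urlopen(address)
    except (urllib.error.URLError, ValueError):
        # URLError covers unreachable hosts, missing files, and HTTP errors;
        # ValueError is raised for a malformed URL such as "not a url".
        return None
    try:
        # read() returns bytes; decode them, replacing undecodable bytes.
        return response.read().decode("utf-8", "replace")
    finally:
        response.close()
```

A caller can then check for None and print a friendly message instead of letting the program stop with a traceback.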
How to pass information to URLs
Let's say you would like to be able to search a dictionary for the word python. One common technique is to pass the query at the end of the URL:
http://www.hostname.com/path?query
A more general technique is to pass parameter-value pairs (just like key-values in dictionaries!), as such:
http://www.hostname.com/path?parameter1=value1&parameter2=value2
...but note that no spaces are allowed in either the values or the parameters. If you need to specify one with a space (e.g., "San Francisco"), you need to replace the spaces with three characters that represent a space in hexadecimal ASCII: %20. So, back to our python query. Here's how we pass it to dictionary.com:
http://dictionary.reference.com/search?q=python
which returns the HTML from the search page on the word python. If the receiver is a browser, it knows how to render this HTML into a pretty page. If it's your Python program and the goal is to produce nice, clean text that is free of HTML tags, you've got to parse it!
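Rather than replacing spaces by hand, the query string can be assembled with the standard library. The sketch below assumes Python 3 and its `urllib.parse` module; `build_query_url` and the `city` parameter are hypothetical names chosen for illustration.

```python
from urllib.parse import quote, urlencode


def build_query_url(base, parameters):
    """Return base followed by '?' and the encoded parameter-value pairs.

    base       -- the address up to and including the path
    parameters -- dictionary mapping parameter names to values
    """
    # quote_via=quote encodes a space as %20; the default would use '+'.
    return base + "?" + urlencode(parameters, quote_via=quote)


print(build_query_url("http://dictionary.reference.com/search", {"q": "python"}))
# http://dictionary.reference.com/search?q=python
print(build_query_url("http://www.hostname.com/path", {"city": "San Francisco"}))
# http://www.hostname.com/path?city=San%20Francisco
```

Several pairs in the dictionary are joined with `&`, matching the parameter-value form shown earlier.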
How to parse the html string returned by url.read()
String parsing is one of Python's strengths! As you know, you can find out what the string module supports by asking the on-line help:
import string ## This only needs to be done once per program
help(string)
Some of the most useful string functions are split, find, replace and slicing.
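To give a feel for these four operations, here is a small illustration on a made-up fragment of HTML. It assumes Python 3, where split, find and replace are called as methods on the string itself; the fragment and the variable names are invented for this example.

```python
# A made-up fragment of HTML standing in for part of a real page.
line = "<b>Forecast:</b> Light Snow<br>"

# find returns the index where a pattern starts (or -1 if it is absent).
start = line.find("</b>") + len("</b>")
end = line.find("<br>")

# Slicing keeps the characters from start up to, but not including, end.
forecast = line[start:end].strip()

# replace swaps every occurrence of one substring for another.
plain = line.replace("<b>", "").replace("</b>", "").replace("<br>", "")

# split breaks a string into a list of pieces around a separator.
label, detail = plain.split(":")

print(forecast)   # Light Snow
print(plain)      # Forecast: Light Snow
print(label)      # Forecast
```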
Last-minute thoughts
- You might want to balance how interesting the data is with how easy it is to parse the resulting HTML
- You should not query the same data service as someone else in the class; please choose this independently.
- "View Source" is your friend when parsing the html -- try to find a pattern that only appears before your data and another that only appears after it. That can help you to cut away most of the html cruft.
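The "pattern before, pattern after" tip can be turned into a pair of small helpers. This is a sketch under stated assumptions rather than a general HTML parser: `extract_between` and `strip_tags` are hypothetical names, and the tag stripper simply treats anything between `<` and `>` as a tag.

```python
def extract_between(html, before, after):
    """Return the text between the first `before` and the next `after`.

    Returns None if either pattern is missing.
    """
    start = html.find(before)
    if start == -1:
        return None
    start += len(before)
    end = html.find(after, start)
    if end == -1:
        return None
    return html[start:end]


def strip_tags(text):
    """Return text with every <...> tag removed."""
    pieces = []
    position = 0
    while True:
        open_at = text.find("<", position)
        if open_at == -1:
            pieces.append(text[position:])
            break
        pieces.append(text[position:open_at])
        close_at = text.find(">", open_at)
        if close_at == -1:
            # An unclosed '<' is not a tag; keep the rest as it is.
            pieces.append(text[open_at:])
            break
        position = close_at + 1
    return "".join(pieces)
```

For instance, with a made-up page `'<html><div class="def"><i>noun</i> a large snake</div></html>'`, extracting between `'<div class="def">'` and `'</div>'` and then stripping tags leaves `noun a large snake`.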
Testing
Test each function in your program with enough different inputs to show that it works. When testing a function, print out the input, the expected result, and the actual result. Produce a paper printout of all your tests to submit with your project.
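One possible shape for such a test report is sketched below. The helper `check` and the function under test, `shout`, are hypothetical; any function in your own program could take the place of `shout`.

```python
def check(function, argument, expected):
    """Call function(argument), print a report line, and return True on a match.

    function -- the function being tested
    argument -- the single input to pass to it
    expected -- the result the function should produce
    """
    actual = function(argument)
    status = "ok" if actual == expected else "FAILED"
    print("input: {!r}  expected: {!r}  actual: {!r}  [{}]".format(
        argument, expected, actual, status))
    return actual == expected


def shout(text):
    """Hypothetical function under test: return text in upper case."""
    return text.upper()


check(shout, "berkeley", "BERKELEY")
check(shout, "", "")
```

Redirecting the program's output to a file gives something that can be printed and handed in.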
Extra for Experts
There are a couple of cool features you can add to make your program sing:
- Seamlessly combine queries on several dimensions from different sources
  - If you ask for two cities, compare their altitudes from one web site, their temperatures from another, the driving distance from a third, the cost of a flight between them from a fourth, etc.
  - If you ask for a date, find the famous people who were born then, the number of days from then to now, the phase of the moon, the value of the Dow Jones the last time the day occurred, etc.
- Birthday Gift Information: This is the previous idea on steroids. Ask for lots of personal information about a person (date and place of birth, parents' names, pet's name, etc.). Then, query every possible site to provide things like...
  - The meaning of their name
  - The number of days / minutes / seconds they've been alive
  - The driving distance and directions from their current location to their place of birth
  - Their brief genealogy from genealogy.com
  - Famous actors who were born on the same day
  - The amount of money you've just emptied from their bank accounts thanks to the personal information they provided. :-)
  - Etc.
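Not every item needs a web site. The days / minutes / seconds calculation, for example, can be done locally with the standard `datetime` module. The sketch below counts whole days only, and `time_alive` is a made-up name.

```python
from datetime import date


def time_alive(born, today=None):
    """Return (days, minutes, seconds) from born to today, counting whole days.

    born  -- a datetime.date giving the date of birth
    today -- a datetime.date to measure up to; defaults to the current date
    """
    if today is None:
        today = date.today()
    days = (today - born).days
    return days, days * 24 * 60, days * 24 * 60 * 60


print(time_alive(date(2000, 1, 1), date(2000, 3, 1)))  # (60, 86400, 5184000)
```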
Checklist
These are the requirements to meet for project completion.
- Printed program listing
- Each function has a docstring that summarizes its purpose and provides a description of its inputs and outputs.
- Tests and test output provided in printed form for each function in the program.
- All functions and variables have meaningful names.
- Tests showing the program produces correct queries with only plaintext output (no HTML tags)
- Tests showing the program gives helpful, corrective error messages if the user enters bad input.