Efficiency is intelligent laziness.
Pay no attention to that man behind the curtain.
-- Wizard of Oz
In 1769 a wonderful hoax was concocted -- a mechanical robot called The Turk that could play chess at the level of a master toured Europe and fooled almost everyone... it even beat Napoleon in a game! What made it so wonderful was that it was constructed like a magician's box to look like an actual nuts-and-bolts machine, but in fact hid a human master beneath the table. The deception and "fake engineering" were works of brilliance. The hoax exploited one of the central tenets of Computer Science and Engineering: that the implementation details are hidden behind the user interface by a curtain of abstraction.
By now, you've probably implemented the Python-powered unit converter from Project 2a and wondered why we couldn't just ask Google to do the conversion for us. Well, that's a good question. They did allow it in the fall of 2005. In the spring of 2006 they updated their Terms of Service to forbid offline searches. That's just as well; we've got something even better set up.
In this project, you'll write a Python program to do web fetching -- take input from a user and use it to query a web site, then take the resulting web page that is "returned", and format it nicely back to the user. The act of calling the web will be hidden behind the abstraction layer, so your program will seem as if it has the resources of the entire world (wide web) encoded within it. It's a pretty nifty trick.
All of the interaction with the user should be in plaintext (i.e., the user should see no HTML tags). Remember, the principal idea is to create the illusion, in a friendly text-only way, that your Python program is extremely smart. Think of your Python-powered unit converter. Wouldn't it be neat if it were simply a shell to a website that did all the work? That's the point of this project and of the quotes above.
Sample online data collections
Here are some examples of online information you might want to query based on user input. Don't choose unit conversion because we want you to have exposure to other types of data.
- Dictionaries, Thesauruses, Encyclopedias
- Price of a certain stock
- Number of fish visible on an online fishbowl webcam
- Traffic information for a certain road
- Your idea here! (just not weather, see below)
Try to think about information you'd actually care about yourself! What web pages do you regularly visit? Sometimes programs are written for others, sometimes they're written as an exercise to gain language fluency, and sometimes they're written for your own personal use. We're giving you the flexibility to choose the information you wish and whose retrieval you would like to automate.
Here's a sample interaction of a program that tells you the weather for cities in the United States. Note that your program shouldn't look like ours, because you will choose a data service that is of interest to you. We'll say it again -- do not choose weather as your data collection -- find something else that interests you. It only needs to adhere to the final checklist below.
    computer% python project2b.py
    Welcome to WeatherGuru v1.0 by Dan Garcia!
    Tell me a city and state and I'll tell you its weather!
    City, State or (q)uit]: berkeley, ca
    Temperature : 50.5 F / 10.3 C
    Forecast : Clear
    City, State or (q)uit]: San Francisco, CA
    Temperature : 52.5 F / 11.4 C
    Forecast : Clear
    City, State or (q)uit]: barrow, alaska
    Temperature : -6 F / -21 C
    Forecast : Light Snow Showers Blowing Snow Showers
    City, State or (q)uit]: Mount Doom, Mordor
    I can't seem to locate that city, please try again
    City, State or (q)uit]: q
    Thanks for visiting us. May the sun shine on your shoulders.
    computer%
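An interaction like this one boils down to a loop that reads a line, looks something up, and prints the answer. The sketch below is one possible shape for such a loop, not the way the sample program was actually written. `lookup_weather` is a hypothetical stand-in that returns canned data instead of calling the web, and passing the reader and writer into `main_loop` as parameters is an assumption of this sketch, made so the loop can be exercised without a keyboard.

```python
def lookup_weather(city_and_state):
    """Hypothetical stand-in for a real web lookup.

    Returns a (temperature, forecast) pair, or None for an unknown city.
    """
    canned = {"berkeley, ca": ("50.5 F / 10.3 C", "Clear")}
    return canned.get(city_and_state.strip().lower())


def main_loop(read_line=input, write=print):
    """Prompt repeatedly until the user quits, printing plaintext only."""
    write("Tell me a city and state and I'll tell you its weather!")
    while True:
        answer = read_line("City, State or (q)uit]: ").strip()
        if answer.lower() in ("q", "quit"):
            write("Thanks for visiting us.")
            break
        result = lookup_weather(answer)
        if result is None:
            # Helpful, corrective message instead of a crash.
            write("I can't seem to locate that city, please try again")
        else:
            temperature, forecast = result
            write("Temperature : " + temperature)
            write("Forecast : " + forecast)
```

Calling `main_loop()` with no arguments runs it interactively; in a test you can pass in a function that hands back prepared answers and a list's `append` method to capture what would have been printed.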
Here are some general tips to get you rolling.
Automating a request for a web resource -- the urllib module
As its documentation describes, the urllib module can open arbitrary resources by Uniform Resource Locator, or URL. At this point, you've probably typed a thousand or more URLs into the address field of your browser. http://www.python.org/ is one example, but URLs don't have to start with http://! They can also start with file and several other schemes. Here's a short code snippet that imports the module, opens a URL, reads all of the contents into a string (which we'll call html), and closes it:
    import urllib  ## In Python 3, urlopen lives in urllib.request instead

    url = urllib.urlopen("http://www.python.org/")
    html = url.read()
    ## Do something with html, parse it and format it in a pretty way
    url.close()
It's pretty easy. You'll get an ugly error message if the URL doesn't exist -- if you'd like to be able to handle this in a clean way, feel free to read ahead about exceptions. The problem with this example, however, is that it doesn't take any input from the user! The next section shows how that's done.
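As an aside on the clean error handling mentioned above, here is a minimal sketch of what it could look like. It assumes Python 3, where `urlopen` lives in `urllib.request`; the function name `fetch_page`, the ten-second timeout, and the wording of the message are choices made for this illustration only.

```python
from urllib.error import URLError
from urllib.request import urlopen


def fetch_page(url, timeout=10):
    """Return the text found at `url`, or None if it cannot be fetched."""
    try:
        response = urlopen(url, timeout=timeout)
    except (URLError, ValueError) as error:
        # URLError covers unreachable hosts, missing pages and missing files.
        # ValueError is raised when the string is not a usable URL at all.
        print("Sorry, I couldn't open " + url + " (" + str(error) + ")")
        return None
    try:
        # Decode the raw bytes; undecodable bytes become placeholder characters.
        return response.read().decode("utf-8", errors="replace")
    finally:
        response.close()
```

Returning `None` on failure lets the caller decide what to tell the user, rather than letting a traceback reach the screen.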
How to pass information to URLs
Let's say you would like to be able to search a dictionary for the word python. One common technique is to pass the query at the end of the URL:
A more general technique is to pass parameter-value pairs (just like key-values in dictionaries!), as such:
...but note that no spaces are allowed in either the values or the parameters. If you need to specify one with a space (e.g., "San Francisco"), you need to replace the spaces with three characters that represent a space in hexadecimal ASCII: %20. So, back to our python query. Here's how we pass it to
which returns the html from the search page on the word python. If the receiver is a browser, it knows how to render this HTML into a pretty page. If it's your Python program and the goal is to produce nice, clean text that is free of HTML tags, you've got to parse it!
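The two ways of passing a query described above (at the end of the URL, or as parameter-value pairs with spaces replaced by %20) can be sketched as follows. This assumes Python 3, where the helpers live in `urllib.parse`. The site `dictionary.example.com`, the paths `/search/` and `/lookup`, and the parameter names `city` and `state` are all made up for illustration; a real site defines its own.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://dictionary.example.com"  # hypothetical site


def path_url(word):
    """Put the query at the end of the URL, escaping unsafe characters."""
    return BASE_URL + "/search/" + quote(word)


def parameter_url(city, state):
    """Pass parameter-value pairs after a '?', joined by '&'."""
    # quote_via=quote encodes a space as %20 (the default would use '+').
    pairs = urlencode({"city": city, "state": state}, quote_via=quote)
    return BASE_URL + "/lookup?" + pairs
```

For instance, `parameter_url("San Francisco", "CA")` produces a URL ending in `/lookup?city=San%20Francisco&state=CA`, so the user can type spaces freely and the program does the escaping.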
How to parse the html string returned by url.read()
String parsing is one of Python's strengths! As you know, you can find out what the string module supports by asking the on-line help:
    import string  ## This only needs to be done once per program
    help(string)
Some of the most useful string functions are
- You might want to balance how interesting the data is with how easy it is to parse the resulting HTML
- You should not query the same data service as someone else in the class; please choose this independently.
- "View Source" is your friend when parsing the html -- try to find a pattern that only appears before your data and another that only appears after it. That can help you to cut away most of the html cruft.
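The "before" and "after" pattern idea from the last tip can be sketched with plain string methods. The function names and the sample HTML below are invented for this illustration, and the tag-stripping regular expression is deliberately crude: it is good enough for small, well-behaved fragments, not a general HTML parser.

```python
import re


def text_between(html, before, after):
    """Return the part of `html` between two marker strings, or None."""
    start = html.find(before)
    if start == -1:
        return None
    start += len(before)
    end = html.find(after, start)
    if end == -1:
        return None
    return html[start:end]


def strip_tags(fragment):
    """Remove anything that looks like <...> and tidy up the whitespace."""
    without_tags = re.sub(r"<[^>]*>", "", fragment)
    return " ".join(without_tags.split())


# Invented sample page, standing in for what url.read() might return.
sample = '<html><body><div id="temp">Temperature: <b>50.5 F</b></div></body></html>'
fragment = text_between(sample, '<div id="temp">', "</div>")
print(strip_tags(fragment))
```

Cutting the page down to a small fragment first keeps the tag stripping simple, because most of the html cruft is already gone by the time it runs.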
Test each function in your program with enough different inputs to show that it works. When testing a function, print out the input, the expected result, and the actual result.
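One possible way to print the input, the expected result, and the actual result is a small helper like the one below. Both `run_test` and the function under test, `normalize_city`, are hypothetical names chosen for this sketch; substitute the functions from your own program.

```python
def normalize_city(text):
    """Return `text` with single spaces and each word capitalized."""
    return " ".join(text.split()).title()


def run_test(function, argument, expected):
    """Call function(argument) and report input, expected and actual.

    Returns True if the actual result equals the expected one.
    """
    actual = function(argument)
    passed = actual == expected
    print("input   :", repr(argument))
    print("expected:", repr(expected))
    print("actual  :", repr(actual))
    print("result  :", "PASS" if passed else "FAIL")
    return passed


run_test(normalize_city, "  san   francisco ", "San Francisco")
run_test(normalize_city, "berkeley", "Berkeley")
```

Using `repr` makes stray spaces and empty strings visible in the test output, which is where string-handling bugs tend to hide.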
There are a couple of cool features you can add to make your program sing:
- Seamlessly combine queries on several dimensions from different sources
  - If you ask for two cities, compare their altitudes from one web site, their temperatures from another, the driving distance from a third, the cost of a flight between them from a fourth, etc.
  - If you ask for a date, find the famous people who were born then, the number of days from then to now, the phase of the moon, the value of the Dow Jones the last time the day occurred, etc.
- Birthday Gift Information: This is the previous idea on steroids. Ask for lots of personal information about a person (date and place of birth, parents' names, pet's name, etc.). Then, query every possible site to provide things like...
  - The meaning of their name
  - The number of days / minutes / seconds they've been alive
  - The driving distance and directions from their current location to their place of birth
  - Their brief genealogy from
  - Famous actors who were born on the same day
  - The amount of money you've just emptied from their bank accounts thanks to the personal information they provided. :-)
These are the requirements to meet for project completion.
- Use of functions — your program must be broken down into one or more functions — it cannot be one big long script.
- Each function has a docstring that summarizes its purpose and provides a description of its inputs and outputs.
- Tests and test output for each function in the program.
- All functions and variables have meaningful names.
- Tests showing the program produces correct queries with only plaintext output (no HTML tags)
- Tests showing the program gives helpful, corrective error messages if the user enters bad input.
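As an illustration of the docstring and plaintext items on this checklist, here is a sketch of a small documented function that could be used in tests to confirm that no HTML tags reach the user. The name `contains_html_tags` and the regular expression are assumptions of this example, and the pattern is only a rough approximation of what counts as a tag.

```python
import re


def contains_html_tags(text):
    """Report whether `text` still contains anything that looks like an HTML tag.

    Input:  text, a string that is about to be shown to the user.
    Output: True if something like <b> or </div> is found, False otherwise.
    """
    # A '<', an optional '/', a letter, then everything up to the next '>'.
    return re.search(r"</?[A-Za-z][^>]*>", text) is not None
```

A test can then call it on every line the program prints and report the input, the expected value (`False`), and the actual value.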