CGI Programming

Ka-Ping Yee, 2003-03-05

Now we shift from learning the language to a particular application: CGI scripting.

1. Putting Pages on the Web

Most everyone who takes CS9H has had some experience with HTML. We will be putting that to good use here.

On the EECS machines, you put web pages in the public_html directory. For visitors to see your pages, this directory must be world-readable and world-executable. Running chmod 755 public_html will take care of this.

cory% mkdir public_html
cory% chmod 755 public_html
cory% ls -ald public_html
drwxr-xr-x   2 cs9h-aa cs198        512 Feb 18 19:32 public_html/
cory%

If you still get an error when accessing your page from a browser, you may need to make your home directory (represented by '~' below) world-executable, so the server can descend into your public_html directory:

cory% ls -ald ~
drwx------ 4 cs9h-aa cs198 4096 2013-09-05 14:34 /home/cc/cs198/fa13/staff/cs198-aa
cory% chmod 711 ~
cory% ls -ald ~
drwx--x--x 4 cs9h-aa cs198 4096 2013-09-05 14:34 /home/cc/cs198/fa13/staff/cs198-aa

If your account name is cs9h-xx and you put a page called foo.html in this directory, then you will be able to view it using the URL:

http://inst.eecs.berkeley.edu/~cs9h-xx/foo.html
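
The permission requirement above can also be checked from Python. The following is a minimal sketch, not part of the course setup: is_world_accessible is a hypothetical helper name, and the demonstration runs on a throwaway temporary directory rather than a real public_html.

```python
import os
import stat
import tempfile

def is_world_accessible(path):
    """Return True if 'others' may both read and enter the directory."""
    mode = os.stat(path).st_mode
    needed = stat.S_IROTH | stat.S_IXOTH   # the r-x in the last triplet of 755
    return (mode & needed) == needed

# Demonstrate on a temporary directory (an assumption for illustration).
demo_dir = tempfile.mkdtemp()
os.chmod(demo_dir, 0o755)
open_to_world = is_world_accessible(demo_dir)     # rwxr-xr-x
os.chmod(demo_dir, 0o700)
closed_to_world = is_world_accessible(demo_dir)   # rwx------
os.rmdir(demo_dir)
```

Note that this checks what public_html needs (read and execute). A home directory set to 711 would fail this particular test even though it is sufficient for the server to pass through it.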

2. The Common Gateway Interface

CGI is simply a standard for allowing web servers to call programs (instead of only serving static files on the disk). The basic idea is that when you request a certain kind of file (a CGI script), the web server executes it instead of sending the file over to the client. If the output from the CGI script is correctly formatted, it appears in the user's web browser.

When we say "the output", we mean any text that the script prints on standard output (the Python print command does this). In order to be correctly formatted, the output must begin with some headers, followed by a blank line, and then the content of the file to deliver. Each header is a line starting with a header name (which may not contain spaces, but may contain hyphens), a colon, and a space. Most of the headers are optional, but if any content is sent, the Content-Type header is mandatory. This header specifies the MIME media type of the content. For HTML this is text/html.

Putting this all together gives us a simple CGI script:

#!/usr/local/bin/python

print 'Content-Type: text/html'
print
print '<h1>Hello!</h1>'

For a file to work as a CGI script on the EECS web server, the filename must end in .cgi and the file must be world-executable. The first line causes the script to be run with Python.

Try putting the above text into a file called hello.cgi in your public_html directory. Set the permissions with chmod 755 hello.cgi. Now you should be able to go to the following URL in your web browser:

http://inst.eecs.berkeley.edu/~cs9h-xx/hello.cgi

Congratulations. You've written your first Python CGI script.
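
The output format described in this section (headers, a blank line, then the content) can be sketched as a small helper that builds the text instead of printing it. The name cgi_response and its parameters are assumptions made for this illustration. The code avoids the print statement so that it runs unchanged under either Python 2 or Python 3.

```python
def cgi_response(body, content_type='text/html', extra_headers=()):
    """Assemble headers, the mandatory blank line, and the content."""
    lines = ['Content-Type: ' + content_type]
    for name, value in extra_headers:
        lines.append(name + ': ' + value)   # header name, colon, space, value
    lines.append('')                         # blank line that ends the headers
    lines.append(body)
    return '\n'.join(lines) + '\n'

# Same text that the three print statements in hello.cgi produce.
page = cgi_response('<h1>Hello!</h1>')
```

A script could then write the result with sys.stdout.write(page).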

3. Error Handling

There are a few different ways that a CGI script can fail:

  1. The httpd server isn't configured to run CGI scripts. (This isn't the case for the instructional machines, but it might be if you're working at home. If so, read the documentation provided with Apache.)
  2. The script file doesn't have an acceptable name.
  3. The script file is not executable.
  4. The directory containing the script file is not executable.
  5. The first line doesn't start with #! and the name of a program.
  6. The program named in the first line doesn't exist or won't run.
  7. The script has incorrect syntax.
  8. The script fails to produce a Content-Type line.
  9. The script fails to produce a blank line after the headers.
  10. The script encounters an error while running.

You may want to keep this checklist handy so you can go through it when you have a problem.
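
A few items on the checklist can be tested mechanically. The sketch below covers only items 2, 3, and 5, and it assumes the EECS convention that scripts end in .cgi; preflight is a hypothetical helper, and the demonstration writes a temporary file rather than touching a real script.

```python
import os
import stat
import tempfile

def preflight(path):
    """Return a list of checklist problems found for the script at path."""
    problems = []
    if not path.endswith('.cgi'):                        # item 2
        problems.append('name does not end in .cgi')
    if not os.stat(path).st_mode & stat.S_IXOTH:         # item 3
        problems.append('file is not world-executable')
    with open(path) as f:
        first_line = f.readline()
    if not first_line.startswith('#!'):                  # item 5
        problems.append('first line does not start with #!')
    return problems

# Demonstrate on a temporary copy of the hello script.
fd, script = tempfile.mkstemp(suffix='.cgi')
with os.fdopen(fd, 'w') as f:
    f.write("#!/usr/local/bin/python\nprint 'Content-Type: text/html'\n")
os.chmod(script, 0o644)
before = preflight(script)    # not yet executable
os.chmod(script, 0o755)
after = preflight(script)     # nothing left to report
os.remove(script)
```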

In cases 1 through 7, the problem prevents the script from even starting. You will get an error message from the server like "Permission Denied" or "Forbidden" or "Internal Server Error". Or you might just get the source code of the script dumped on you.

To avoid problem 7, it's a good idea to check your script from the command line. Just run it by typing python hello.cgi. Python will tell you if there's a syntax error.

Problem 6 has also come up for us, because the server admins sometimes move the Python interpreter. Try the command:

    which python

...and change the path in the #! line to match. For example, if "which python" prints "/usr/bin/python", line 1 should be "#!/usr/bin/python".

In case 8 you might get an "Internal Server Error" or you might see the output from the script as text instead of HTML.

In case 9 you'll definitely get a server error.
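
Cases 8 and 9 can be caught before the server sees them by running the script from the command line and inspecting what it printed. Here is a rough sketch of such a check. check_cgi_output is a hypothetical helper, and it only applies to scripts that send content (a script that sends only a Location header would be flagged for lacking Content-Type).

```python
def check_cgi_output(output):
    """Return a list of header problems in the text a CGI script printed."""
    problems = []
    lines = output.replace('\r\n', '\n').split('\n')
    candidates = lines[:-1]          # drop the piece after the final newline
    if '' in candidates:
        header_lines = candidates[:candidates.index('')]
    else:
        header_lines = candidates
        problems.append('no blank line after the headers')      # case 9
    names = [line.split(':', 1)[0].lower()
             for line in header_lines if ':' in line]
    if 'content-type' not in names:
        problems.append('no Content-Type header')                # case 8
    return problems

good = check_cgi_output('Content-Type: text/html\n\n<h1>Hello!</h1>\n')
no_blank = check_cgi_output('Content-Type: text/html\n<h1>Hello!</h1>\n')
no_type = check_cgi_output('\n<h1>Hello!</h1>\n')
```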

Case 10 has to do with the logic of your program, and is by far the most complex and challenging type to solve.

Usually, when you run a Python program and it encounters an error, you'll see a traceback displayed on your terminal with information about what kind of error occurred, and where it occurred in the program. However, if you have an error in a CGI script, the error message has nowhere to go.

Try inserting the line print x before the last print in your script. If you now visit your script in a Web browser, you'll see that the output disappears, but there's no indication of an error. The program just stops when it hits the error and you don't see anything after that.

To help you diagnose these problems, there is a module called cgitb that will display these tracebacks more nicely. Take your altered script and add these two lines at the top, after the #! line:

import cgitb
cgitb.enable()

Now if you try visiting the page again, you'll see a detailed explanation of the error in your Web browser.

It's generally a good idea to use this module whenever you're developing CGI scripts. The information it provides about errors can save you a lot of time.
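
To see why a module like this helps, here is a minimal sketch of the underlying idea: catch the exception and send the traceback to the browser as a page. This is not how cgitb is actually implemented, and its reports are far more detailed. The names run_with_error_page, escape_html, and broken_page are made up for this illustration.

```python
import traceback

def escape_html(text):
    """Escape the characters that would otherwise be read as HTML markup."""
    return text.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;')

def run_with_error_page(handler):
    """Call handler(); return its page, or an HTML error page on failure."""
    try:
        return handler()
    except Exception:
        report = escape_html(traceback.format_exc())
        return ('Content-Type: text/html\n\n'
                '<h1>Script error</h1>\n<pre>' + report + '</pre>\n')

def broken_page():
    # undefined_name is deliberately undefined, like "print x" in the text.
    return 'Content-Type: text/html\n\n' + undefined_name

error_page = run_with_error_page(broken_page)
```

The resulting page contains the NameError traceback, so the error now has somewhere to go.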

When you deploy a script for real use and don't want users to see dumps of your source code, you can turn off cgitb, or you can have it save the error reports in files instead. For example, the command

cgitb.enable(display=0, logdir="/tmp")

tells cgitb not to display error reports in the Web browser, and to store them in files in /tmp instead.

4. Forms

The main way that you make CGI scripts interactive is to accept input using HTML forms. A form in HTML is enclosed with <form> and </form> tags. The starting <form> tag should have an attribute named action that gives the URL to which the form input will be sent.

The purpose of the HTML form is to let the user enter the values for some form fields. Each field has a name and a string value. When the user submits the form, your CGI script gets all these field names and values.

The form can contain text and normal HTML tags, as well as various input elements for the form fields, most notably <input>. Every form should have a submit button (made with <input type=submit>) so that the user can submit the form.

I won't go through all of the form elements here. There's a pretty good overview of forms at w3schools.com. All we're going to use here are text fields, but you are welcome to get as fancy as you want.

Let's suppose we wanted to provide a Web page that would calculate the square of any number you entered. Here's a simple example of an HTML form:

<form action="square.cgi">
Please enter a number:
<input type=text name=number>
<input type=submit value="Okay.">
</form>

Now you are probably wondering how a Python script receives the input from a form. The cgi module has some nice utilities to take care of this for you. There are a few ways, but the standard way is through FieldStorage(). Calling it returns an object that maps the field names to their string values. It is not an actual dictionary, but it behaves like one and supports most of the things you would do with a dictionary.

If you were to display the above form, and the user entered "3" and pressed the "Okay" button, the CGI script named square.cgi would get executed. Calling FieldStorage() in that script would yield a dictionary-ish object containing {'number': '3'}. The reason it's "ish" is that you have to call .value to actually extract the value after the dictionary-looking query form['number']. What is returned is always a string, so if you wanted an integer (as we do in this case), you then need to convert it before you're done. So, a script for square.cgi might look like this:

#!/usr/local/bin/python

import cgitb                      # Always remember to do this first.
cgitb.enable()

import cgi
form = cgi.FieldStorage()

x = int(form['number'].value)     # Values are strings, so we need to convert.

print 'Content-Type: text/html'
print
print 'The square of', x, 'is', x*x, '.'
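
It may help to see roughly what arrives at the script. Because the form above has no method attribute, the browser submits it with GET, and the server passes the fields to the script as a query string such as number=3 (in the QUERY_STRING environment variable). The sketch below decodes such a string with the standard library's parse_qs. It is only an illustration of the data involved, not a description of how FieldStorage works internally, and first_values is a hypothetical helper.

```python
try:
    from urllib.parse import parse_qs      # Python 3
except ImportError:
    from urlparse import parse_qs          # Python 2

def first_values(query_string):
    """Map each field name to the first string value submitted for it."""
    parsed = parse_qs(query_string)        # e.g. {'number': ['3']}
    return dict((name, values[0]) for name, values in parsed.items())

fields = first_values('number=3')
x = int(fields['number'])                  # values are strings, so convert
square_line = 'The square of %d is %d.' % (x, x * x)
```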

Miscommunication between the HTML form sender and the CGI script receiver

In the square.cgi example above, there has to be a prior agreement that there will only be one argument sent and it will be called number. What if the sender upgrades to version 1.1 and decides to call its argument integer_number but forgets to tell the receiver? When square.cgi runs it will try to extract the value of number from the dictionary-ish object by calling x = int(form['number'].value), and there will be no key called number. Normally this would result in the simple Python error:

KeyError: 'number'
      args = ('number',)

except now the error will be seen by the user surfing the web, and that won't be very professional. This is known in some circles as fragile code. The receiver cannot gracefully handle ANY changes to the interface without prior notice. There are two ways to solve this problem. The first is to check if a key with the name number exists in the keys of form, and if not, do something reasonable:

if 'number' in form.keys():
    x = int(form['number'].value)
else:
    x = 0   # For now, set it to something reasonable, then email Lucy,
            # author of the HTML form, and tell her she has some 'splaining to do!

The second way is to call form.getvalue() instead of reaching right in and assuming number is a key in form, as we did with form['number']. The getvalue method accepts an optional second argument (just as a dictionary's get does), which is returned if the key is not there. This cleaner solution reduces our four-line solution above to a single, non-fragile, pretty bullet-proof line. It can only fail if the value associated with number cannot be converted to an integer; in production code, we'd want to check for that case too!

x = int(form.getvalue('number','0'))   ### If number is not a key in form, return 0
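
The remaining failure, a value that cannot be converted to an integer, could be handled along these lines. This is a sketch with a hypothetical helper named to_int; what counts as a reasonable default depends on the application.

```python
def to_int(raw, default=0):
    """Convert a submitted field value to int, falling back to default."""
    try:
        return int(raw)
    except (TypeError, ValueError):   # None, a list of values, or text like 'abc'
        return default

from_digits = to_int('3')
from_garbage = to_int('abc')
from_missing = to_int(None, -1)
```

In square.cgi this would be used as x = to_int(form.getvalue('number')), since getvalue returns None when the key is missing and no default is given.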

5. Redirection

Sometimes it's useful to be able to redirect the user to another URL. This is easy; just print a Location header with the URL you want the user to follow. In order for the Location line to be understood as a header, your program must not print anything before this line.

print 'Location: http://www.berkeley.edu/'
print

Of course, your program could calculate this URL in any way you want, instead of just printing a static address.

Remember to include the blank line afterwards. Even if there is no content, you must print the blank line.
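
As a sketch of a computed redirect, the helpers below build a URL from an account name and a page name, following the inst.eecs.berkeley.edu pattern shown in section 1, and then format the Location header with its trailing blank line. Both function names are assumptions made for this example.

```python
def page_url(account, page):
    """Build the URL of a page in an account's public_html directory."""
    return 'http://inst.eecs.berkeley.edu/~' + account + '/' + page

def redirect_response(url):
    """Return the complete output of a redirecting script."""
    return 'Location: ' + url + '\n\n'     # header line, then the blank line

redirect = redirect_response(page_url('cs9h-xx', 'foo.html'))
```

A script would write this text to standard output and print nothing before it.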