\section{Standard Module \module{cgi}}
\label{module-cgi}
\stmodindex{cgi}
\indexii{WWW}{server}
\indexii{CGI}{protocol}
\indexii{HTTP}{protocol}
\indexii{MIME}{headers}
\index{URL}

Support module for CGI (Common Gateway Interface) scripts.%
\index{Common Gateway Interface}

This module defines a number of utilities for use by CGI scripts
written in Python.
\subsection{Introduction}
\nodename{Introduction to the CGI module}

A CGI script is invoked by an HTTP server, usually to process user
input submitted through an HTML \code{<FORM>} or \code{<ISINDEX>} element.

Most often, CGI scripts live in the server's special \file{cgi-bin}
directory. The HTTP server places all sorts of information about the
request (such as the client's hostname, the requested URL, the query
string, and lots of other goodies) in the script's shell environment,
executes the script, and sends the script's output back to the client.

The script's input is connected to the client too, and sometimes the
form data is read this way; at other times the form data is passed via
the ``query string'' part of the URL. This module is intended
to take care of the different cases and provide a simpler interface to
the Python script. It also provides a number of utilities that help
in debugging scripts, and the latest addition is support for file
uploads from a form (if your browser supports it --- Grail 0.3 and
Netscape 2.0 do).

The output of a CGI script should consist of two sections, separated
by a blank line. The first section contains a number of headers,
telling the client what kind of data is following. Python code to
generate a minimal header section looks like this:
\begin{verbatim}
print "Content-type: text/html"     # HTML is following
print                               # blank line, end of headers
\end{verbatim}

The second section is usually HTML, which allows the client software
to display nicely formatted text with headings, in-line images, etc.
Here's Python code that prints a simple piece of HTML:

\begin{verbatim}
print "<TITLE>CGI script output</TITLE>"
print "<H1>This is my first CGI script</H1>"
print "Hello, world!"
\end{verbatim}

(It may not be fully legal HTML according to the letter of the
standard, but any browser will understand it.)
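
As a minimal sketch of the two-section layout described above, the
following function assembles a complete response as one string. It is
written in present-day Python 3 syntax (an assumption; the listings
above use the older \code{print} statement), and the name
\code{build_response} is hypothetical, not part of the \module{cgi}
module.

```python
def build_response(body, content_type="text/html"):
    """Return the header section, a blank line, and the body as one string."""
    headers = "Content-type: " + content_type + "\n"
    # The empty line produced by the extra "\n" ends the header section.
    return headers + "\n" + body


page = build_response(
    "<TITLE>CGI script output</TITLE>\n"
    "<H1>This is my first CGI script</H1>\n"
    "Hello, world!\n"
)
print(page, end="")
```

Building the whole response before printing it makes it harder to
forget the blank line that separates the headers from the HTML.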
\subsection{Using the cgi module}
\nodename{Using the cgi module}

Begin by writing \samp{import cgi}. Do not use \samp{from cgi import
*} --- the module defines all sorts of names for its own use or for
backward compatibility that you don't want in your namespace.

It's best to use the \class{FieldStorage} class. The other classes
defined in this module are provided mostly for backward compatibility.
Instantiate it exactly once, without arguments. This reads the form
contents from standard input or the environment (depending on the
value of various environment variables set according to the CGI
standard). Since it may consume standard input, it should be
instantiated only once.

The \class{FieldStorage} instance can be accessed as if it were a Python
dictionary. For instance, the following code (which assumes that the
\code{content-type} header and blank line have already been printed)
checks that the fields \code{name} and \code{addr} are both set to a
non-empty string:
\begin{verbatim}
form = cgi.FieldStorage()
form_ok = 0
if form.has_key("name") and form.has_key("addr"):
    if form["name"].value != "" and form["addr"].value != "":
        form_ok = 1
if not form_ok:
    print "<H1>Error</H1>"
    print "Please fill in the name and addr fields."
    raise SystemExit    # stop here; a bare return only works inside a function
...further form processing here...
\end{verbatim}
Here the fields, accessed through \samp{form[\var{key}]}, are
themselves instances of \class{FieldStorage} (or
\class{MiniFieldStorage}, depending on the form encoding).

If the submitted form data contains more than one field with the same
name, the object retrieved by \samp{form[\var{key}]} is not a
\class{FieldStorage} or \class{MiniFieldStorage}
instance but a list of such instances. If you expect this possibility
(i.e., when your HTML form contains multiple fields with the same
name), use the \function{type()} function to determine whether you
have a single instance or a list of instances. For example, here's
code that concatenates any number of username fields, separated by
commas:
\begin{verbatim}
username = form["username"]
if type(username) is type([]):
    # Multiple username fields specified
    usernames = ""
    for item in username:
        if usernames:
            # Next item -- insert comma
            usernames = usernames + "," + item.value
        else:
            # First item -- don't insert comma
            usernames = item.value
else:
    # Single username field specified
    usernames = username.value
\end{verbatim}
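
The single-instance-or-list distinction can also be hidden behind one
small helper. The sketch below uses present-day Python 3 syntax (an
assumption), and both \code{Field} and \code{joined_values} are
hypothetical names: \code{Field} merely stands in for a form item and
models nothing but its \code{value} attribute.

```python
class Field:
    """Hypothetical stand-in for a form item; only .value is modelled."""

    def __init__(self, value):
        self.value = value


def joined_values(item, separator=","):
    """Join the values of a single item or of a list of items."""
    # Normalise to a list so both cases share one code path.
    items = item if isinstance(item, list) else [item]
    return separator.join(field.value for field in items)


single = joined_values(Field("alice"))
several = joined_values([Field("alice"), Field("bob")])
```

Normalising to a list first removes the need for separate ``first
item'' and ``next item'' branches.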
If a field represents an uploaded file, the value attribute reads the
entire file in memory as a string. This may not be what you want.
You can test for an uploaded file by testing either the filename
attribute or the file attribute. You can then read the data at
leisure from the file attribute:

\begin{verbatim}
fileitem = form["userfile"]
if fileitem.file:
    # It's an uploaded file; count lines
    linecount = 0
    while 1:
        line = fileitem.file.readline()
        if not line: break
        linecount = linecount + 1
\end{verbatim}
The file upload draft standard entertains the possibility of uploading
multiple files from one field (using a recursive
\mimetype{multipart/*} encoding). When this occurs, the item will be
a dictionary-like \class{FieldStorage} item. This can be determined
by testing its \member{type} attribute, which should be
\mimetype{multipart/form-data} (or perhaps another MIME type matching
\mimetype{multipart/*}). In this case, it can be iterated over
recursively just like the top-level form object.

When a form is submitted in the ``old'' format (as the query string or
as a single data part of type
\mimetype{application/x-www-form-urlencoded}), the items will actually
be instances of the class \class{MiniFieldStorage}. In this case, the
list, file and filename attributes are always \code{None}.
\subsection{Old classes}

These classes, present in earlier versions of the \module{cgi} module,
are still supported for backward compatibility. New applications
should use the \class{FieldStorage} class.

\class{SvFormContentDict} stores single-value form content as a
dictionary; it assumes each field name occurs in the form only once.

\class{FormContentDict} stores multiple-value form content as a
dictionary (the form items are lists of values). Useful if your form
contains multiple fields with the same name.

Other classes (\class{FormContent}, \class{InterpFormContentDict}) are
present for backwards compatibility with really old applications only.
If you still use these and would be inconvenienced if they
disappeared from a future version of this module, drop me a note.
\subsection{Functions}
\nodename{Functions in cgi module}

These are useful if you want more control, or if you want to employ
some of the algorithms implemented in this module in other
circumstances.

\begin{funcdesc}{parse}{fp}
Parse a query in the environment or from a file (default
\code{sys.stdin}).
\end{funcdesc}

\begin{funcdesc}{parse_qs}{qs}
Parse a query string given as a string argument (data of type
\mimetype{application/x-www-form-urlencoded}).
\end{funcdesc}
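
To illustrate the kind of result described for \function{parse_qs()}
(a dictionary mapping each field name to a list of values), here is a
simplified sketch in present-day Python 3 syntax (an assumption). The
function name \code{parse_query} is hypothetical and this is not the
module's implementation; details such as the treatment of blank values
may well differ.

```python
from urllib.parse import unquote_plus


def parse_query(qs):
    """Map each field name to the list of its decoded values."""
    result = {}
    for pair in qs.split("&"):
        if "=" not in pair:
            continue  # skip empty or malformed pieces
        name, value = pair.split("=", 1)
        # unquote_plus turns "+" into a space and decodes %XX escapes.
        result.setdefault(unquote_plus(name), []).append(unquote_plus(value))
    return result


fields = parse_query("name=Joe+Blow&addr=At+Home&name=Jane")
```

Because every value is a list, a repeated field such as \code{name}
needs no special case.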
\begin{funcdesc}{parse_multipart}{fp, pdict}
Parse input of type \mimetype{multipart/form-data} (for
file uploads). Arguments are \var{fp} for the input file and
\var{pdict} for the dictionary containing other parameters of the
\code{content-type} header.

Returns a dictionary just like \function{parse_qs()}: keys are the
field names, each value is a list of values for that field. This is
easy to use but not much good if you are expecting megabytes to be
uploaded --- in that case, use the \class{FieldStorage} class instead,
which is much more flexible. Note that \code{content-type} is the
raw, unparsed contents of the \code{content-type} header.

Note that this does not parse nested multipart parts --- use
\class{FieldStorage} for that.
\end{funcdesc}
\begin{funcdesc}{parse_header}{string}
Parse a header like \code{content-type} into a main
content-type and a dictionary of parameters.
\end{funcdesc}

\begin{funcdesc}{test}{}
Robust test CGI script, usable as main program.
Writes minimal HTTP headers and formats all information provided to
the script in HTML form.
\end{funcdesc}

\begin{funcdesc}{print_environ}{}
Format the shell environment in HTML.
\end{funcdesc}

\begin{funcdesc}{print_form}{form}
Format a form in HTML.
\end{funcdesc}

\begin{funcdesc}{print_directory}{}
Format the current directory in HTML.
\end{funcdesc}

\begin{funcdesc}{print_environ_usage}{}
Print a list of useful (used by CGI) environment variables in
HTML.
\end{funcdesc}

\begin{funcdesc}{escape}{s\optional{, quote}}
Convert the characters
\character{\&}, \character{<} and \character{>} in string \var{s} to
HTML-safe sequences. Use this if you need to display text that might
contain such characters in HTML. If the optional flag \var{quote} is
true, the double quote character (\character{"}) is also translated;
this helps for inclusion in an HTML attribute value, e.g. in \code{<A
HREF="...">}.
\end{funcdesc}
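
The conversion described for \function{escape()} can be sketched as
follows, in present-day Python 3 syntax (an assumption). The name
\code{escape_html} is hypothetical; the sketch mirrors the description
above rather than quoting the module's source.

```python
def escape_html(s, quote=False):
    """Replace &, < and > (and optionally ") with HTML-safe sequences."""
    # The ampersand must be handled first; otherwise the "&" in the
    # entities produced below would itself be escaped again.
    s = s.replace("&", "&amp;")
    s = s.replace("<", "&lt;")
    s = s.replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")
    return s


text = escape_html('<A HREF="x">Tom & Jerry</A>', quote=True)
```

With \var{quote} left false, double quotes pass through unchanged,
which is fine for element content but not for attribute values.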
\subsection{Caring about security}

There's one important rule: if you invoke an external program (e.g.
via the \function{os.system()} or \function{os.popen()} functions),
make very sure you don't pass arbitrary strings received from the
client to the shell. This is a well-known security hole whereby
clever hackers anywhere on the web can exploit a gullible CGI script
to invoke arbitrary shell commands. Even parts of the URL or field
names cannot be trusted, since the request doesn't have to come from
your form!

To be on the safe side, if you must pass a string gotten from a form
to a shell command, you should make sure the string contains only
alphanumeric characters, dashes, underscores, and periods.
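
The character rule above can be checked with a regular expression. This
is a minimal sketch in present-day Python 3 syntax (an assumption);
\code{is_shell_safe} is a hypothetical name, and ``alphanumeric'' is
taken to mean ASCII letters and digits only.

```python
import re

# Letters, digits, period, underscore and dash; nothing else.
SAFE_PATTERN = re.compile(r"[A-Za-z0-9._-]+")


def is_shell_safe(s):
    """True only if s is non-empty and uses just the allowed characters."""
    return SAFE_PATTERN.fullmatch(s) is not None


ok = is_shell_safe("report-2.txt")
bad = is_shell_safe("x; rm -rf /")
```

Rejecting everything outside a small allowed set is simpler to get
right than trying to list every dangerous shell character.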
\subsection{Installing your CGI script on a Unix system}

Read the documentation for your HTTP server and check with your local
system administrator to find the directory where CGI scripts should be
installed; usually this is in a directory \file{cgi-bin} in the server tree.

Make sure that your script is readable and executable by ``others''; the
\UNIX{} file mode should be \code{0755} octal (use \samp{chmod 0755
filename}). Make sure that the first line of the script contains
\code{\#!} starting in column 1 followed by the pathname of the Python
interpreter, for instance:

\begin{verbatim}
#!/usr/local/bin/python
\end{verbatim}

Make sure the Python interpreter exists and is executable by ``others''.

Make sure that any files your script needs to read or write are
readable or writable, respectively, by ``others'' --- their mode
should be \code{0644} for readable and \code{0666} for writable. This
is because, for security reasons, the HTTP server executes your script
as user ``nobody'', without any special privileges. It can only read
(write, execute) files that everybody can read (write, execute). The
current directory at execution time is also different (it is usually
the server's \file{cgi-bin} directory) and the set of environment variables
is also different from what you get at login. In particular, don't
count on the shell's search path for executables (\envvar{PATH}) or
the Python module search path (\envvar{PYTHONPATH}) to be set to
anything interesting.

If you need to load modules from a directory which is not on Python's
default module search path, you can change the path in your script,
before importing other modules, e.g.:
\begin{verbatim}
import sys
sys.path.insert(0, "/usr/home/joe/lib/python")
sys.path.insert(0, "/usr/local/lib/python")
\end{verbatim}

(This way, the directory inserted last will be searched first!)

Instructions for non-\UNIX{} systems will vary; check your HTTP server's
documentation (it will usually have a section on CGI scripts).
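
The remark about search order for inserted directories can be seen on
an ordinary list, since \code{sys.path} is one. In this small sketch
(present-day Python 3 syntax, an assumption) the starting entry
\code{/default/lib} is hypothetical and stands in for whatever the
default path contains.

```python
search_path = ["/default/lib"]  # hypothetical stand-in for the default path

# Each insert at index 0 pushes the earlier entries one place back.
search_path.insert(0, "/usr/home/joe/lib/python")
search_path.insert(0, "/usr/local/lib/python")

first_searched = search_path[0]
```

Here \code{first_searched} is the directory that was inserted last.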
\subsection{Testing your CGI script}

Unfortunately, a CGI script will generally not run when you try it
from the command line, and a script that works perfectly from the
command line may fail mysteriously when run from the server. There's
one reason why you should still test your script from the command
line: if it contains a syntax error, the Python interpreter won't
execute it at all, and the HTTP server will most likely send a cryptic
error to the client.

Assuming your script has no syntax errors, yet it does not work, you
have no choice but to read the next section.
\subsection{Debugging CGI scripts}

First of all, check for trivial installation errors --- reading the
section above on installing your CGI script carefully can save you a
lot of time. If you wonder whether you have understood the
installation procedure correctly, try installing a copy of this module
file (\file{cgi.py}) as a CGI script. When invoked as a script, the file
will dump its environment and the contents of the form in HTML form.
Give it the right mode etc., and send it a request. If it's installed
in the standard \file{cgi-bin} directory, it should be possible to send it a
request by entering a URL into your browser of the form:

\begin{verbatim}
http://yourhostname/cgi-bin/cgi.py?name=Joe+Blow&addr=At+Home
\end{verbatim}

If this gives an error of type 404, the server cannot find the script;
perhaps you need to install it in a different directory. If it
gives another error (e.g. 500), there's an installation problem that
you should fix before trying to go any further. If you get a nicely
formatted listing of the environment and form content (in this
example, the fields should be listed as ``addr'' with value ``At Home''
and ``name'' with value ``Joe Blow''), the \file{cgi.py} script has been
installed correctly. If you follow the same procedure for your own
script, you should now be able to debug it.

The next step could be to call the \module{cgi} module's
\function{test()} function from your script: replace its main code
with the single statement

\begin{verbatim}
cgi.test()
\end{verbatim}

This should produce the same results as those gotten from installing
the \file{cgi.py} file itself.
When an ordinary Python script raises an unhandled exception
(e.g. because of a typo in a module name, a file that can't be opened,
etc.), the Python interpreter prints a nice traceback and exits.
While the Python interpreter will still do this when your CGI script
raises an exception, most likely the traceback will end up in one of
the HTTP server's log files, or be discarded altogether.

Fortunately, once you have managed to get your script to execute
\emph{some} code, it is easy to catch exceptions and cause a traceback
to be printed. The \function{test()} function in this module is
an example. Here are the rules:
\begin{enumerate}
\item Import the traceback module before entering the \keyword{try}
... \keyword{except} statement

\item Assign \code{sys.stderr} to be \code{sys.stdout}

\item Make sure you finish printing the headers and the blank line
early

\item Wrap all remaining code in a \keyword{try} ... \keyword{except}
statement

\item In the except clause, call \function{traceback.print_exc()}
\end{enumerate}

For example:

\begin{verbatim}
import sys
import traceback
print "Content-type: text/html"
print
sys.stderr = sys.stdout
try:
    ...your code here...
except:
    print "\n\n<PRE>"
    traceback.print_exc()
\end{verbatim}
Notes: The assignment to \code{sys.stderr} is needed because the
traceback prints to \code{sys.stderr}.
The \code{print "{\e}n{\e}n<PRE>"} statement is necessary to
disable the word wrapping in HTML.

If you suspect that there may be a problem in importing the traceback
module, you can use an even more robust approach (which only uses
built-in modules):

\begin{verbatim}
import sys
sys.stderr = sys.stdout
print "Content-type: text/plain"
print
...your code here...
\end{verbatim}

This relies on the Python interpreter to print the traceback. The
content type of the output is set to plain text, which disables all
HTML processing. If your script works, the raw HTML will be displayed
by your client. If it raises an exception, most likely after the
first two lines have been printed, a traceback will be displayed.
Because no HTML interpretation is going on, the traceback will be
readable.
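
The same idea (headers first, then a guarded body whose traceback goes
to the client) can be packaged as a function. This is only a sketch in
present-day Python 3 syntax (an assumption): it builds the response as
a string with \function{traceback.format_exc()} instead of redirecting
\code{sys.stderr}, and the names \code{run_with_traceback} and
\code{broken} are hypothetical.

```python
import traceback


def run_with_traceback(main):
    """Return the full response text; on error, include the traceback."""
    parts = ["Content-type: text/html\n", "\n"]  # headers, then blank line
    try:
        parts.append(main())
    except Exception:
        # <PRE> keeps the browser from re-wrapping the traceback lines.
        parts.append("\n\n<PRE>\n" + traceback.format_exc())
    return "".join(parts)


def broken():
    """Hypothetical script body that fails."""
    raise ValueError("demo failure")


page = run_with_traceback(broken)
```

Because the headers are placed in the output before \code{main} runs,
the client receives a well-formed response even when the body fails.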
\subsection{Common problems and solutions}

\begin{itemize}
\item Most HTTP servers buffer the output from CGI scripts until the
script is completed. This means that it is not possible to display a
progress report on the client's display while the script is running.

\item Check the installation instructions above.

\item Check the HTTP server's log files. (\samp{tail -f logfile} in a
separate window may be useful!)

\item Always check a script for syntax errors first, by doing something
like \samp{python script.py}.

\item When using any of the debugging techniques, don't forget to add
\samp{import sys} to the top of the script.

\item When invoking external programs, make sure they can be found.
Usually, this means using absolute path names --- \envvar{PATH} is
usually not set to a very useful value in a CGI script.

\item When reading or writing external files, make sure they can be read
or written by every user on the system.

\item Don't try to give a CGI script a set-uid mode. This doesn't work on
most systems, and is a security liability as well.
\end{itemize}