\section{\module{cgi} ---
         Common Gateway Interface support.}
\declaremodule{standard}{cgi}

\modulesynopsis{Common Gateway Interface support, used to interpret
forms in server-side scripts.}

\indexii{WWW}{server}
\indexii{CGI}{protocol}
\indexii{HTTP}{protocol}
\indexii{MIME}{headers}
\index{URL}


Support module for Common Gateway Interface (CGI) scripts.%
\index{Common Gateway Interface}

This module defines a number of utilities for use by CGI scripts
written in Python.
\subsection{Introduction}
\nodename{cgi-intro}

A CGI script is invoked by an HTTP server, usually to process user
input submitted through an HTML \code{<FORM>} or \code{<ISINDEX>} element.

Most often, CGI scripts live in the server's special \file{cgi-bin}
directory. The HTTP server places all sorts of information about the
request (such as the client's hostname, the requested URL, the query
string, and lots of other goodies) in the script's shell environment,
executes the script, and sends the script's output back to the client.

The script's input is connected to the client too, and sometimes the
form data is read this way; at other times the form data is passed via
the ``query string'' part of the URL. This module is intended
to take care of the different cases and provide a simpler interface to
the Python script. It also provides a number of utilities that help
in debugging scripts, and the latest addition is support for file
uploads from a form (if your browser supports it --- Grail 0.3 and
Netscape 2.0 do).

The output of a CGI script should consist of two sections, separated
by a blank line. The first section contains a number of headers,
telling the client what kind of data is following. Python code to
generate a minimal header section looks like this:

\begin{verbatim}
print "Content-Type: text/html"     # HTML is following
print                               # blank line, end of headers
\end{verbatim}

The second section is usually HTML, which allows the client software
to display nicely formatted text with headers, in-line images, etc.
Here's Python code that prints a simple piece of HTML:

\begin{verbatim}
print "<TITLE>CGI script output</TITLE>"
print "<H1>This is my first CGI script</H1>"
print "Hello, world!"
\end{verbatim}
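
The two-part layout described above (headers, a blank line, then the
body) can be sketched as a small helper. This is only an illustration:
it uses Python 3 syntax rather than the print statement of the
surrounding examples, and \code{build_response} is a hypothetical name,
not part of the \module{cgi} module.

```python
def build_response(body_html, content_type="text/html"):
    """Return a complete CGI response: headers, blank line, body.

    Hypothetical helper for illustration; not part of the cgi module.
    """
    headers = ["Content-Type: " + content_type]
    # The empty line after the last header ends the header section.
    return "\n".join(headers) + "\n\n" + body_html


page = build_response("<TITLE>CGI script output</TITLE>\n"
                      "<H1>This is my first CGI script</H1>\n"
                      "Hello, world!\n")
print(page, end="")
```

Building the whole response as one string makes it easy to check that
the blank separator line is present before anything is sent.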
\subsection{Using the cgi module}
\nodename{Using the cgi module}

Begin by writing \samp{import cgi}. Do not use \samp{from cgi import
*} --- the module defines all sorts of names for its own use or for
backward compatibility that you don't want in your namespace.

When you write a new script, consider adding the line:

\begin{verbatim}
import cgitb; cgitb.enable()
\end{verbatim}

This activates a special exception handler that will display detailed
reports in the Web browser if any errors occur. If you'd rather not
show the guts of your program to users of your script, you can have
the reports saved to files instead, with a line like this:

\begin{verbatim}
import cgitb; cgitb.enable(display=0, logdir="/tmp")
\end{verbatim}

It's very helpful to use this feature during script development.
The reports produced by \refmodule{cgitb} provide information that
can save you a lot of time in tracking down bugs. You can always
remove the \code{cgitb} line later when you have tested your script
and are confident that it works correctly.

To get at submitted form data,
it's best to use the \class{FieldStorage} class. The other classes
defined in this module are provided mostly for backward compatibility.
Instantiate \class{FieldStorage} exactly once, without arguments. This
reads the form contents from standard input or the environment
(depending on the value of various environment variables set according
to the CGI standard). Since it may consume standard input, it should be
instantiated only once.

The \class{FieldStorage} instance can be indexed like a Python
dictionary, and also supports the standard dictionary methods
\method{has_key()} and \method{keys()}. The built-in \function{len()}
is also supported. Form fields containing empty strings are ignored
and do not appear in the dictionary; to keep such values, provide
a true value for the optional \var{keep_blank_values} keyword
parameter when creating the \class{FieldStorage} instance.

For instance, the following code (which assumes that the
\mailheader{Content-Type} header and blank line have already been
printed) checks that the fields \code{name} and \code{addr} are both
set to a non-empty string:
\begin{verbatim}
form = cgi.FieldStorage()
if not (form.has_key("name") and form.has_key("addr")):
    print "<H1>Error</H1>"
    print "Please fill in the name and addr fields."
    return      # assumes this fragment is the body of a function
print "<p>name:", form["name"].value
print "<p>addr:", form["addr"].value
...further form processing here...
\end{verbatim}

Here the fields, accessed through \samp{form[\var{key}]}, are
themselves instances of \class{FieldStorage} (or
\class{MiniFieldStorage}, depending on the form encoding).
The \member{value} attribute of the instance yields the string value
of the field. The \method{getvalue()} method returns this string value
directly; it also accepts an optional second argument as a default to
return if the requested key is not present.

If the submitted form data contains more than one field with the same
name, the object retrieved by \samp{form[\var{key}]} is not a
\class{FieldStorage} or \class{MiniFieldStorage}
instance but a list of such instances. Similarly, in this situation,
\samp{form.getvalue(\var{key})} would return a list of strings.
If you expect this possibility
(when your HTML form contains multiple fields with the same name), use
the \function{isinstance()} built-in function to determine whether you
have a single instance or a list of instances. For example, this
code concatenates any number of username fields, separated by
commas:

\begin{verbatim}
value = form.getvalue("username", "")
if isinstance(value, list):
    # Multiple username fields specified
    usernames = ",".join(value)
else:
    # Single or no username field specified
    usernames = value
\end{verbatim}
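
The single-value-or-list check can also be factored into a helper so
that the rest of a script always sees a list. The sketch below is one
possible way to do this in Python 3 syntax; \code{as_list} and
\code{join_usernames} are hypothetical names introduced for
illustration and operate on plain values of the kind
\method{getvalue()} is described as returning.

```python
def as_list(value):
    """Normalize a getvalue()-style result to a list.

    Hypothetical helper: None becomes an empty list, a single value
    becomes a one-element list, and a list is returned unchanged.
    """
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


def join_usernames(value):
    """Join any number of username values with commas."""
    return ",".join(as_list(value))
```

With this helper, the caller no longer needs a separate branch for the
case where a field name was submitted more than once.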

If a field represents an uploaded file, accessing the value via the
\member{value} attribute or the \function{getvalue()} method reads the
entire file in memory as a string. This may not be what you want.
You can test for an uploaded file by testing either the \member{filename}
attribute or the \member{file} attribute. You can then read the data at
leisure from the \member{file} attribute:

\begin{verbatim}
fileitem = form["userfile"]
if fileitem.file:
    # It's an uploaded file; count lines
    linecount = 0
    while 1:
        line = fileitem.file.readline()
        if not line: break
        linecount = linecount + 1
\end{verbatim}
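
The same line-at-a-time reading can be tried without a server. In the
following sketch (Python 3 syntax), an \code{io.BytesIO} object stands
in for the \member{file} attribute of an uploaded field; that
substitution and the name \code{count_lines} are assumptions made for
illustration.

```python
import io


def count_lines(fileobj):
    """Count lines by reading one line at a time, not the whole file."""
    linecount = 0
    while True:
        line = fileobj.readline()
        if not line:        # empty result means end of file
            break
        linecount += 1
    return linecount


# io.BytesIO stands in for the file attribute of an uploaded field.
upload = io.BytesIO(b"first\nsecond\nthird\n")
linecount = count_lines(upload)
```

Only one line is held in memory at a time, which is the point of
reading from the file object rather than from the \member{value}
attribute.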

The file upload draft standard entertains the possibility of uploading
multiple files from one field (using a recursive
\mimetype{multipart/*} encoding). When this occurs, the item will be
a dictionary-like \class{FieldStorage} item. This can be determined
by testing its \member{type} attribute, which should be
\mimetype{multipart/form-data} (or perhaps another MIME type matching
\mimetype{multipart/*}). In this case, it can be iterated over
recursively just like the top-level form object.

When a form is submitted in the ``old'' format (as the query string or
as a single data part of type
\mimetype{application/x-www-form-urlencoded}), the items will actually
be instances of the class \class{MiniFieldStorage}. In this case, the
\member{list}, \member{file}, and \member{filename} attributes are
always \code{None}.


\subsection{Higher Level Interface}

\versionadded{2.2}  % XXX: Is this true ?

The previous section explains how to read CGI form data using the
\class{FieldStorage} class. This section describes a higher level
interface which was added to this class to allow one to do it in a
more readable and intuitive way. The interface doesn't make the
techniques described in previous sections obsolete --- they are still
useful to process file uploads efficiently, for example.

The interface consists of two simple methods. Using the methods
you can process form data in a generic way, without the need to worry
whether one value or several were posted under one name.

In the previous section, you learned to write the following code anytime
you expected a user to post more than one value under one name:

\begin{verbatim}
item = form.getvalue("item")
if isinstance(item, list):
    # The user is requesting more than one item.
    pass
else:
    # The user is requesting only one item.
    pass
\end{verbatim}

This situation is common, for example, when a form contains a group of
multiple checkboxes with the same name:

\begin{verbatim}
<input type="checkbox" name="item" value="1" />
<input type="checkbox" name="item" value="2" />
\end{verbatim}

In most situations, however, there's only one form control with a
particular name in a form and then you expect and need only one value
associated with this name. So you write a script containing, for
example, this code:
\begin{verbatim}
user = form.getvalue("user").upper()
\end{verbatim}

The problem with the code is that you should never expect that a
client will provide valid input to your scripts. For example, if a
curious user appends another \samp{user=foo} pair to the query string,
then the script would crash, because in this situation the
\code{getvalue("user")} method call returns a list instead of a
string. Calling the \method{upper()} method on a list is not valid
(since lists do not have a method of this name) and results in an
\exception{AttributeError} exception.

Therefore, the appropriate way to read form data values was to always
use the code which checks whether the obtained value is a single value
or a list of values. That's annoying and leads to less readable
scripts.

A more convenient approach is to use the methods \method{getfirst()}
and \method{getlist()} provided by this higher level interface.

\begin{methoddesc}[FieldStorage]{getfirst}{name\optional{, default}}
  This method always returns only one value associated with form field
  \var{name}. The method returns only the first value if more than
  one value was posted under that name. Please note that the order
  in which the values are received may vary from browser to browser
  and should not be counted on.\footnote{Note that some recent
      versions of the HTML specification do state what order the
      field values should be supplied in, but knowing whether a
      request was received from a conforming browser, or even from a
      browser at all, is tedious and error-prone.} If no such form
  field or value exists then the method returns the value specified by
  the optional parameter \var{default}. This parameter defaults to
  \code{None} if not specified.
\end{methoddesc}

\begin{methoddesc}[FieldStorage]{getlist}{name}
  This method always returns a list of values associated with form
  field \var{name}. The method returns an empty list if no such form
  field or value exists for \var{name}. It returns a list consisting
  of one item if only one such value exists.
\end{methoddesc}

Using these methods you can write nice compact code:

\begin{verbatim}
import cgi
form = cgi.FieldStorage()
user = form.getfirst("user", "").upper()    # This way it's safe.
for item in form.getlist("item"):
    do_something(item)
\end{verbatim}
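
The behaviour described for \method{getfirst()} and \method{getlist()}
can be imitated on a plain dictionary of value lists, which makes it
easy to experiment outside a server. The sketch below is an
approximation, not the module's implementation: it uses Python 3
syntax, takes its parsed fields from \code{urllib.parse.parse_qs} as a
stand-in for \class{FieldStorage}, and defines \code{getfirst} and
\code{getlist} as ordinary functions.

```python
from urllib.parse import parse_qs


def getlist(fields, name):
    """Always return a list; empty if the field is absent."""
    return list(fields.get(name, []))


def getfirst(fields, name, default=None):
    """Return the first value posted under name, or default."""
    values = fields.get(name)
    return values[0] if values else default


# A query string with a duplicated "user" field, as a curious user
# might send it.  parse_qs stands in for FieldStorage here.
fields = parse_qs("user=joe&user=foo&item=1&item=2")
user = getfirst(fields, "user", "").upper()
items = getlist(fields, "item")
```

Because \code{getfirst} always yields a single value, the call to
\method{upper()} is safe even when the field was submitted twice.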


\subsection{Old classes}

These classes, present in earlier versions of the \module{cgi} module,
are still supported for backward compatibility. New applications
should use the \class{FieldStorage} class.

\class{SvFormContentDict} stores single value form content as a
dictionary; it assumes each field name occurs in the form only once.

\class{FormContentDict} stores multiple value form content as a
dictionary (the form items are lists of values). Useful if your form
contains multiple fields with the same name.

Other classes (\class{FormContent}, \class{InterpFormContentDict}) are
present for backwards compatibility with really old applications only.
If you still use these and would be inconvenienced if they
disappeared from a future version of this module, drop me a note.


\subsection{Functions}
\nodename{Functions in cgi module}

These are useful if you want more control, or if you want to employ
some of the algorithms implemented in this module in other
circumstances.

\begin{funcdesc}{parse}{fp\optional{, keep_blank_values\optional{,
                        strict_parsing}}}
  Parse a query in the environment or from a file (the file defaults
  to \code{sys.stdin}). The \var{keep_blank_values} and
  \var{strict_parsing} parameters are passed to \function{parse_qs()}
  unchanged.
\end{funcdesc}

\begin{funcdesc}{parse_qs}{qs\optional{, keep_blank_values\optional{,
                           strict_parsing}}}
Parse a query string given as a string argument (data of type
\mimetype{application/x-www-form-urlencoded}). Data are
returned as a dictionary. The dictionary keys are the unique query
variable names and the values are lists of values for each name.

The optional argument \var{keep_blank_values} is
a flag indicating whether blank values in
URL encoded queries should be treated as blank strings.
A true value indicates that blanks should be retained as
blank strings. The default false value indicates that
blank values are to be ignored and treated as if they were
not included.

The optional argument \var{strict_parsing} is a flag indicating what
to do with parsing errors. If false (the default), errors
are silently ignored. If true, errors raise a
\exception{ValueError} exception.
\end{funcdesc}

\begin{funcdesc}{parse_qsl}{qs\optional{, keep_blank_values\optional{,
                            strict_parsing}}}
Parse a query string given as a string argument (data of type
\mimetype{application/x-www-form-urlencoded}). Data are
returned as a list of name, value pairs.

The optional argument \var{keep_blank_values} is
a flag indicating whether blank values in
URL encoded queries should be treated as blank strings.
A true value indicates that blanks should be retained as
blank strings. The default false value indicates that
blank values are to be ignored and treated as if they were
not included.

The optional argument \var{strict_parsing} is a flag indicating what
to do with parsing errors. If false (the default), errors
are silently ignored. If true, errors raise a
\exception{ValueError} exception.
\end{funcdesc}
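
The effect of \var{keep_blank_values} and \var{strict_parsing} is
easiest to see on a concrete query string. The sketch below uses
Python 3 syntax and the functions of the same names in
\code{urllib.parse}, which accept the same two flags; using that module
instead of \module{cgi} is an assumption of this example, and
\code{parses_strictly} is a hypothetical helper.

```python
from urllib.parse import parse_qs, parse_qsl

query = "name=Joe+Blow&addr=At+Home&addr=&item=1&item=2"

# Dictionary form: each name maps to a list of values.
as_dict = parse_qs(query)

# List form: (name, value) pairs in the order they appear.
as_pairs = parse_qsl(query)

# Blank values are dropped by default; keep them on request.
with_blanks = parse_qs(query, keep_blank_values=True)


def parses_strictly(qs):
    """Return True if qs parses without error under strict_parsing."""
    try:
        parse_qs(qs, strict_parsing=True)
    except ValueError:
        return False
    return True
```

Here the empty second \code{addr} value appears only in
\code{with_blanks}, and a field without an equals sign such as
\code{junk} is rejected only when strict parsing is requested.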

\begin{funcdesc}{parse_multipart}{fp, pdict}
Parse input of type \mimetype{multipart/form-data} (for
file uploads). Arguments are \var{fp} for the input file and
\var{pdict} for a dictionary containing other parameters in
the \mailheader{Content-Type} header.

Returns a dictionary just like \function{parse_qs()}: keys are the
field names, and each value is a list of values for that field. This is
easy to use but not much good if you are expecting megabytes to be
uploaded --- in that case, use the \class{FieldStorage} class instead,
which is much more flexible.

Note that this does not parse nested multipart parts --- use
\class{FieldStorage} for that.
\end{funcdesc}

\begin{funcdesc}{parse_header}{string}
Parse a MIME header (such as \mailheader{Content-Type}) into a main
value and a dictionary of parameters.
\end{funcdesc}
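
To illustrate what ``a main value and a dictionary of parameters''
means, the following sketch splits a \mailheader{Content-Type} value
using the standard \code{email.message} module in Python 3. It is a
stand-in for experimentation, not the implementation of
\function{parse_header()}; \code{parse_content_type} is a hypothetical
name.

```python
from email.message import EmailMessage


def parse_content_type(value):
    """Split a Content-Type value into (main value, parameter dict).

    Illustrative stand-in built on email.message; it is not the cgi
    module's parse_header() function.
    """
    msg = EmailMessage()
    msg["Content-Type"] = value
    return msg.get_content_type(), dict(msg["Content-Type"].params)


main, params = parse_content_type('text/html; charset="utf-8"')
```

The parameter dictionary is where values such as the character set, or
the boundary string of a multipart body, end up.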

\begin{funcdesc}{test}{}
  Robust test CGI script, usable as main program.
  Writes minimal HTTP headers and formats all information provided to
  the script in HTML form.
\end{funcdesc}

\begin{funcdesc}{print_environ}{}
  Format the shell environment in HTML.
\end{funcdesc}

\begin{funcdesc}{print_form}{form}
  Format a form in HTML.
\end{funcdesc}

\begin{funcdesc}{print_directory}{}
  Format the current directory in HTML.
\end{funcdesc}

\begin{funcdesc}{print_environ_usage}{}
  Print a list of useful (used by CGI) environment variables in
  HTML.
\end{funcdesc}

\begin{funcdesc}{escape}{s\optional{, quote}}
  Convert the characters
  \character{\&}, \character{<} and \character{>} in string \var{s} to
  HTML-safe sequences. Use this if you need to display text that might
  contain such characters in HTML. If the optional flag \var{quote} is
  true, the double-quote character (\character{"}) is also translated;
  this helps for inclusion in an HTML attribute value, as in \code{<A
  HREF="...">}. If the value to be quoted might include single- or
  double-quote characters, or both, consider using the
  \function{quoteattr()} function in the \refmodule{xml.sax.saxutils}
  module instead.
\end{funcdesc}
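
The difference the \var{quote} flag makes can be seen with
\code{html.escape} from the Python 3 standard library, used here as a
stand-in for the function described above. Note the assumptions: in
\code{html.escape} the flag defaults to true rather than false, and
when it is true the single-quote character is translated as well.

```python
from html import escape

text = 'Tom & Jerry <tom@example.com> said "hi"'

# Safe for element content: only &, < and > are translated.
body_safe = escape(text, quote=False)

# Safe inside a double-quoted attribute value: " is translated too.
attr_safe = escape(text, quote=True)
```

Use the second form whenever the text ends up between the quotes of an
attribute such as \code{HREF}.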


\subsection{Caring about security \label{cgi-security}}

\indexii{CGI}{security}

There's one important rule: if you invoke an external program (via the
\function{os.system()} or \function{os.popen()} functions, or others
with similar functionality), make very sure you don't pass arbitrary
strings received from the client to the shell. This is a well-known
security hole whereby clever hackers anywhere on the Web can exploit a
gullible CGI script to invoke arbitrary shell commands. Even parts of
the URL or field names cannot be trusted, since the request doesn't
have to come from your form!

To be on the safe side, if you must pass a string obtained from a form
to a shell command, you should make sure the string contains only
alphanumeric characters, dashes, underscores, and periods.
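
The character whitelist described above can be expressed as a regular
expression. The sketch below (Python 3 syntax) is a minimal
illustration; \code{is_shell_safe} is a hypothetical name, and the
pattern assumes that ``alphanumeric'' means ASCII letters and digits.

```python
import re

# ASCII letters, digits, periods, underscores and dashes only.
_SAFE_PATTERN = re.compile(r"[A-Za-z0-9._-]+")


def is_shell_safe(value):
    """Return True if value consists solely of whitelisted characters."""
    # fullmatch requires the whole string to match, so a trailing
    # newline or any other stray character causes a rejection.
    return _SAFE_PATTERN.fullmatch(value) is not None
```

A check like this rejects shell metacharacters, but a value that begins
with a dash could still be read as an option by the program being
invoked, so it is not a complete defence on its own.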


\subsection{Installing your CGI script on a \UNIX\ system}

Read the documentation for your HTTP server and check with your local
system administrator to find the directory where CGI scripts should be
installed; usually this is in a directory \file{cgi-bin} in the server tree.

Make sure that your script is readable and executable by ``others''; the
\UNIX{} file mode should be \code{0755} octal (use \samp{chmod 0755
\var{filename}}). Make sure that the first line of the script contains
\code{\#!} starting in column 1 followed by the pathname of the Python
interpreter, for instance:

\begin{verbatim}
#!/usr/local/bin/python
\end{verbatim}

Make sure the Python interpreter exists and is executable by ``others''.

Make sure that any files your script needs to read or write are
readable or writable, respectively, by ``others'' --- their mode
should be \code{0644} for readable and \code{0666} for writable. This
is because, for security reasons, the HTTP server executes your script
as user ``nobody'', without any special privileges. It can only read
(write, execute) files that everybody can read (write, execute). The
current directory at execution time is also different (it is usually
the server's cgi-bin directory) and the set of environment variables
is also different from what you get when you log in. In particular, don't
count on the shell's search path for executables (\envvar{PATH}) or
the Python module search path (\envvar{PYTHONPATH}) to be set to
anything interesting.

If you need to load modules from a directory which is not on Python's
default module search path, you can change the path in your script,
before importing other modules. For example:

\begin{verbatim}
import sys
sys.path.insert(0, "/usr/home/joe/lib/python")
sys.path.insert(0, "/usr/local/lib/python")
\end{verbatim}

(This way, the directory inserted last will be searched first!)
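
The resulting search order can be checked directly. The sketch below
(Python 3 syntax) performs the same two insertions on a copy of
\code{sys.path}, so that the running interpreter is not affected; the
variable names are illustrative only.

```python
import sys

# Work on a copy; a real script would modify sys.path itself.
search_path = list(sys.path)
search_path.insert(0, "/usr/home/joe/lib/python")
search_path.insert(0, "/usr/local/lib/python")

# Each insert at index 0 pushes earlier entries back by one place.
first_two = search_path[:2]
```

The directory inserted second ends up at index 0, ahead of the one
inserted first.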

Instructions for non-\UNIX{} systems will vary; check your HTTP server's
documentation (it will usually have a section on CGI scripts).


\subsection{Testing your CGI script}

Unfortunately, a CGI script will generally not run when you try it
from the command line, and a script that works perfectly from the
command line may fail mysteriously when run from the server. There's
one reason why you should still test your script from the command
line: if it contains a syntax error, the Python interpreter won't
execute it at all, and the HTTP server will most likely send a cryptic
error to the client.

Assuming your script has no syntax errors, yet it does not work, you
have no choice but to read the next section.


\subsection{Debugging CGI scripts} \indexii{CGI}{debugging}

First of all, check for trivial installation errors --- reading the
section above on installing your CGI script carefully can save you a
lot of time. If you wonder whether you have understood the
installation procedure correctly, try installing a copy of this module
file (\file{cgi.py}) as a CGI script. When invoked as a script, the file
will dump its environment and the contents of the form in HTML form.
Give it the right mode etc, and send it a request. If it's installed
in the standard \file{cgi-bin} directory, it should be possible to send it a
request by entering a URL into your browser of the form:

\begin{verbatim}
http://yourhostname/cgi-bin/cgi.py?name=Joe+Blow&addr=At+Home
\end{verbatim}

If this gives an error of type 404, the server cannot find the script;
perhaps you need to install it in a different directory. If it
gives another error, there's an installation problem that
you should fix before trying to go any further. If you get a nicely
formatted listing of the environment and form content (in this
example, the fields should be listed as ``addr'' with value ``At Home''
and ``name'' with value ``Joe Blow''), the \file{cgi.py} script has been
installed correctly. If you follow the same procedure for your own
script, you should now be able to debug it.

The next step could be to call the \module{cgi} module's
\function{test()} function from your script: replace its main code
with the single statement

\begin{verbatim}
cgi.test()
\end{verbatim}

This should produce the same results as those obtained from installing
the \file{cgi.py} file itself.

When an ordinary Python script raises an unhandled exception (for
whatever reason: a typo in a module name, a file that can't be
opened, etc.), the Python interpreter prints a nice traceback and
exits. While the Python interpreter will still do this when your CGI
script raises an exception, most likely the traceback will end up in
one of the HTTP server's log files, or be discarded altogether.

Fortunately, once you have managed to get your script to execute
\emph{some} code, you can easily send tracebacks to the Web browser
using the \refmodule{cgitb} module. If you haven't done so already,
just add the line:

\begin{verbatim}
import cgitb; cgitb.enable()
\end{verbatim}

to the top of your script. Then try running it again; when a
problem occurs, you should see a detailed report that will
likely make apparent the cause of the crash.

If you suspect that there may be a problem in importing the
\refmodule{cgitb} module, you can use an even more robust approach
(which only uses built-in modules):

\begin{verbatim}
import sys
sys.stderr = sys.stdout
print "Content-Type: text/plain"
print
...your code here...
\end{verbatim}

This relies on the Python interpreter to print the traceback. The
content type of the output is set to plain text, which disables all
HTML processing. If your script works, the raw HTML will be displayed
by your client. If it raises an exception, most likely after the
first two lines have been printed, a traceback will be displayed.
Because no HTML interpretation is going on, the traceback will be
readable.
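
A variation on the same idea is to catch the exception yourself and
write the traceback to the output stream. Note that this is a different
technique from the one above, which leaves the printing to the
interpreter. The sketch uses Python 3 syntax;
\code{run_with_plain_traceback} and \code{broken_handler} are
hypothetical names, and an \code{io.StringIO} object stands in for
\code{sys.stdout} so that the result can be inspected.

```python
import io
import traceback


def run_with_plain_traceback(handler, out):
    """Write a text/plain header, then handler output or a traceback."""
    out.write("Content-Type: text/plain\n\n")
    try:
        handler(out)
    except Exception:
        # Send the traceback to the client instead of the server log.
        out.write(traceback.format_exc())


def broken_handler(out):
    """Hypothetical script body that fails part-way through."""
    out.write("starting\n")
    raise RuntimeError("simulated failure")


buffer = io.StringIO()          # stands in for sys.stdout
run_with_plain_traceback(broken_handler, buffer)
report = buffer.getvalue()
```

As with the redirection of \code{sys.stderr}, the plain text content
type keeps the traceback readable in the browser.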


\subsection{Common problems and solutions}

\begin{itemize}
\item Most HTTP servers buffer the output from CGI scripts until the
script is completed. This means that it is not possible to display a
progress report on the client's display while the script is running.

\item Check the installation instructions above.

\item Check the HTTP server's log files. (\samp{tail -f logfile} in a
separate window may be useful!)

\item Always check a script for syntax errors first, by doing something
like \samp{python script.py}.

\item If your script does not have any syntax errors, try adding
\samp{import cgitb; cgitb.enable()} to the top of the script.

\item When invoking external programs, make sure they can be found.
Usually, this means using absolute path names --- \envvar{PATH} is
usually not set to a very useful value in a CGI script.

\item When reading or writing external files, make sure they can be read
or written by every user on the system.

\item Don't try to give a CGI script a set-uid mode. This doesn't work on
most systems, and is a security liability as well.
\end{itemize}