% Doc/lib/libcgi.tex

\section{\module{cgi} ---
         Common Gateway Interface support.}
\declaremodule{standard}{cgi}

\modulesynopsis{Common Gateway Interface support, used to interpret
forms in server-side scripts.}

\indexii{WWW}{server}
\indexii{CGI}{protocol}
\indexii{HTTP}{protocol}
\indexii{MIME}{headers}
\index{URL}


Support module for CGI (Common Gateway Interface) scripts.%
\index{Common Gateway Interface}

This module defines a number of utilities for use by CGI scripts
written in Python.

\subsection{Introduction}
\nodename{cgi-intro}

A CGI script is invoked by an HTTP server, usually to process user
input submitted through an HTML \code{<FORM>} or \code{<ISINDEX>} element.

Most often, CGI scripts live in the server's special \file{cgi-bin}
directory.  The HTTP server places all sorts of information about the
request (such as the client's hostname, the requested URL, the query
string, and lots of other goodies) in the script's shell environment,
executes the script, and sends the script's output back to the client.

The script's input is connected to the client too, and sometimes the
form data is read this way; at other times the form data is passed via
the ``query string'' part of the URL.  This module is intended
to take care of the different cases and provide a simpler interface to
the Python script.  It also provides a number of utilities that help
in debugging scripts, and the latest addition is support for file
uploads from a form (if your browser supports it --- Grail 0.3 and
Netscape 2.0 do).

The output of a CGI script should consist of two sections, separated
by a blank line.  The first section contains a number of headers,
telling the client what kind of data is following.  Python code to
generate a minimal header section looks like this:

\begin{verbatim}
print "Content-type: text/html"     # HTML is following
print                               # blank line, end of headers
\end{verbatim}

The second section is usually HTML, which allows the client software
to display nicely formatted text with header, in-line images, etc.
Here's Python code that prints a simple piece of HTML:

\begin{verbatim}
print "<TITLE>CGI script output</TITLE>"
print "<H1>This is my first CGI script</H1>"
print "Hello, world!"
\end{verbatim}

(It may not be fully legal HTML according to the letter of the
standard, but any browser will understand it.)
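
The two-section layout can be sketched as a small helper that assembles
a complete response as one string.  The function name
\code{make_response} and its \var{content_type} default are
illustrative assumptions, not part of the \module{cgi} module; a real
script can simply print the pieces as shown above.

\begin{verbatim}
def make_response(body, content_type="text/html"):
    # Hypothetical helper, not part of the cgi module.
    # First section: the headers, one per line.
    headers = "Content-type: %s\n" % content_type
    # The blank line ends the header section; the body follows it.
    return headers + "\n" + body

page = make_response("<TITLE>CGI script output</TITLE>\n"
                     "<H1>This is my first CGI script</H1>\n"
                     "Hello, world!\n")
\end{verbatim}

Writing \code{page} with \code{sys.stdout.write(page)} produces the
same output as the \keyword{print} statements above.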

\subsection{Using the cgi module}
\nodename{Using the cgi module}

Begin by writing \samp{import cgi}.  Do not use \samp{from cgi import
*} --- the module defines all sorts of names for its own use or for
backward compatibility that you don't want in your namespace.

It's best to use the \class{FieldStorage} class.  The other classes
defined in this module are provided mostly for backward compatibility.
Instantiate \class{FieldStorage} exactly once, without arguments.  This
reads the form contents from standard input or the environment
(depending on the value of various environment variables set according
to the CGI standard).  Because it may consume standard input, a second
instance might find no form data left to read.

The \class{FieldStorage} instance can be accessed as if it were a Python
dictionary.  For instance, the following code (which assumes that the
\code{content-type} header and blank line have already been printed)
checks that the fields \code{name} and \code{addr} are both set to a
non-empty string:

\begin{verbatim}
import cgi

form = cgi.FieldStorage()
form_ok = 0
if form.has_key("name") and form.has_key("addr"):
    if form["name"].value != "" and form["addr"].value != "":
        form_ok = 1
if not form_ok:
    print "<H1>Error</H1>"
    print "Please fill in the name and addr fields."
else:
    # ...further form processing here...
    pass
\end{verbatim}

Here the fields, accessed through \samp{form[\var{key}]}, are
themselves instances of \class{FieldStorage} (or
\class{MiniFieldStorage}, depending on the form encoding).

If the submitted form data contains more than one field with the same
name, the object retrieved by \samp{form[\var{key}]} is not a
\class{FieldStorage} or \class{MiniFieldStorage}
instance but a list of such instances.  If you expect this possibility
(i.e., when your HTML form contains multiple fields with the same
name), use the \function{type()} function to determine whether you
have a single instance or a list of instances.  For example, here's
code that concatenates any number of username fields, separated by
commas:

\begin{verbatim}
username = form["username"]
if type(username) is type([]):
    # Multiple username fields specified
    usernames = ""
    for item in username:
        if usernames:
            # Next item -- insert comma
            usernames = usernames + "," + item.value
        else:
            # First item -- don't insert comma
            usernames = item.value
else:
    # Single username field specified
    usernames = username.value
\end{verbatim}
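
The type check can be factored into a small helper that always returns
a list of values, whether the field appeared once, several times, or
not at all.  The name \code{field_values} is hypothetical, and the
stand-in \class{Item} class only mimics the \member{value} attribute of
a real \class{FieldStorage} or \class{MiniFieldStorage} instance, so
that the sketch runs on a plain dictionary without a web server.

\begin{verbatim}
class Item:
    # Stand-in for a FieldStorage/MiniFieldStorage item (assumption).
    def __init__(self, value):
        self.value = value

def field_values(form, key):
    # Hypothetical helper: normalize form[key] to a list of strings.
    if key not in form:
        return []
    item = form[key]
    if type(item) is type([]):
        # Several fields with the same name were submitted
        return [x.value for x in item]
    # A single field
    return [item.value]

form = {"username": [Item("joe"), Item("ann")], "addr": Item("At Home")}
usernames = ",".join(field_values(form, "username"))
\end{verbatim}

With the list in hand, joining with commas needs no special case for
the first item.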

If a field represents an uploaded file, accessing the \member{value}
attribute reads the entire file into memory as a string.  This may not
be what you want.  You can test for an uploaded file by testing either
the \member{filename} attribute or the \member{file} attribute.  You
can then read the data at leisure from the \member{file} attribute:

\begin{verbatim}
fileitem = form["userfile"]
if fileitem.file:
    # It's an uploaded file; count lines
    linecount = 0
    while 1:
        line = fileitem.file.readline()
        if not line: break
        linecount = linecount + 1
\end{verbatim}
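
For large uploads, the file object can also be consumed in fixed-size
blocks instead of line by line, which keeps memory use bounded.  The
helper \code{copy_upload} and its 8192-byte block size are illustrative
assumptions; \code{io.BytesIO} objects stand in for
\code{fileitem.file} and for the destination file.

\begin{verbatim}
import io

def copy_upload(src, dst, blocksize=8192):
    # Hypothetical helper: copy an uploaded file object to dst in chunks.
    total = 0
    while 1:
        block = src.read(blocksize)
        if not block:
            break                  # end of the uploaded data
        dst.write(block)
        total = total + len(block)
    return total

upload = io.BytesIO(b"line one\nline two\n")   # stands in for fileitem.file
saved = io.BytesIO()
nbytes = copy_upload(upload, saved)
\end{verbatim}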

The file upload draft standard entertains the possibility of uploading
multiple files from one field (using a recursive
\mimetype{multipart/*} encoding).  When this occurs, the item will be
a dictionary-like \class{FieldStorage} item.  This can be determined
by testing its \member{type} attribute, which should be
\mimetype{multipart/form-data} (or perhaps another MIME type matching
\mimetype{multipart/*}).  In this case, it can be iterated over
recursively just like the top-level form object.

When a form is submitted in the ``old'' format (as the query string or
as a single data part of type
\mimetype{application/x-www-form-urlencoded}), the items will actually
be instances of the class \class{MiniFieldStorage}.  In this case, the
\member{list}, \member{file}, and \member{filename} attributes are
always \code{None}.


\subsection{Old classes}

These classes, present in earlier versions of the \module{cgi} module,
are still supported for backward compatibility.  New applications
should use the \class{FieldStorage} class.

\class{SvFormContentDict} stores single value form content as a
dictionary; it assumes each field name occurs in the form only once.

\class{FormContentDict} stores multiple value form content as a
dictionary (the form items are lists of values).  This is useful if
your form contains multiple fields with the same name.

Other classes (\class{FormContent}, \class{InterpFormContentDict}) are
present for backwards compatibility with really old applications only.
If you still use these and would be inconvenienced if they
disappeared from a future version of this module, drop me a note.


\subsection{Functions}
\nodename{Functions in cgi module}

These are useful if you want more control, or if you want to employ
some of the algorithms implemented in this module in other
circumstances.

\begin{funcdesc}{parse}{fp}
Parse a query in the environment or from a file (default
\code{sys.stdin}).
\end{funcdesc}

\begin{funcdesc}{parse_qs}{qs\optional{, keep_blank_values, strict_parsing}}
Parse a query string given as a string argument (data of type
\mimetype{application/x-www-form-urlencoded}).  Data are
returned as a dictionary.  The dictionary keys are the unique query
variable names and the values are lists of values for each name.

The optional argument \var{keep_blank_values} is
a flag indicating whether blank values in
URL encoded queries should be treated as blank strings.
A true value indicates that blanks should be retained as
blank strings.  The default false value indicates that
blank values are to be ignored and treated as if they were
not included.

The optional argument \var{strict_parsing} is a flag indicating what
to do with parsing errors.  If false (the default), errors
are silently ignored.  If true, errors raise a \exception{ValueError}
exception.
\end{funcdesc}
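
The effect of the two flags can be illustrated with a simplified
parser.  This is a sketch of the rules described above, not the
module's own implementation: the name \code{sketch_parse_qs} is
hypothetical, it splits on \character{\&} only, it decodes with
\code{unquote_plus} (from \module{urllib.parse}, or from \module{urllib}
on older Pythons), and it treats a field without \character{=} as the
parsing error.

\begin{verbatim}
try:
    from urllib.parse import unquote_plus   # Python 3
except ImportError:
    from urllib import unquote_plus         # Python 2

def sketch_parse_qs(qs, keep_blank_values=0, strict_parsing=0):
    # Simplified illustration, not the cgi module's own code.
    result = {}
    for pair in qs.split("&"):
        if not pair:
            continue                  # e.g. from a doubled or trailing "&"
        if "=" not in pair:
            if strict_parsing:
                raise ValueError("bad query field: %r" % (pair,))
            continue                  # malformed field: silently ignored
        name, value = pair.split("=", 1)
        if value or keep_blank_values:
            result.setdefault(unquote_plus(name), []).append(
                unquote_plus(value))
    return result

qs = "name=Joe+Blow&addr=&name=Ann"
\end{verbatim}

With \code{qs} as above, \code{sketch_parse_qs(qs)} drops the blank
\code{addr} field, while passing \code{keep_blank_values=1} keeps it
with the value \code{['']}.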

\begin{funcdesc}{parse_qsl}{qs\optional{, keep_blank_values, strict_parsing}}
Parse a query string given as a string argument (data of type
\mimetype{application/x-www-form-urlencoded}).  Data are
returned as a list of name, value pairs.

The optional argument \var{keep_blank_values} is
a flag indicating whether blank values in
URL encoded queries should be treated as blank strings.
A true value indicates that blanks should be retained as
blank strings.  The default false value indicates that
blank values are to be ignored and treated as if they were
not included.

The optional argument \var{strict_parsing} is a flag indicating what
to do with parsing errors.  If false (the default), errors
are silently ignored.  If true, errors raise a \exception{ValueError}
exception.
\end{funcdesc}
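
The two functions return the same information in different shapes:
\function{parse_qsl()} keeps every pair in its original order, while
\function{parse_qs()} groups the values by name.  The hypothetical
helper \code{pairs_to_dict} below shows how a list of pairs can be
folded into the dictionary form; the pairs are written out by hand so
that the sketch does not depend on the \module{cgi} module.

\begin{verbatim}
def pairs_to_dict(pairs):
    # Hypothetical helper: group (name, value) pairs by name,
    # keeping the values for each name in their original order.
    result = {}
    for name, value in pairs:
        result.setdefault(name, []).append(value)
    return result

# Pairs as parse_qsl("name=Joe+Blow&addr=At+Home&name=Ann") gives them
pairs = [("name", "Joe Blow"), ("addr", "At Home"), ("name", "Ann")]
grouped = pairs_to_dict(pairs)
\end{verbatim}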

\begin{funcdesc}{parse_multipart}{fp, pdict}
Parse input of type \mimetype{multipart/form-data} (for
file uploads).  Arguments are \var{fp} for the input file and
\var{pdict} for a dictionary containing the other parameters of the
\code{content-type} header.

Returns a dictionary just like \function{parse_qs()}: keys are the
field names, and each value is a list of values for that field.  This
is easy to use but not much good if you are expecting megabytes to be
uploaded --- in that case, use the \class{FieldStorage} class instead,
which is much more flexible.  Note that \var{pdict} holds the parsed
parameters of the \code{content-type} header, not its raw contents;
the parameter dictionary returned by \function{parse_header()} has
this form.

Note that this does not parse nested multipart parts --- use
\class{FieldStorage} for that.
\end{funcdesc}

\begin{funcdesc}{parse_header}{string}
Parse a header like \code{content-type} into a main
content-type and a dictionary of parameters.
\end{funcdesc}
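
To show the shape of the result, here is a simplified sketch of the
splitting described above, producing the main value and a parameter
dictionary (for example the \code{boundary} of a
\mimetype{multipart/form-data} body).  It is an illustration only: the
name \code{sketch_parse_header} is hypothetical, and unlike a robust
parser it does not handle semicolons inside quoted parameter values.

\begin{verbatim}
def sketch_parse_header(line):
    # Simplified illustration; ignores ";" inside quoted values.
    parts = line.split(";")
    main = parts[0].strip()
    params = {}
    for part in parts[1:]:
        if "=" not in part:
            continue                   # skip parameters without a value
        name, value = part.split("=", 1)
        name = name.strip().lower()    # parameter names are case-insensitive
        value = value.strip()
        # Remove surrounding double quotes, as in boundary="..."
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        params[name] = value
    return main, params

ctype, pdict = sketch_parse_header(
    'multipart/form-data; boundary="----abc123"; charset=UTF-8')
\end{verbatim}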

\begin{funcdesc}{test}{}
Robust test CGI script, usable as main program.
Writes minimal HTTP headers and formats all information provided to
the script in HTML form.
\end{funcdesc}

\begin{funcdesc}{print_environ}{}
Format the shell environment in HTML.
\end{funcdesc}

\begin{funcdesc}{print_form}{form}
Format a form in HTML.
\end{funcdesc}

\begin{funcdesc}{print_directory}{}
Format the current directory in HTML.
\end{funcdesc}

\begin{funcdesc}{print_environ_usage}{}
Print a list of useful (used by CGI) environment variables in
HTML.
\end{funcdesc}

\begin{funcdesc}{escape}{s\optional{, quote}}
Convert the characters
\character{\&}, \character{<} and \character{>} in string \var{s} to
HTML-safe sequences.  Use this if you need to display text that might
contain such characters in HTML.  If the optional flag \var{quote} is
true, the double quote character (\character{"}) is also translated;
this helps for inclusion in an HTML attribute value, e.g. in \code{<A
HREF="...">}.
\end{funcdesc}
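
The translation can be sketched as follows; \code{sketch_escape} is a
hypothetical stand-in written out so that the rules are visible.  Note
that \character{\&} must be replaced first; otherwise the ampersands
introduced by the other replacements would themselves be escaped again.

\begin{verbatim}
def sketch_escape(s, quote=0):
    # Replace "&" first so the entities added below are not re-escaped.
    s = s.replace("&", "&amp;")
    s = s.replace("<", "&lt;")
    s = s.replace(">", "&gt;")
    if quote:
        # Needed inside a double-quoted HTML attribute value
        s = s.replace('"', "&quot;")
    return s

text = sketch_escape('Tom & Jerry <b>')
link = '<A HREF="%s">' % sketch_escape('x.py?a=1&b="2"', 1)
\end{verbatim}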


\subsection{Caring about security}

There's one important rule: if you invoke an external program (e.g.
via the \function{os.system()} or \function{os.popen()} functions),
make very sure you don't pass arbitrary strings received from the
client to the shell.  This is a well-known security hole whereby
clever hackers anywhere on the web can exploit a gullible CGI script
to invoke arbitrary shell commands.  Even parts of the URL or field
names cannot be trusted, since the request doesn't have to come from
your form!

To be on the safe side, if you must pass a string obtained from a form
to a shell command, you should make sure the string contains only
alphanumeric characters, dashes, underscores, and periods.
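
The rule above can be enforced with a whitelist check before any string
reaches a shell command.  The helper name \code{is_shell_safe} is
hypothetical; it rejects the empty string and anything containing a
character outside the allowed ASCII set.  The pattern ends in
\code{{\e}Z} rather than \code{\$}, because \code{\$} would also match
just before a trailing newline.

\begin{verbatim}
import re

# Only ASCII letters, digits, underscores, periods, and dashes.
_SAFE = re.compile(r"[A-Za-z0-9_.-]+\Z")

def is_shell_safe(s):
    # Hypothetical helper: true only if s is non-empty and every
    # character is in the allowed set (match() anchors at the start).
    return _SAFE.match(s) is not None
\end{verbatim}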


\subsection{Installing your CGI script on a Unix system}

Read the documentation for your HTTP server and check with your local
system administrator to find the directory where CGI scripts should be
installed; usually this is in a directory \file{cgi-bin} in the server tree.

Make sure that your script is readable and executable by ``others''; the
\UNIX{} file mode should be \code{0755} octal (use \samp{chmod 0755
\var{filename}}).  Make sure that the first line of the script contains
\code{\#!} starting in column 1 followed by the pathname of the Python
interpreter, for instance:

\begin{verbatim}
#!/usr/local/bin/python
\end{verbatim}

Make sure the Python interpreter exists and is executable by ``others''.

Make sure that any files your script needs to read or write are
readable or writable, respectively, by ``others'' --- their mode
should be \code{0644} for readable and \code{0666} for writable.  This
is because, for security reasons, the HTTP server typically executes
your script as user ``nobody'' (or a similar unprivileged account),
without any special privileges.  It can only read (write, execute)
files that everybody can read (write, execute).  The current directory
at execution time is also different (it is usually the server's
\file{cgi-bin} directory) and the set of environment variables is also
different from what you get at login.  In particular, don't count on
the shell's search path for executables (\envvar{PATH}) or the Python
module search path (\envvar{PYTHONPATH}) to be set to anything
interesting.
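
A quick way to check the permission bits described above is to inspect
them with \module{os} and \module{stat}.  The function
\code{check_others} is a hypothetical helper; it only examines the
``others'' bits of the file itself, not the directories leading to it,
and the demonstration assumes a POSIX file system.

\begin{verbatim}
import os
import stat
import tempfile

def check_others(path, need_write=0, need_exec=0):
    # Hypothetical helper: test only the "others" permission bits.
    mode = os.stat(path).st_mode
    ok = mode & stat.S_IROTH                   # readable by others
    if need_write:
        ok = ok and (mode & stat.S_IWOTH)      # writable by others
    if need_exec:
        ok = ok and (mode & stat.S_IXOTH)      # executable by others
    return bool(ok)

# Demonstration on a scratch file.
fd, demo_path = tempfile.mkstemp()
os.close(fd)
os.chmod(demo_path, 0o755)                     # like a CGI script
script_ok = check_others(demo_path, need_exec=1)
os.chmod(demo_path, 0o644)                     # like a read-only data file
data_ok = check_others(demo_path)
data_writable = check_others(demo_path, need_write=1)
os.remove(demo_path)
\end{verbatim}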

If you need to load modules from a directory which is not on Python's
default module search path, you can change the path in your script,
before importing other modules, e.g.:

\begin{verbatim}
import sys
sys.path.insert(0, "/usr/home/joe/lib/python")
sys.path.insert(0, "/usr/local/lib/python")
\end{verbatim}

(This way, the directory inserted last will be searched first!)
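
The ordering claim can be checked directly: each \code{insert(0, ...)}
call puts the new directory in front of the ones inserted before it.
The paths are the example paths from above and need not exist for this
check.

\begin{verbatim}
import sys

sys.path.insert(0, "/usr/home/joe/lib/python")
sys.path.insert(0, "/usr/local/lib/python")

# The directory inserted last now comes first in the search order.
first_two = sys.path[:2]
\end{verbatim}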

Instructions for non-\UNIX{} systems will vary; check your HTTP server's
documentation (it will usually have a section on CGI scripts).


\subsection{Testing your CGI script}

Unfortunately, a CGI script will generally not run when you try it
from the command line, and a script that works perfectly from the
command line may fail mysteriously when run from the server.  There's
one reason why you should still test your script from the command
line: if it contains a syntax error, the Python interpreter won't
execute it at all, and the HTTP server will most likely send a cryptic
error to the client.
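
A syntax check can also be done without executing any of the script's
code, by compiling its source with the built-in \function{compile()}.
The helper \code{syntax_error_in} is hypothetical; it returns
\code{None} when the source compiles and the \exception{SyntaxError}
instance otherwise.

\begin{verbatim}
def syntax_error_in(source, filename="<script>"):
    # Hypothetical helper: compile without running; report any SyntaxError.
    try:
        compile(source, filename, "exec")
    except SyntaxError as err:
        return err              # err.lineno tells where the problem is
    return None
\end{verbatim}

In practice \var{source} would be the text read from the script file.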

If your script has no syntax errors yet still does not work, you
have no choice but to read the next section.


\subsection{Debugging CGI scripts}

First of all, check for trivial installation errors --- reading the
section above on installing your CGI script carefully can save you a
lot of time.  If you wonder whether you have understood the
installation procedure correctly, try installing a copy of this module
file (\file{cgi.py}) as a CGI script.  When invoked as a script, the file
will dump its environment and the contents of the form in HTML form.
Give it the right mode etc., and send it a request.  If it's installed
in the standard \file{cgi-bin} directory, it should be possible to send it a
request by entering a URL into your browser of the form:

\begin{verbatim}
http://yourhostname/cgi-bin/cgi.py?name=Joe+Blow&addr=At+Home
\end{verbatim}

If this gives an error of type 404, the server cannot find the script;
perhaps you need to install it in a different directory.  If it
gives another error (for example, 500), there's an installation
problem that you should fix before trying to go any further.  If you
get a nicely formatted listing of the environment and form content (in
this example, the fields should be listed as ``addr'' with value ``At
Home'' and ``name'' with value ``Joe Blow''), the \file{cgi.py} script
has been installed correctly.  If you follow the same procedure for
your own script, you should now be able to debug it.
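
Test URLs like the one above can be built rather than typed by hand,
which takes care of encoding spaces and special characters.  This
sketch uses \code{urlencode} from \module{urllib.parse}
(\module{urllib} on older Pythons); \code{yourhostname} is the same
placeholder as above.

\begin{verbatim}
try:
    from urllib.parse import urlencode   # Python 3
except ImportError:
    from urllib import urlencode         # Python 2

# A list of pairs keeps the fields in the given order.
query = urlencode([("name", "Joe Blow"), ("addr", "At Home")])
url = "http://yourhostname/cgi-bin/cgi.py?" + query
\end{verbatim}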

The next step could be to call the \module{cgi} module's
\function{test()} function from your script: replace its main code
with the single statement

\begin{verbatim}
cgi.test()
\end{verbatim}

This should produce the same results as those obtained from installing
the \file{cgi.py} file itself.

When an ordinary Python script raises an unhandled exception
(e.g. because of a typo in a module name, a file that can't be opened,
etc.), the Python interpreter prints a nice traceback and exits.
While the Python interpreter will still do this when your CGI script
raises an exception, most likely the traceback will end up in one of
the HTTP server's log files, or be discarded altogether.

Fortunately, once you have managed to get your script to execute
\emph{some} code, it is easy to catch exceptions and cause a traceback
to be printed.  The \function{test()} function in this module is
an example.  Here are the rules:

\begin{enumerate}
\item Import the \module{traceback} module before entering the
   \keyword{try} ... \keyword{except} statement

\item Assign \code{sys.stderr} to be \code{sys.stdout}

\item Make sure you finish printing the headers and the blank line
   early

\item Wrap all remaining code in a \keyword{try} ... \keyword{except}
   statement

\item In the except clause, call \function{traceback.print_exc()}
\end{enumerate}

For example:

\begin{verbatim}
import sys
import traceback
print "Content-type: text/html"
print
sys.stderr = sys.stdout
try:
    # ...your code here...
    pass
except:
    print "\n\n<PRE>"
    traceback.print_exc()
\end{verbatim}

Notes: The assignment to \code{sys.stderr} is needed because the
traceback prints to \code{sys.stderr}.
The \code{print "{\e}n{\e}n<PRE>"} statement turns off HTML word
wrapping, so that the traceback's line breaks and indentation are
preserved.
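
The rules above can also be packaged as a small wrapper that runs the
script's main function and, on failure, writes the traceback into the
page.  The names \code{run_cgi} and \code{main} are hypothetical.  This
variant uses \function{traceback.format_exc()} instead of
\function{traceback.print_exc()}, so the text can be HTML-escaped
before it is written and reassigning \code{sys.stderr} is unnecessary;
an \code{io.StringIO} object stands in for \code{sys.stdout}.

\begin{verbatim}
import io
import traceback

def run_cgi(main, out):
    # Hypothetical wrapper: finish the headers and blank line first.
    out.write("Content-type: text/html\n\n")
    try:
        main(out)
    except Exception:
        # Put the traceback inside <PRE> so its layout survives, and
        # escape "&", "<" and ">" so the browser shows them literally.
        text = traceback.format_exc()
        text = text.replace("&", "&amp;")
        text = text.replace("<", "&lt;").replace(">", "&gt;")
        out.write("\n\n<PRE>\n" + text + "</PRE>\n")

def main(out):
    # Hypothetical script body that fails partway through.
    out.write("<H1>Starting</H1>\n")
    int("not a number")   # raises ValueError

buf = io.StringIO()       # stands in for sys.stdout
run_cgi(main, buf)
page = buf.getvalue()
\end{verbatim}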

If you suspect that there may be a problem in importing the
\module{traceback} module, you can use an even more robust approach
(which only uses built-in modules):

\begin{verbatim}
import sys
sys.stderr = sys.stdout
print "Content-type: text/plain"
print
# ...your code here...
\end{verbatim}

This relies on the Python interpreter to print the traceback.  The
content type of the output is set to plain text, which disables all
HTML processing.  If your script works, the raw HTML will be displayed
by your client.  If it raises an exception, most likely after the
first two lines have been printed, a traceback will be displayed.
Because no HTML interpretation is going on, the traceback will be
readable.


\subsection{Common problems and solutions}

\begin{itemize}
\item Most HTTP servers buffer the output from CGI scripts until the
script is completed.  This means that it is not possible to display a
progress report on the client's display while the script is running.

\item Check the installation instructions above.

\item Check the HTTP server's log files.  (\samp{tail -f logfile} in a
separate window may be useful!)

\item Always check a script for syntax errors first, by doing something
like \samp{python script.py}.

\item When using any of the debugging techniques, don't forget to add
\samp{import sys} to the top of the script.

\item When invoking external programs, make sure they can be found.
In practice, this means using absolute path names, because
\envvar{PATH} is usually not set to a very useful value in a CGI script.

\item When reading or writing external files, make sure they can be read
or written by every user on the system.

\item Don't try to give a CGI script a set-uid mode.  This doesn't work on
most systems, and is a security liability as well.
\end{itemize}
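
The advice about external programs fits together with the security rule
earlier in this chapter: call the program by absolute path and pass its
arguments as a list, so that no shell is involved.  The helper
\code{run_program} is hypothetical and relies on the \module{subprocess}
module, which this chapter does not otherwise mention; in the usage
line, \code{sys.executable} (the running interpreter) merely stands in
for a real program.

\begin{verbatim}
import os
import subprocess
import sys

def run_program(program, args):
    # Hypothetical helper: PATH is unreliable in CGI, so insist on
    # an absolute path to an executable file.
    if not os.path.isabs(program):
        raise ValueError("not an absolute path: %r" % (program,))
    if not os.access(program, os.X_OK):
        raise ValueError("not executable: %r" % (program,))
    # The argument list goes straight to the program, without a shell,
    # so characters like ";" or "|" in args have no special meaning.
    return subprocess.check_output([program] + list(args))

output = run_program(sys.executable, ["-c", "print('hello')"])
\end{verbatim}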