Lib/cgi.py

#! /usr/local/bin/python

"""Support module for CGI (Common Gateway Interface) scripts.

This module defines a number of utilities for use by CGI scripts
written in Python.


Introduction
------------

A CGI script is invoked by an HTTP server, usually to process user
input submitted through an HTML <FORM> or <ISINDEX> element.

Most often, CGI scripts live in the server's special cgi-bin
directory.  The HTTP server places all sorts of information about the
request (such as the client's hostname, the requested URL, the query
string, and lots of other goodies) in the script's shell environment,
executes the script, and sends the script's output back to the client.

The script's input is connected to the client too, and sometimes the
form data is read this way; at other times the form data is passed via
the "query string" part of the URL.  This module (cgi.py) is intended
to take care of the different cases and provide a simpler interface to
the Python script.  It also provides a number of utilities that help
in debugging scripts, and the latest addition is support for file
uploads from a form (if your browser supports it -- Grail 0.3 and
Netscape 2.0 do).

The output of a CGI script should consist of two sections, separated
by a blank line.  The first section contains a number of headers,
telling the client what kind of data is following.  Python code to
generate a minimal header section looks like this:

        print "Content-type: text/html" # HTML is following
        print                           # blank line, end of headers

The second section is usually HTML, which allows the client software
to display nicely formatted text with headings, in-line images, etc.
Here's Python code that prints a simple piece of HTML:

        print "<TITLE>CGI script output</TITLE>"
        print "<H1>This is my first CGI script</H1>"
        print "Hello, world!"

It may not be fully legal HTML according to the letter of the
standard, but any browser will understand it.
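
The two-section layout can be sketched as a list of output lines.  This
sketch uses Python 3 syntax (print() is a function) rather than the print
statement shown above.  build_response() is a hypothetical helper and is
not part of this module.

```python
# Assemble a CGI response: header lines, one blank line, then the body.
# build_response() is a hypothetical helper used only for illustration.
def build_response(body_lines, content_type="text/html"):
    headers = ["Content-type: " + content_type]  # what kind of data follows
    return headers + [""] + list(body_lines)      # "" is the separating blank line


page = build_response([
    "<TITLE>CGI script output</TITLE>",
    "<H1>This is my first CGI script</H1>",
    "Hello, world!",
])
for line in page:
    print(line)  # print() ends each line with a newline
```

The only structural rule encoded here is the single blank line between the
header section and the body.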


Using the cgi module
--------------------

Begin by writing "import cgi".  Don't use "from cgi import *" -- the
module defines all sorts of names for its own use or for backward
compatibility that you don't want in your namespace.

It's best to use the FieldStorage class.  The other classes defined in this
module are provided mostly for backward compatibility.  Instantiate
FieldStorage exactly once, without arguments.  This reads the form contents from
standard input or the environment (depending on the value of various
environment variables set according to the CGI standard).  Since it may
consume standard input, it should be instantiated only once.

The FieldStorage instance can be accessed as if it were a Python
dictionary.  For instance, the following code (which assumes that the
Content-type header and blank line have already been printed) checks that
the fields "name" and "addr" are both set to a non-empty string:

        form = cgi.FieldStorage()
        form_ok = 0
        if form.has_key("name") and form.has_key("addr"):
                if form["name"].value != "" and form["addr"].value != "":
                        form_ok = 1
        if not form_ok:
                print "<H1>Error</H1>"
                print "Please fill in the name and addr fields."
                return
        ...further form processing here...

Here the fields, accessed through form[key], are themselves instances
of FieldStorage (or MiniFieldStorage, depending on the form encoding).

If the submitted form data contains more than one field with the same
name, the object retrieved by form[key] is not a (Mini)FieldStorage
instance but a list of such instances.  If you are expecting this
possibility (i.e., when your HTML form contains multiple fields with
the same name), use the type() function to determine whether you have
a single instance or a list of instances.  For example, here's code
that concatenates any number of username fields, separated by commas:

        username = form["username"]
        if type(username) is type([]):
                # Multiple username fields specified
                usernames = ""
                for item in username:
                        if usernames:
                                # Next item -- insert comma
                                usernames = usernames + "," + item.value
                        else:
                                # First item -- don't insert comma
                                usernames = item.value
        else:
                # Single username field specified
                usernames = username.value
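
The same concatenation can be written more compactly by first turning a
single item into a one-element list, so that both cases share one code
path.  The sketch below uses Python 3 syntax.  The Field class is a
hypothetical stand-in for a (Mini)FieldStorage instance; only its value
attribute is modelled.

```python
# Field is a hypothetical stand-in for a (Mini)FieldStorage item.
class Field:
    def __init__(self, value):
        self.value = value


def join_values(item, sep=","):
    # form[key] is either a single item or a list of items; wrap a single
    # item in a list so both cases go through the same code path.
    items = item if isinstance(item, list) else [item]
    return sep.join(field.value for field in items)


print(join_values(Field("joe")))                  # prints joe
print(join_values([Field("joe"), Field("ann")]))  # prints joe,ann
```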

If a field represents an uploaded file, the value attribute reads the
entire file into memory as a string.  This may not be what you want.  You can
test for an uploaded file by testing either the filename attribute or the
file attribute.  You can then read the data at leisure from the file
attribute:

        fileitem = form["userfile"]
        if fileitem.file:
                # It's an uploaded file; count lines
                linecount = 0
                while 1:
                        line = fileitem.file.readline()
                        if not line: break
                        linecount = linecount + 1
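
When you only need to store the upload somewhere, reading from the file
attribute in fixed-size pieces keeps memory use bounded no matter how large
the file is.  Here is a minimal Python 3 sketch.  copy_upload() is a
hypothetical helper, and io.BytesIO stands in for the uploaded file.

```python
import io

# Copy an uploaded file to another file object in fixed-size chunks, so
# no more than chunk_size bytes are held in memory at once.  In a CGI
# script, src would be fileitem.file; copy_upload() is a hypothetical name.
def copy_upload(src, dst, chunk_size=8192):
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:              # an empty read signals end of file
            break
        dst.write(chunk)
        total += len(chunk)
    return total                   # number of bytes copied


upload = io.BytesIO(b"x" * 20000)  # stands in for fileitem.file
saved = io.BytesIO()
print(copy_upload(upload, saved))  # prints 20000
```

The standard library function shutil.copyfileobj(src, dst) performs the
same kind of chunked copy.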

The file upload draft standard entertains the possibility of uploading
multiple files from one field (using a recursive multipart/*
encoding).  When this occurs, the item will be a dictionary-like
FieldStorage item.  This can be determined by testing its type
attribute, which should have the value "multipart/form-data" (or
perhaps another string beginning with "multipart/").  In this case, it
can be iterated over recursively just like the top-level form object.
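
A recursive walk over such nested parts might look like the following
Python 3 sketch.  The Part class and the leaves() function are
hypothetical.  Part models only the name, type, value and list attributes
that FieldStorage items are documented to have; the real class is not used.

```python
# Hypothetical stand-in for FieldStorage: a part whose type starts with
# "multipart/" keeps its sub-parts in .list; any other part keeps its data
# in .value.  Only these attributes are modelled.
class Part:
    def __init__(self, name, ctype, value=None, parts=None):
        self.name = name
        self.type = ctype
        self.value = value
        self.list = parts


def leaves(part, path=()):
    # Yield (path, value) for every non-multipart part, descending into
    # nested multipart parts exactly as for the top-level form.
    path = path + (part.name,)
    if part.type and part.type.startswith("multipart/"):
        for sub in part.list or []:
            for item in leaves(sub, path):
                yield item
    else:
        yield path, part.value


upload = Part("files", "multipart/mixed", parts=[
    Part("a.txt", "text/plain", value="first file"),
    Part("b.txt", "text/plain", value="second file"),
])
for path, value in leaves(upload):
    print(path, value)
```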

When a form is submitted in the "old" format (as the query string or as a
single data part of type application/x-www-form-urlencoded), the items
will actually be instances of the class MiniFieldStorage.  In this case,
the list, file and filename attributes are always None.


Old classes
-----------

These classes, present in earlier versions of the cgi module, are still
supported for backward compatibility.  New applications should use the
FieldStorage class.

SvFormContentDict: single value form content as dictionary; assumes each
field name occurs in the form only once.

FormContentDict: multiple value form content as dictionary (the form
items are lists of values).  Useful if your form contains multiple
fields with the same name.

Other classes (FormContent, InterpFormContentDict) are present for
backwards compatibility with really old applications only.  If you still
use these and would be inconvenienced if they disappeared from a future
version of this module, drop me a note.
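
The difference between the single-value and multiple-value shapes is
easiest to see on a concrete query string.  This Python 3 sketch uses the
standard library's urllib.parse.parse_qs(), not the classes described
above.  It does not reproduce how SvFormContentDict treats repeated names.

```python
from urllib.parse import parse_qs

# urllib.parse.parse_qs (Python 3 standard library) returns the "multiple
# value" shape: every name maps to a list of values.
multi = parse_qs("name=Joe&hobby=chess&hobby=go")
print(multi)   # {'name': ['Joe'], 'hobby': ['chess', 'go']}

# A "single value" view keeps one value per name.  Taking the first value
# silently drops the rest, so it is only safe when names are unique.
single = dict((key, values[0]) for key, values in multi.items())
print(single)  # {'name': 'Joe', 'hobby': 'chess'}
```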


Functions
---------

These are useful if you want more control, or if you want to employ
some of the algorithms implemented in this module in other
circumstances.

parse(fp, [environ, [keep_blank_values, [strict_parsing]]]): parse a
form into a Python dictionary.

parse_qs(qs, [keep_blank_values, [strict_parsing]]): parse a query
string (data of type application/x-www-form-urlencoded).  Data are
returned as a dictionary.  The dictionary keys are the unique query
variable names and the values are lists of values for each name.

parse_qsl(qs, [keep_blank_values, [strict_parsing]]): parse a query
string (data of type application/x-www-form-urlencoded).  Data are
returned as a list of (name, value) pairs.

parse_multipart(fp, pdict): parse input of type multipart/form-data (for
file uploads).

parse_header(string): parse a header like Content-type into a main
value and a dictionary of parameters.

test(): complete test program.

print_environ(): format the shell environment in HTML.

print_form(form): format a form in HTML.

print_environ_usage(): print a list of useful environment variables in
HTML.

escape(): convert the characters "&", "<" and ">" to HTML-safe
sequences.  Use this if you need to display text that might contain
such characters in HTML.  To translate URLs for inclusion in the HREF
attribute of an <A> tag, use urllib.quote().

log(fmt, ...): write a line to a log file; see docs for initlog().
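
To illustrate the documented behavior of parse_qsl(), parse_qs(),
parse_header() and escape(), here is a Python 3 sketch.  The *_sketch
names are hypothetical, and Python 3's urllib.parse.unquote() takes the
place of urllib.unquote().  The real functions in this module may differ
in edge cases.

```python
from urllib.parse import unquote


def parse_qsl_sketch(qs, keep_blank_values=False, strict_parsing=False):
    pairs = []
    for field in qs.split("&"):
        nv = field.split("=")
        if len(nv) != 2:                     # no "=", or more than one "="
            if strict_parsing:
                raise ValueError("bad query field: %r" % field)
            continue                         # otherwise ignore the field
        if nv[1] or keep_blank_values:       # blank values dropped by default
            # "+" stands for a space; %XX escapes are decoded afterwards
            name = unquote(nv[0].replace("+", " "))
            value = unquote(nv[1].replace("+", " "))
            pairs.append((name, value))
    return pairs


def parse_qs_sketch(qs, keep_blank_values=False, strict_parsing=False):
    result = {}
    for name, value in parse_qsl_sketch(qs, keep_blank_values, strict_parsing):
        result.setdefault(name, []).append(value)   # group values by name
    return result


def parse_header_sketch(line):
    # 'text/html; charset="latin-1"' -> ('text/html', {'charset': 'latin-1'})
    parts = [p.strip() for p in line.split(";")]
    key = parts[0].lower()
    params = {}
    for p in parts[1:]:
        i = p.find("=")
        if i >= 0:
            name = p[:i].strip().lower()
            value = p[i + 1:].strip()
            if len(value) >= 2 and value[0] == value[-1] == '"':
                value = value[1:-1]          # remove surrounding quotes
            params[name] = value
    return key, params


def escape_sketch(s):
    # Replace "&" first; otherwise the "&" in "&lt;" would be escaped again.
    return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


print(parse_qsl_sketch("name=Joe+Blow&addr=At+Home"))
# [('name', 'Joe Blow'), ('addr', 'At Home')]
print(parse_qs_sketch("a=1&a=2&b="))
# {'a': ['1', '2']}
print(parse_header_sketch('form-data; name="userfile"'))
# ('form-data', {'name': 'userfile'})
print(escape_sketch("x < y & z"))
# x &lt; y &amp; z
```

In Python 3, html.escape(s, quote=False) replaces the same three characters.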


Caring about security
---------------------

There's one important rule: if you invoke an external program (e.g.
via the os.system() or os.popen() functions), make very sure you don't
pass arbitrary strings received from the client to the shell.  This is
a well-known security hole whereby clever hackers anywhere on the web
can exploit a gullible CGI script to invoke arbitrary shell commands.
Even parts of the URL or field names cannot be trusted, since the
request doesn't have to come from your form!

To be on the safe side, if you must pass a string gotten from a form
to a shell command, you should make sure the string contains only
alphanumeric characters, dashes, underscores, and periods.
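
Such a whitelist check can be expressed with a regular expression.  The
following Python 3 sketch uses fullmatch(), so the whole string must match
the pattern.  The pattern and function name are illustrative.

```python
import re

# Accept a string for use in a shell command only if it is non-empty and
# made up solely of ASCII letters, digits, dashes, underscores and periods.
# SAFE_ARG and is_safe_shell_arg() are hypothetical names.
SAFE_ARG = re.compile(r"[A-Za-z0-9_.-]+")


def is_safe_shell_arg(s):
    return SAFE_ARG.fullmatch(s) is not None


for candidate in ["report-2.txt", "x; rm -rf /", "$(id)", ""]:
    print(repr(candidate), is_safe_shell_arg(candidate))
```

A value such as "-rf" passes this check but could still be taken as an
option by the program it is passed to.  Treat the check as a minimum
safeguard rather than a complete defense.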


Installing your CGI script on a Unix system
-------------------------------------------

Read the documentation for your HTTP server and check with your local
system administrator to find the directory where CGI scripts should be
installed; usually this is in a directory cgi-bin in the server tree.

Make sure that your script is readable and executable by "others"; the
Unix file mode should be 755 (use "chmod 755 filename").  Make sure
that the first line of the script contains #! starting in column 1
followed by the pathname of the Python interpreter, for instance:

        #! /usr/local/bin/python

Make sure the Python interpreter exists and is executable by "others".

Note that it's probably not a good idea to use #! /usr/bin/env python
here, since the Python interpreter may not be on the default path
given to CGI scripts!!!

Make sure that any files your script needs to read or write are
readable or writable, respectively, by "others" -- their mode should
be 644 for readable and 666 for writable.  This is because, for
security reasons, the HTTP server executes your script as user
"nobody", without any special privileges.  It can only read (write,
execute) files that everybody can read (write, execute).  The current
directory at execution time is also different (it is usually the
server's cgi-bin directory) and the set of environment variables is
also different from what you get at login.  In particular, don't count
on the shell's search path for executables ($PATH) or the Python
module search path ($PYTHONPATH) to be set to anything interesting.
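
The modes 755, 644 and 666 above are octal; Python 3 writes them as
0o755, 0o644 and 0o666.  The sketch below, for a Unix system, checks the
"others" permission bits of a single file.  It does not check that the
directories leading to the file are searchable by others, which is also
required.  others_access() is a hypothetical helper.

```python
import os
import stat
import tempfile

# Report whether "others" (such as the user "nobody" that the server may
# run scripts as) may read, write and execute a file, using the permission
# bits returned by os.stat().  others_access() is a hypothetical helper.
def others_access(path):
    mode = os.stat(path).st_mode
    return {
        "read": bool(mode & stat.S_IROTH),
        "write": bool(mode & stat.S_IWOTH),
        "execute": bool(mode & stat.S_IXOTH),
    }


fd, path = tempfile.mkstemp()      # scratch file for the demonstration
os.close(fd)
os.chmod(path, 0o644)              # mode 644: readable by others
print(others_access(path))         # {'read': True, 'write': False, 'execute': False}
os.remove(path)
```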

If you need to load modules from a directory which is not on Python's
default module search path, you can change the path in your script,
before importing other modules, e.g.:

        import sys
        sys.path.insert(0, "/usr/home/joe/lib/python")
        sys.path.insert(0, "/usr/local/lib/python")

This way, the directory inserted last will be searched first!
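
The ordering rule is easy to check on an ordinary list.  If you would
rather list directories in the order they should be searched, inserting
them in reverse gives that result.  This Python 3 sketch does not modify
sys.path itself, and prepend_dirs() is a hypothetical helper.

```python
# A plain list stands in for sys.path so the real search path is not
# modified.  prepend_dirs() is a hypothetical helper that inserts in
# reverse, so the first directory listed ends up first in the path.
def prepend_dirs(path_list, dirs):
    for d in reversed(dirs):
        path_list.insert(0, d)


search_path = ["<default entries>"]
search_path.insert(0, "/usr/home/joe/lib/python")
search_path.insert(0, "/usr/local/lib/python")
print(search_path[0])   # /usr/local/lib/python (inserted last, searched first)

ordered = ["<default entries>"]
prepend_dirs(ordered, ["/usr/home/joe/lib/python", "/usr/local/lib/python"])
print(ordered[0])       # /usr/home/joe/lib/python
```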

Instructions for non-Unix systems will vary; check your HTTP server's
documentation (it will usually have a section on CGI scripts).


Testing your CGI script
-----------------------

Unfortunately, a CGI script will generally not run when you try it
from the command line, and a script that works perfectly from the
command line may fail mysteriously when run from the server.  There's
one reason why you should still test your script from the command
line: if it contains a syntax error, the Python interpreter won't
execute it at all, and the HTTP server will most likely send a cryptic
error to the client.
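
You can also check for syntax errors from within Python without executing
anything.  This Python 3 sketch uses the built-in compile().  From the
shell, "python -m py_compile script.py" has a similar effect, although it
also writes a compiled file.  syntax_error() is a hypothetical helper.

```python
# Check source code for syntax errors without running it: compile() parses
# the code and raises SyntaxError if it is malformed.  syntax_error() is a
# hypothetical helper name.
def syntax_error(source, filename="<script>"):
    try:
        compile(source, filename, "exec")
    except SyntaxError as err:
        return "%s, line %s: %s" % (filename, err.lineno, err.msg)
    return None  # the source parsed cleanly


print(syntax_error("x = 1"))          # None
print(syntax_error("if x print(x)"))  # reports the error on line 1
```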

Assuming your script has no syntax errors, yet it does not work, you
have no choice but to read the next section:


Debugging CGI scripts
---------------------

First of all, check for trivial installation errors -- reading the
section above on installing your CGI script carefully can save you a
lot of time.  If you wonder whether you have understood the
installation procedure correctly, try installing a copy of this module
file (cgi.py) as a CGI script.  When invoked as a script, the file
will dump its environment and the contents of the form as HTML.
Give it the right mode etc., and send it a request.  If it's installed
in the standard cgi-bin directory, it should be possible to send it a
request by entering a URL into your browser of the form:

        http://yourhostname/cgi-bin/cgi.py?name=Joe+Blow&addr=At+Home

If this gives an error of type 404, the server cannot find the script
-- perhaps you need to install it in a different directory.  If it
gives another error (e.g. 500), there's an installation problem that
you should fix before trying to go any further.  If you get a nicely
formatted listing of the environment and form content (in this
example, the fields should be listed as "addr" with value "At Home"
and "name" with value "Joe Blow"), the cgi.py script has been
installed correctly.  If you follow the same procedure for your own
script, you should now be able to debug it.

The next step could be to call the cgi module's test() function from
your script: replace its main code with the single statement

        cgi.test()

This should produce the same results as those gotten from installing
the cgi.py file itself.

When an ordinary Python script raises an unhandled exception (e.g.,
because of a typo in a module name, a file that can't be opened,
etc.), the Python interpreter prints a nice traceback and exits.
While the Python interpreter will still do this when your CGI script
raises an exception, most likely the traceback will end up in one of
the HTTP server's log files, or be discarded altogether.

Fortunately, once you have managed to get your script to execute
*some* code, it is easy to catch exceptions and cause a traceback to
be printed.  The test() function below in this module is an example.
Here are the rules:

        1. Import the traceback module (before entering the
           try-except!)

        2. Make sure you finish printing the headers and the blank
           line early

        3. Assign sys.stderr to sys.stdout

        4. Wrap all remaining code in a try-except statement

        5. In the except clause, call traceback.print_exc()

For example:

        import sys
        import traceback
        print "Content-type: text/html"
        print
        sys.stderr = sys.stdout
        try:
                ...your code here...
        except:
                print "\n\n<PRE>"
                traceback.print_exc()

Notes: The assignment to sys.stderr is needed because the traceback
prints to sys.stderr.  The print "\n\n<PRE>" statement is necessary to
disable the word wrapping in HTML.
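
The same rules can be packaged as a small wrapper.  This Python 3 sketch
departs from the example above in two deliberate ways.  First, it passes
the output stream to the traceback code instead of reassigning
sys.stderr.  Second, it HTML-escapes the traceback, because tracebacks can
contain text such as "<module>" that a browser would otherwise treat as a
tag.  run_cgi() and broken() are hypothetical names.

```python
import html
import io
import sys
import traceback

# Apply the rules above: print the headers first, then run the script's
# code, and on failure send an HTML-escaped traceback to the client inside
# <PRE>.  The stream is passed explicitly instead of reassigning
# sys.stderr.  run_cgi() and broken() are hypothetical names.
def run_cgi(main, out=None):
    if out is None:
        out = sys.stdout
    print("Content-type: text/html", file=out)
    print(file=out)                       # blank line ends the headers
    try:
        main(out)
    except Exception:
        print("<PRE>", file=out)          # keep the traceback's line breaks
        # Escape "<", ">" and "&" so text such as "<module>" stays visible.
        print(html.escape(traceback.format_exc()), file=out)
        print("</PRE>", file=out)


def broken(out):
    print("<H1>Starting</H1>", file=out)
    raise ValueError("bad input")


buf = io.StringIO()
run_cgi(broken, buf)
print(buf.getvalue())
```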

If you suspect that there may be a problem in importing the traceback
module, you can use an even more robust approach (which only uses
built-in modules):

        import sys
        sys.stderr = sys.stdout
        print "Content-type: text/plain"
        print
        ...your code here...

This relies on the Python interpreter to print the traceback.  The
content type of the output is set to plain text, which disables all
HTML processing.  If your script works, the raw HTML will be displayed
by your client.  If it raises an exception, most likely after the
first two lines have been printed, a traceback will be displayed.
Because no HTML interpretation is going on, the traceback will be
readable.

When all else fails, you may want to insert calls to log() into your
program or even into a copy of the cgi.py file.  Note that this requires
you to set cgi.logfile to the name of a world-writable file before the
first call to log() is made!
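
The way log() switches itself on or off can be seen in isolation in this
Python 3 sketch.  It restates the logic of the initlog(), dolog() and
nolog() functions defined below, using print() in place of write(), and it
defines its own copies of those names rather than importing them.

```python
import os
import tempfile

# Python 3 copy of the pattern used by initlog(), dolog() and nolog() below:
# "log" starts out bound to initlog, which opens the log file on first use
# and then rebinds "log" to dolog (file opened) or nolog (no usable file).
logfile = ""     # name of a world-writable file; set before the first log()
logfp = None     # file object, once opened


def initlog(*allargs):
    global logfp, log
    if logfile and not logfp:
        try:
            logfp = open(logfile, "a")
        except OSError:
            pass                 # no safe place to report the failure
    log = dolog if logfp else nolog
    log(*allargs)


def dolog(fmt, *args):
    print(fmt % args, file=logfp)    # print() appends the newline


def nolog(*allargs):
    pass


log = initlog

fd, logfile = tempfile.mkstemp()     # scratch log file for the demonstration
os.close(fd)
log("%s: %s", "a", "b")              # first call opens the file, writes "a: b"
log("second line")                   # now goes straight to dolog()
logfp.flush()
```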

Good luck!


Common problems and solutions
-----------------------------

- Most HTTP servers buffer the output from CGI scripts until the
script is completed.  This means that it is not possible to display a
progress report on the client's display while the script is running.

- Check the installation instructions above.

- Check the HTTP server's log files.  ("tail -f logfile" in a separate
window may be useful!)

- Always check a script for syntax errors first, by doing something
like "python script.py".

- When using any of the debugging techniques, don't forget to add
"import sys" to the top of the script.

- When invoking external programs, make sure they can be found.
Usually, this means using absolute path names -- $PATH is usually not
set to a very useful value in a CGI script.

- When reading or writing external files, make sure they can be read
or written by every user on the system.

- Don't try to give a CGI script a set-uid mode.  This doesn't work on
most systems, and is a security liability as well.


History
-------

Michael McLay started this module.  Steve Majewski changed the
interface to SvFormContentDict and FormContentDict.  The multipart
parsing was inspired by code submitted by Andreas Paepcke.  Guido van
Rossum rewrote, reformatted and documented the module and is currently
responsible for its maintenance.


XXX The module is getting pretty heavy with all those docstrings.
Perhaps there should be a slimmed version that doesn't contain all those
backwards compatible and debugging classes and functions?

"""

__version__ = "2.2"


# Imports
# =======

import string
import sys
import os
import urllib
import mimetools
import rfc822
from StringIO import StringIO


# Logging support
# ===============

logfile = ""            # Filename to log to, if not empty
logfp = None            # File object to log to, if not None

def initlog(*allargs):
    """Write a log message, if there is a log file.

    Even though this function is called initlog(), you should always
    use log(); log is a variable that is set either to initlog
    (initially), to dolog (once the log file has been opened), or to
    nolog (when logging is disabled).

    The first argument is a format string; the remaining arguments (if
    any) are arguments to the % operator, so e.g.
        log("%s: %s", "a", "b")
    will write "a: b" to the log file, followed by a newline.

    If the global logfp is not None, it should be a file object to
    which log data is written.

    If the global logfp is None, the global logfile may be a string
    giving a filename to open, in append mode.  This file should be
    world writable!!!  If the file can't be opened, logging is
    silently disabled (since there is no safe place where we could
    send an error message).

    """
    global logfp, log
    if logfile and not logfp:
        try:
            logfp = open(logfile, "a")
        except IOError:
            pass
    if not logfp:
        log = nolog
    else:
        log = dolog
    apply(log, allargs)

def dolog(fmt, *args):
    """Write a log message to the log file.  See initlog() for docs."""
    logfp.write(fmt%args + "\n")

def nolog(*allargs):
    """Dummy function, assigned to log when logging is disabled."""
    pass

log = initlog           # The current logging function


# Parsing functions
# =================

# Maximum input we will accept when REQUEST_METHOD is POST
# 0 ==> unlimited input
maxlen = 0

def parse(fp=None, environ=os.environ, keep_blank_values=0, strict_parsing=0):
    """Parse a query in the environment or from a file (default stdin)

        Arguments, all optional:

        fp              : file pointer; default: sys.stdin

        environ         : environment dictionary; default: os.environ

        keep_blank_values: flag indicating whether blank values in
            URL encoded forms should be treated as blank strings.
            A true value indicates that blanks should be retained as
            blank strings.  The default false value indicates that
            blank values are to be ignored and treated as if they were
            not included.

        strict_parsing: flag indicating what to do with parsing errors.
            If false (the default), errors are silently ignored.
            If true, errors raise a ValueError exception.
    """
    if not fp:
        fp = sys.stdin
    if not environ.has_key('REQUEST_METHOD'):
        environ['REQUEST_METHOD'] = 'GET'       # For testing stand-alone
    if environ['REQUEST_METHOD'] == 'POST':
        ctype, pdict = parse_header(environ['CONTENT_TYPE'])
        if ctype == 'multipart/form-data':
            return parse_multipart(fp, pdict)
        elif ctype == 'application/x-www-form-urlencoded':
            clength = string.atoi(environ['CONTENT_LENGTH'])
            if maxlen and clength > maxlen:
                raise ValueError, 'Maximum content length exceeded'
            qs = fp.read(clength)
        else:
            qs = ''                     # Unknown content-type
        if environ.has_key('QUERY_STRING'):
            if qs: qs = qs + '&'
            qs = qs + environ['QUERY_STRING']
        elif sys.argv[1:]:
            if qs: qs = qs + '&'
            qs = qs + sys.argv[1]
        environ['QUERY_STRING'] = qs    # XXX Shouldn't, really
    elif environ.has_key('QUERY_STRING'):
        qs = environ['QUERY_STRING']
    else:
        if sys.argv[1:]:
            qs = sys.argv[1]
        else:
            qs = ""
        environ['QUERY_STRING'] = qs    # XXX Shouldn't, really
    return parse_qs(qs, keep_blank_values, strict_parsing)


def parse_qs(qs, keep_blank_values=0, strict_parsing=0):
    """Parse a query given as a string argument.

        Arguments:

        qs: URL-encoded query string to be parsed

        keep_blank_values: flag indicating whether blank values in
            URL encoded queries should be treated as blank strings.
            A true value indicates that blanks should be retained as
            blank strings.  The default false value indicates that
            blank values are to be ignored and treated as if they were
            not included.

        strict_parsing: flag indicating what to do with parsing errors.
            If false (the default), errors are silently ignored.
            If true, errors raise a ValueError exception.
    """
    dict = {}
    for name, value in parse_qsl(qs, keep_blank_values, strict_parsing):
        if len(value) or keep_blank_values:
            if dict.has_key(name):
                dict[name].append(value)
            else:
                dict[name] = [value]
    return dict

def parse_qsl(qs, keep_blank_values=0, strict_parsing=0):
    """Parse a query given as a string argument.

        Arguments:

        qs: URL-encoded query string to be parsed

        keep_blank_values: flag indicating whether blank values in
            URL encoded queries should be treated as blank strings.
            A true value indicates that blanks should be retained as
            blank strings.  The default false value indicates that
            blank values are to be ignored and treated as if they were
            not included.

        strict_parsing: flag indicating what to do with parsing errors.
            If false (the default), errors are silently ignored.
            If true, errors raise a ValueError exception.

        Returns a list, as God intended.
    """
    name_value_pairs = string.splitfields(qs, '&')
    r=[]
    for name_value in name_value_pairs:
        nv = string.splitfields(name_value, '=')
        if len(nv) != 2:
            if strict_parsing:
                raise ValueError, "bad query field: %s" % `name_value`
            continue
        # Honor keep_blank_values as documented above
        if len(nv[1]) or keep_blank_values:
            name = urllib.unquote(string.replace(nv[0], '+', ' '))
            value = urllib.unquote(string.replace(nv[1], '+', ' '))
            r.append((name, value))     # append one (name, value) tuple

    return r


def parse_multipart(fp, pdict):
    """Parse multipart input.

    Arguments:
    fp   : input file
    pdict: dictionary containing other parameters of content-type header

    Returns a dictionary just like parse_qs(): keys are the field names, each
    value is a list of values for that field.  This is easy to use but not
    much good if you are expecting megabytes to be uploaded -- in that case,
    use the FieldStorage class instead which is much more flexible.  Note
    that pdict holds the parameters already parsed out of the content-type
    header (e.g. by parse_header()), not the raw header string.

    XXX This does not parse nested multipart parts -- use FieldStorage for
    that.

    XXX This should really be subsumed by FieldStorage altogether -- no
    point in having two implementations of the same parsing algorithm.

    """
    if pdict.has_key('boundary'):
        boundary = pdict['boundary']
    else:
        boundary = ""
    nextpart = "--" + boundary
    lastpart = "--" + boundary + "--"
    partdict = {}
    terminator = ""

    while terminator != lastpart:
        bytes = -1
        data = None
        if terminator:
            # At start of next part.  Read headers first.
            headers = mimetools.Message(fp)
            clength = headers.getheader('content-length')
            if clength:
                try:
                    bytes = string.atoi(clength)
                except string.atoi_error:
                    pass
            if bytes > 0:
                if maxlen and bytes > maxlen:
                    raise ValueError, 'Maximum content length exceeded'
                data = fp.read(bytes)
            else:
                data = ""
        # Read lines until end of part.
        lines = []
        while 1:
            line = fp.readline()
            if not line:
                terminator = lastpart # End outer loop
                break
            if line[:2] == "--":
                terminator = string.strip(line)
                if terminator in (nextpart, lastpart):
                    break
            lines.append(line)
        # Done with part.
        if data is None:
            continue
        if bytes < 0:
            if lines:
                # Strip final line terminator
                line = lines[-1]
                if line[-2:] == "\r\n":
                    line = line[:-2]
                elif line[-1:] == "\n":
                    line = line[:-1]
                lines[-1] = line
                data = string.joinfields(lines, "")
        # getheader() returns None for a missing header instead of raising
        line = headers.getheader('content-disposition')
        if not line:
            continue
        key, params = parse_header(line)
        if key != 'form-data':
            continue
        if params.has_key('name'):
            name = params['name']
        else:
            continue
        if partdict.has_key(name):
            partdict[name].append(data)
        else:
            partdict[name] = [data]

    return partdict


def parse_header(line):
    """Parse a Content-type like header.

    Return the main content-type and a dictionary of options.

    """
    plist = map(string.strip, string.splitfields(line, ';'))
    key = string.lower(plist[0])
    del plist[0]
    pdict = {}
    for p in plist:
        i = string.find(p, '=')
        if i >= 0:
            name = string.lower(string.strip(p[:i]))
            value = string.strip(p[i+1:])
            if len(value) >= 2 and value[0] == value[-1] == '"':
                value = value[1:-1]
            pdict[name] = value
    return key, pdict
 718
 719
 720 # Classes for field storage
 721 # =========================
 722
 723 class MiniFieldStorage:
 724
 725     """Like FieldStorage, for use when no file uploads are possible."""
 726
 727     # Dummy attributes
 728     filename = None
 729     list = None
 730     type = None
 731     file = None
 732     type_options = {}
 733     disposition = None
 734     disposition_options = {}
 735     headers = {}
 736
 737     def __init__(self, name, value):
 738         """Constructor from field name and value."""
 739         self.name = name
 740         self.value = value
 741         # self.file = StringIO(value)
 742
 743     def __repr__(self):
 744         """Return printable representation."""
 745         return "MiniFieldStorage(%s, %s)" % (`self.name`, `self.value`)
 746
 747
 748 class FieldStorage:
 749
 750     """Store a sequence of fields, reading multipart/form-data.
 751
 752     This class provides naming, typing, files stored on disk, and
 753     more.  At the top level, it is accessible like a dictionary, whose
 754     keys are the field names.  (Note: None can occur as a field name.)
 755     The items are either a Python list (if there's multiple values) or
 756     another FieldStorage or MiniFieldStorage object.  If it's a single
 757     object, it has the following attributes:
 758
 759     name: the field name, if specified; otherwise None
 760
 761     filename: the filename, if specified; otherwise None; this is the
 762         client side filename, *not* the file name on which it is
 763         stored (that's a temporary file you don't deal with)
 764
 765     value: the value as a *string*; for file uploads, this
 766         transparently reads the file every time you request the value
 767
 768     file: the file(-like) object from which you can read the data;
 769         None if the data is stored as a simple string
 770
 771     type: the content-type, or None if not specified
 772
 773     type_options: dictionary of options specified on the content-type
 774         line
 775
 776     disposition: content-disposition, or None if not specified
 777
 778     disposition_options: dictionary of corresponding options
 779
 780     headers: a dictionary(-like) object (sometimes rfc822.Message or a
 781         subclass thereof) containing *all* headers
 782
 783     The class is subclassable, mostly for the purpose of overriding
 784     the make_file() method, which is called internally to come up with
 785     a file open for reading and writing.  This makes it possible to
 786     override the default choice of storing all files in a temporary
 787     directory and unlinking them as soon as they have been opened.
 788
 789     """
 790
 791     def __init__(self, fp=None, headers=None, outerboundary="",
 792                  environ=os.environ, keep_blank_values=0, strict_parsing=0):
 793         """Constructor.  Read multipart/* until last part.
 794
 795         Arguments, all optional:
 796
 797         fp              : file pointer; default: sys.stdin
 798             (not used when the request method is GET or HEAD)
 799
 800         headers         : header dictionary-like object; default:
 801             taken from environ as per CGI spec
 802
 803         outerboundary   : terminating multipart boundary
 804             (for internal use only)
 805
 806         environ         : environment dictionary; default: os.environ
 807
 808         keep_blank_values: flag indicating whether blank values in
 809             URL encoded forms should be treated as blank strings.
 810             A true value indicates that blanks should be retained as
 811             blank strings.  The default false value indicates that
 812             blank values are to be ignored and treated as if they were
 813             not included.
 814
 815         strict_parsing: flag indicating what to do with parsing errors.
 816             If false (the default), errors are silently ignored.
 817             If true, errors raise a ValueError exception.
 818
 819         """
 820         method = 'GET'
 821         self.keep_blank_values = keep_blank_values
 822         self.strict_parsing = strict_parsing
 823         if environ.has_key('REQUEST_METHOD'):
 824             method = string.upper(environ['REQUEST_METHOD'])
 825         if method == 'GET' or method == 'HEAD':
 826             if environ.has_key('QUERY_STRING'):
 827                 qs = environ['QUERY_STRING']
 828             elif sys.argv[1:]:
 829                 qs = sys.argv[1]
 830             else:
 831                 qs = ""
 832             fp = StringIO(qs)
 833             if headers is None:
 834                 headers = {'content-type':
 835                            "application/x-www-form-urlencoded"}
 836         if headers is None:
 837             headers = {}
 838             if method == 'POST':
 839                 # Set default content-type for POST to what's traditional
 840                 headers['content-type'] = "application/x-www-form-urlencoded"
 841             if environ.has_key('CONTENT_TYPE'):
 842                 headers['content-type'] = environ['CONTENT_TYPE']
 843             if environ.has_key('CONTENT_LENGTH'):
 844                 headers['content-length'] = environ['CONTENT_LENGTH']
 845         self.fp = fp or sys.stdin
 846         self.headers = headers
 847         self.outerboundary = outerboundary
 848
 849         # Process content-disposition header
 850         cdisp, pdict = "", {}
 851         if self.headers.has_key('content-disposition'):
 852             cdisp, pdict = parse_header(self.headers['content-disposition'])
 853         self.disposition = cdisp
 854         self.disposition_options = pdict
 855         self.name = None
 856         if pdict.has_key('name'):
 857             self.name = pdict['name']
 858         self.filename = None
 859         if pdict.has_key('filename'):
 860             self.filename = pdict['filename']
 861
 862         # Process content-type header
 863         #
 864         # Honor any existing content-type header.  But if there is no
 865         # content-type header, use some sensible defaults.  Assume
 866         # outerboundary is "" at the outer level, but something non-false
 867         # inside a multi-part.  The default for an inner part is text/plain,
 868         # but for an outer part it should be urlencoded.  This should catch
 869         # bogus clients which erroneously forget to include a content-type
 870         # header.
 871         #
 872         # See below for what we do if there does exist a content-type header,
 873         # but it happens to be something we don't understand.
 874         if self.headers.has_key('content-type'):
 875             ctype, pdict = parse_header(self.headers['content-type'])
 876         elif self.outerboundary or method != 'POST':
 877             ctype, pdict = "text/plain", {}
 878         else:
 879             ctype, pdict = 'application/x-www-form-urlencoded', {}
 880         self.type = ctype
 881         self.type_options = pdict
 882         self.innerboundary = ""
 883         if pdict.has_key('boundary'):
 884             self.innerboundary = pdict['boundary']
 885         clen = -1
 886         if self.headers.has_key('content-length'):
 887             try:
 888                 clen = string.atoi(self.headers['content-length'])
 889             except:
 890                 pass
 891             if maxlen and clen > maxlen:
 892                 raise ValueError, 'Maximum content length exceeded'
 893         self.length = clen
 894
 895         self.list = self.file = None
 896         self.done = 0
 897         self.lines = []
 898         if ctype == 'application/x-www-form-urlencoded':
 899             self.read_urlencoded()
 900         elif ctype[:10] == 'multipart/':
 901             self.read_multi(environ, keep_blank_values, strict_parsing)
 902         else:
 903             self.read_single()
 904
 905     def __repr__(self):
 906         """Return a printable representation."""
 907         return "FieldStorage(%s, %s, %s)" % (
 908                 `self.name`, `self.filename`, `self.value`)
 909
 910     def __getattr__(self, name):
 911         if name != 'value':
 912             raise AttributeError, name
 913         if self.file:
 914             self.file.seek(0)
 915             value = self.file.read()
 916             self.file.seek(0)
 917         elif self.list is not None:
 918             value = self.list
 919         else:
 920             value = None
 921         return value
 922
 923     def __getitem__(self, key):
 924         """Dictionary style indexing."""
 925         if self.list is None:
 926             raise TypeError, "not indexable"
 927         found = []
 928         for item in self.list:
 929             if item.name == key: found.append(item)
 930         if not found:
 931             raise KeyError, key
 932         if len(found) == 1:
 933             return found[0]
 934         else:
 935             return found
 936
 937     def keys(self):
 938         """Dictionary style keys() method."""
 939         if self.list is None:
 940             raise TypeError, "not indexable"
 941         keys = []
 942         for item in self.list:
 943             if item.name not in keys: keys.append(item.name)
 944         return keys
 945
 946     def has_key(self, key):
 947         """Dictionary style has_key() method."""
 948         if self.list is None:
 949             raise TypeError, "not indexable"
 950         for item in self.list:
 951             if item.name == key: return 1
 952         return 0
 953
 954     def __len__(self):
 955         """Dictionary style len(x) support."""
 956         return len(self.keys())
 957
 958     def read_urlencoded(self):
 959         """Internal: read data in query string format."""
 960         qs = self.fp.read(self.length)
 961         self.list = list = []
 962         for key, value in parse_qsl(qs, self.keep_blank_values,
 963                                     self.strict_parsing):
 964             list.append(MiniFieldStorage(key, value))
 965         self.skip_lines()
 966
 967     FieldStorageClass = None
 968
 969     def read_multi(self, environ, keep_blank_values, strict_parsing):
 970         """Internal: read a part that is itself multipart."""
 971         self.list = []
 972         klass = self.FieldStorageClass or self.__class__
 973         part = klass(self.fp, {}, self.innerboundary,
 974                      environ, keep_blank_values, strict_parsing)
 975         # Read and discard the preamble that precedes the first boundary
 976         while not part.done:
 977             headers = rfc822.Message(self.fp)
 978             part = klass(self.fp, headers, self.innerboundary,
 979                          environ, keep_blank_values, strict_parsing)
 980             self.list.append(part)
 981         self.skip_lines()
 982
 983     def read_single(self):
 984         """Internal: read an atomic part."""
 985         if self.length >= 0:
 986             self.read_binary()
 987             self.skip_lines()
 988         else:
 989             self.read_lines()
 990         self.file.seek(0)
 991
 992     bufsize = 8*1024            # I/O buffering size for copy to file
 993
 994     def read_binary(self):
 995         """Internal: read binary data."""
 996         self.file = self.make_file('b')
 997         todo = self.length
 998         if todo >= 0:
 999             while todo > 0:
1000                 data = self.fp.read(min(todo, self.bufsize))
1001                 if not data:
1002                     self.done = -1
1003                     break
1004                 self.file.write(data)
1005                 todo = todo - len(data)
1006
1007     def read_lines(self):
1008         """Internal: read lines until EOF or outerboundary."""
1009         self.file = self.make_file('')
1010         if self.outerboundary:
1011             self.read_lines_to_outerboundary()
1012         else:
1013             self.read_lines_to_eof()
1014
1015     def read_lines_to_eof(self):
1016         """Internal: read lines until EOF."""
1017         while 1:
1018             line = self.fp.readline()
1019             if not line:
1020                 self.done = -1
1021                 break
1022             self.lines.append(line)
1023             self.file.write(line)
1024
1025     def read_lines_to_outerboundary(self):
1026         """Internal: read lines until outerboundary."""
1027         next = "--" + self.outerboundary
1028         last = next + "--"
1029         delim = ""
1030         while 1:
1031             line = self.fp.readline()
1032             if not line:
1033                 self.done = -1
1034                 break
1035             self.lines.append(line)
1036             if line[:2] == "--":
1037                 strippedline = string.strip(line)
1038                 if strippedline == next:
1039                     break
1040                 if strippedline == last:
1041                     self.done = 1
1042                     break
1043             odelim = delim
1044             if line[-2:] == "\r\n":
1045                 delim = "\r\n"
1046                 line = line[:-2]
1047             elif line[-1] == "\n":
1048                 delim = "\n"
1049                 line = line[:-1]
1050             else:
1051                 delim = ""
1052             self.file.write(odelim + line)
1053
1054     def skip_lines(self):
1055         """Internal: skip lines until outer boundary if defined."""
1056         if not self.outerboundary or self.done:
1057             return
1058         next = "--" + self.outerboundary
1059         last = next + "--"
1060         while 1:
1061             line = self.fp.readline()
1062             if not line:
1063                 self.done = -1
1064                 break
1065             self.lines.append(line)
1066             if line[:2] == "--":
1067                 strippedline = string.strip(line)
1068                 if strippedline == next:
1069                     break
1070                 if strippedline == last:
1071                     self.done = 1
1072                     break
1073
1074     def make_file(self, binary=None):
1075         """Overridable: return a readable & writable file.
1076
1077         The file will be used as follows:
1078         - data is written to it
1079         - seek(0)
1080         - data is read from it
1081
1082         The 'binary' argument is unused -- the file is always opened
1083         in binary mode.
1084
1085         This version opens a temporary file for reading and writing,
1086         and immediately deletes (unlinks) it.  The trick (on Unix!) is
1087         that the file can still be used, but it can't be opened by
1088         another process, and it will automatically be deleted when it
1089         is closed or when the current process terminates.
1090
1091         If you want a more permanent file, you derive a class which
1092         overrides this method.  If you want a visible temporary file
1093         that is nevertheless automatically deleted when the script
1094         terminates, try defining a __del__ method in a derived class
1095         which unlinks the temporary files you have created.
1096
1097         """
1098         import tempfile
1099         return tempfile.TemporaryFile("w+b")
1100
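# Illustrative sketch (not part of the original module; assumes Python 3).
# As the FieldStorage docstring notes, form[key] returns a single field
# object when a name occurs once and a Python list of field objects when
# it repeats.  A caller that wants uniform handling could normalize the
# result as below.  The helper name _sketch_getlist is hypothetical; it
# relies only on __getitem__ raising KeyError for a missing name and on
# each field having a .value attribute.
def _sketch_getlist(form, key):
    """Return the .value of every field named key, as a list."""
    try:
        item = form[key]
    except KeyError:
        return []                       # no field with that name
    if isinstance(item, list):          # name occurred more than once
        return [field.value for field in item]
    return [item.value]                 # exactly one field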
1101
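# Illustrative sketch (not part of the original module; assumes Python 3).
# read_lines_to_outerboundary() holds back each line's terminator (CRLF
# or LF) and writes it only once the following line proves to be data.
# The line break just before a boundary is attached to the boundary
# delimiter, not to the part (RFC 2046 section 5.1.1), so it is never
# written.  The same scan over an in-memory list of lines, under a
# hypothetical helper name; the second return value mirrors self.done
# (0: next boundary, 1: final boundary, -1: EOF).
def _sketch_part_body(lines, boundary):
    """Return (data, done) for the part that ends at the next boundary."""
    next_delim = "--" + boundary
    last_delim = next_delim + "--"
    out = []
    delim = ""                          # terminator held back so far
    for line in lines:
        if line[:2] == "--":
            stripped = line.strip()
            if stripped == next_delim:
                return "".join(out), 0
            if stripped == last_delim:
                return "".join(out), 1
        odelim = delim
        if line[-2:] == "\r\n":
            delim, line = "\r\n", line[:-2]
        elif line[-1:] == "\n":
            delim, line = "\n", line[:-1]
        else:
            delim = ""
        out.append(odelim + line)       # emit the previous terminator now
    return "".join(out), -1             # ran out of input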
1102
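# Illustrative sketch (not part of the original module; assumes Python 3).
# make_file()'s docstring suggests a subclass for a temporary file that
# is visible on disk yet still cleaned up automatically.  One option is
# tempfile.NamedTemporaryFile: its file has a name while open and, with
# delete=True (the default), is removed when it is closed.  A subclass
# could return this from its make_file() override; the function name is
# hypothetical.
def _sketch_make_visible_file(binary=None):
    """Return a named, readable & writable temp file deleted on close."""
    import tempfile
    # 'binary' is ignored, mirroring FieldStorage.make_file().
    return tempfile.NamedTemporaryFile("w+b")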
1103 # Backwards Compatibility Classes
1104 # ===============================
1105
1106 class FormContentDict:
1107     """Basic (multiple values per field) form content as dictionary.
1108
1109     form = FormContentDict()
1110
1111     form[key] -> [value, value, ...]
1112     form.has_key(key) -> Boolean
1113     form.keys() -> [key, key, ...]
1114     form.values() -> [[val, val, ...], [val, val, ...], ...]
1115     form.items() ->  [(key, [val, val, ...]), (key, [val, val, ...]), ...]
1116     form.dict == {key: [val, val, ...], ...}
1117
1118     """
1119     def __init__(self, environ=os.environ):
1120         self.dict = parse(environ=environ)
1121         self.query_string = environ['QUERY_STRING']
1122     def __getitem__(self,key):
1123         return self.dict[key]
1124     def keys(self):
1125         return self.dict.keys()
1126     def has_key(self, key):
1127         return self.dict.has_key(key)
1128     def values(self):
1129         return self.dict.values()
1130     def items(self):
1131         return self.dict.items()
1132     def __len__( self ):
1133         return len(self.dict)
1134
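# Illustrative sketch (not part of the original module; assumes Python 3).
# FormContentDict.dict is documented as {key: [val, val, ...], ...}.  In
# Python 3 the standard urllib.parse.parse_qs() builds a dictionary of
# that same shape from a query string, and its keep_blank_values flag has
# the meaning described for FieldStorage's argument of that name.  The
# wrapper name _sketch_form_dict is hypothetical.
def _sketch_form_dict(query_string, keep_blank_values=False):
    """Parse a query string into {name: [value, ...]}."""
    import urllib.parse
    return urllib.parse.parse_qs(query_string,
                                 keep_blank_values=keep_blank_values)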
1135
1136 class SvFormContentDict(FormContentDict):
1137     """Strict single-value expecting form content as dictionary.
1138
1139     If you only expect a single value for each field, then form[key]
1140     will return that single value.  It will raise an IndexError if
1141     that expectation is not true.  If you expect a field to have
1142     possibly multiple values, then you can use form.getlist(key) to
1143     get all of the values.  values() and items() are a compromise:
1144     they return single strings where there is a single value, and
1145     lists of strings otherwise.
1146
1147     """
1148     def __getitem__(self, key):
1149         if len(self.dict[key]) > 1:
1150             raise IndexError, 'expecting a single value'
1151         return self.dict[key][0]
1152     def getlist(self, key):
1153         return self.dict[key]
1154     def values(self):
1155         lis = []
1156         for each in self.dict.values():
1157             if len( each ) == 1 :
1158                 lis.append(each[0])
1159             else: lis.append(each)
1160         return lis
1161     def items(self):
1162         lis = []
1163         for key,value in self.dict.items():
1164             if len(value) == 1 :
1165                 lis.append((key, value[0]))
1166             else:       lis.append((key, value))
1167         return lis
1168
1169
1170 class InterpFormContentDict(SvFormContentDict):
1171     """This class is present for backwards compatibility only."""
1172     def __getitem__( self, key ):
1173         v = SvFormContentDict.__getitem__( self, key )
1174         if v[0] in string.digits+'+-.' :
1175             try:  return  string.atoi( v )
1176             except ValueError:
1177                 try:    return string.atof( v )
1178                 except ValueError: pass
1179         return string.strip(v)
1180     def values( self ):
1181         lis = []
1182         for key in self.keys():
1183             try:
1184                 lis.append( self[key] )
1185             except IndexError:
1186                 lis.append( self.dict[key] )
1187         return lis
1188     def items( self ):
1189         lis = []
1190         for key in self.keys():
1191             try:
1192                 lis.append( (key, self[key]) )
1193             except IndexError:
1194                 lis.append( (key, self.dict[key]) )
1195         return lis
1196
1197
1198 class FormContent(FormContentDict):
1199     """This class is present for backwards compatibility only."""
1200     def values(self, key):
1201         if self.dict.has_key(key) :return self.dict[key]
1202         else: return None
1203     def indexed_value(self, key, location):
1204         if self.dict.has_key(key):
1205             if len (self.dict[key]) > location:
1206                 return self.dict[key][location]
1207             else: return None
1208         else: return None
1209     def value(self, key):
1210         if self.dict.has_key(key): return self.dict[key][0]
1211         else: return None
1212     def length(self, key):
1213         return len(self.dict[key])
1214     def stripped(self, key):
1215         if self.dict.has_key(key): return string.strip(self.dict[key][0])
1216         else: return None
1217     def pars(self):
1218         return self.dict
1219
1220
1221 # Test/debug code
1222 # ===============
1223
1224 def test(environ=os.environ):
1225     """Robust test CGI script, usable as main program.
1226
1227     Write minimal HTTP headers and dump all information provided to
1228     the script in HTML form.
1229
1230     """
1231     import traceback
1232     print "Content-type: text/html"
1233     print
1234     sys.stderr = sys.stdout
1235     try:
1236         form = FieldStorage()   # Replace with other classes to test those
1237         print_form(form)
1238         print_environ(environ)
1239         print_directory()
1240         print_arguments()
1241         print_environ_usage()
1242         def f():
1243             exec "testing print_exception() -- <I>italics?</I>"
1244         def g(f=f):
1245             f()
1246         print "<H3>What follows is a test, not an actual exception:</H3>"
1247         g()
1248     except:
1249         print_exception()
1250
1251     # Second try with a small maxlen...
1252     global maxlen
1253     maxlen = 50
1254     try:
1255         form = FieldStorage()   # Replace with other classes to test those
1256         print_form(form)
1257         print_environ(environ)
1258         print_directory()
1259         print_arguments()
1260         print_environ_usage()
1261     except:
1262         print_exception()
1263
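# Illustrative sketch (not part of the original module; assumes Python 3).
# test() reruns FieldStorage with the module-level maxlen set to 50 to
# exercise the length check in FieldStorage.__init__: an unparsable
# content-length is treated as unknown (-1), and a nonzero maxlen smaller
# than the declared length raises ValueError before any body is read.
# The helper below restates that check on its own; its name is
# hypothetical.
def _sketch_check_length(headers, maxlen=0):
    """Return the declared content length, or -1 if absent or unparsable."""
    clen = -1
    if "content-length" in headers:
        try:
            clen = int(headers["content-length"])
        except ValueError:
            pass                        # unparsable: keep -1
        if maxlen and clen > maxlen:
            raise ValueError("Maximum content length exceeded")
    return clen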
1264 def print_exception(type=None, value=None, tb=None, limit=None):
1265     if type is None:
1266         type, value, tb = sys.exc_info()
1267     import traceback
1268     print
1269     print "<H3>Traceback (innermost last):</H3>"
1270     list = traceback.format_tb(tb, limit) + \
1271            traceback.format_exception_only(type, value)
1272     print "<PRE>%s<B>%s</B></PRE>" % (
1273         escape(string.join(list[:-1], "")),
1274         escape(list[-1]),
1275         )
1276     del tb
1277
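# Illustrative sketch (not part of the original module; assumes Python 3).
# print_exception() joins the traceback entries with the final
# "Type: message" line, HTML-escapes both, and puts the last line in
# bold.  The same formatting with Python 3's traceback module and
# html.escape(..., quote=False), which escapes exactly '&', '<' and '>'
# like escape() below, returning a string instead of printing.  The
# helper name is hypothetical.
def _sketch_exception_html(exc):
    """Return the <PRE> block print_exception() would emit for exc."""
    import html
    import traceback
    lines = (traceback.format_tb(exc.__traceback__) +
             traceback.format_exception_only(type(exc), exc))
    return "<PRE>%s<B>%s</B></PRE>" % (
        html.escape("".join(lines[:-1]), quote=False),
        html.escape(lines[-1], quote=False))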
1278 def print_environ(environ=os.environ):
1279     """Dump the shell environment as HTML."""
1280     keys = environ.keys()
1281     keys.sort()
1282     print
1283     print "<H3>Shell Environment:</H3>"
1284     print "<DL>"
1285     for key in keys:
1286         print "<DT>", escape(key), "<DD>", escape(environ[key])
1287     print "</DL>"
1288     print
1289
1290 def print_form(form):
1291     """Dump the contents of a form as HTML."""
1292     keys = form.keys()
1293     keys.sort()
1294     print
1295     print "<H3>Form Contents:</H3>"
1296     print "<DL>"
1297     for key in keys:
1298         print "<DT>" + escape(key) + ":",
1299         value = form[key]
1300         print "<i>" + escape(`type(value)`) + "</i>"
1301         print "<DD>" + escape(`value`)
1302     print "</DL>"
1303     print
1304
1305 def print_directory():
1306     """Dump the current directory as HTML."""
1307     print
1308     print "<H3>Current Working Directory:</H3>"
1309     try:
1310         pwd = os.getcwd()
1311     except os.error, msg:
1312         print "os.error:", escape(str(msg))
1313     else:
1314         print escape(pwd)
1315     print
1316
1317 def print_arguments():
1318     print
1319     print "<H3>Command Line Arguments:</H3>"
1320     print
1321     print escape(`sys.argv`)
1322     print
1323
1324 def print_environ_usage():
1325     """Dump a list of environment variables used by CGI as HTML."""
1326     print """
1327 <H3>These environment variables could have been set:</H3>
1328 <UL>
1329 <LI>AUTH_TYPE
1330 <LI>CONTENT_LENGTH
1331 <LI>CONTENT_TYPE
1332 <LI>DATE_GMT
1333 <LI>DATE_LOCAL
1334 <LI>DOCUMENT_NAME
1335 <LI>DOCUMENT_ROOT
1336 <LI>DOCUMENT_URI
1337 <LI>GATEWAY_INTERFACE
1338 <LI>LAST_MODIFIED
1339 <LI>PATH
1340 <LI>PATH_INFO
1341 <LI>PATH_TRANSLATED
1342 <LI>QUERY_STRING
1343 <LI>REMOTE_ADDR
1344 <LI>REMOTE_HOST
1345 <LI>REMOTE_IDENT
1346 <LI>REMOTE_USER
1347 <LI>REQUEST_METHOD
1348 <LI>SCRIPT_NAME
1349 <LI>SERVER_NAME
1350 <LI>SERVER_PORT
1351 <LI>SERVER_PROTOCOL
1352 <LI>SERVER_ROOT
1353 <LI>SERVER_SOFTWARE
1354 </UL>
1355 In addition, HTTP headers sent by the client may be passed in the
1356 environment as well.  Here are some common variable names:
1357 <UL>
1358 <LI>HTTP_ACCEPT
1359 <LI>HTTP_CONNECTION
1360 <LI>HTTP_HOST
1361 <LI>HTTP_PRAGMA
1362 <LI>HTTP_REFERER
1363 <LI>HTTP_USER_AGENT
1364 </UL>
1365 """
1366
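# Illustrative sketch (not part of the original module; assumes Python 3).
# Under the CGI specification a request header is passed as a variable
# named "HTTP_" plus the header name upper-cased with "-" replaced by
# "_", so User-Agent arrives as HTTP_USER_AGENT.  (Content-Type and
# Content-Length are normally passed as CONTENT_TYPE and CONTENT_LENGTH
# instead.)  The helpers below go in both directions; restoring the
# capitalization with str.title() is a guess, since the variable name
# does not preserve it.  Both helper names are hypothetical.
def _sketch_header_to_env(name):
    """Map an HTTP header name to its CGI environment variable name."""
    return "HTTP_" + name.upper().replace("-", "_")

def _sketch_request_headers(environ):
    """Collect HTTP_* variables back into a {header-name: value} dict."""
    headers = {}
    for key, value in environ.items():
        if key.startswith("HTTP_"):
            headers[key[5:].replace("_", "-").title()] = value
    return headers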
1367
1368 # Utilities
1369 # =========
1370
1371 def escape(s, quote=None):
1372     """Replace '&', '<', '>' (and '"' if quote is true) by SGML entities."""
1373     s = string.replace(s, "&", "&amp;") # Must be done first!
1374     s = string.replace(s, "<", "&lt;")
1375     s = string.replace(s, ">", "&gt;")
1376     if quote:
1377         s = string.replace(s, '"', "&quot;")
1378     return s
1379
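# Illustrative sketch (not part of the original module; assumes Python 3).
# escape() must replace "&" first: doing it after "<" would turn the "&"
# of a freshly produced "&lt;" into "&amp;lt;".  A string-method version
# of the same function under a hypothetical name; for quote=False,
# Python 3's html.escape(s, quote=False) gives the same result.
def _sketch_escape(s, quote=False):
    """Replace '&', '<', '>' (and '"' if quote is true) by entities."""
    s = s.replace("&", "&amp;")         # must come first
    s = s.replace("<", "&lt;").replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")
    return s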
1380
1381 # Invoke mainline
1382 # ===============
1383
1384 # Call test() when this file is run as a script (not imported as a module)
1385 if __name__ == '__main__':
1386     test()