Lib/cgi.py

#! /usr/local/bin/python

"""Support module for CGI (Common Gateway Interface) scripts.

This module defines a number of utilities for use by CGI scripts
written in Python.


Introduction
------------

A CGI script is invoked by an HTTP server, usually to process user
input submitted through an HTML <FORM> or <ISINDEX> element.

Most often, CGI scripts live in the server's special cgi-bin
directory.  The HTTP server places all sorts of information about the
request (such as the client's hostname, the requested URL, the query
string, and lots of other goodies) in the script's shell environment,
executes the script, and sends the script's output back to the client.

The script's input is connected to the client too, and sometimes the
form data is read this way; at other times the form data is passed via
the "query string" part of the URL.  This module (cgi.py) is intended
to take care of the different cases and provide a simpler interface to
the Python script.  It also provides a number of utilities that help
in debugging scripts, and the latest addition is support for file
uploads from a form (if your browser supports it -- Grail 0.3 and
Netscape 2.0 do).

The output of a CGI script should consist of two sections, separated
by a blank line.  The first section contains a number of headers,
telling the client what kind of data is following.  Python code to
generate a minimal header section looks like this:

        print "Content-type: text/html" # HTML is following
        print                           # blank line, end of headers

The second section is usually HTML, which allows the client software
to display nicely formatted text with headings, in-line images, etc.
Here's Python code that prints a simple piece of HTML:

        print "<TITLE>CGI script output</TITLE>"
        print "<H1>This is my first CGI script</H1>"
        print "Hello, world!"

It may not be fully legal HTML according to the letter of the
standard, but any browser will understand it.
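
For comparison, here is a minimal sketch of the same two-section
output for Python 3, where print is a function.  It is an illustration
only and not part of this module.  The helper name write_response is
hypothetical.  It writes to any file-like object so that the output can
be captured; in a real CGI script that object would be sys.stdout.

```python
import io
import sys


def write_response(body_lines, out=None):
    # Hypothetical helper, not part of the cgi module.
    # Default to standard output, which the server relays to the client.
    if out is None:
        out = sys.stdout
    # First section: the headers, telling the client what follows.
    print("Content-type: text/html", file=out)
    # A blank line ends the header section.
    print(file=out)
    # Second section: the HTML body itself.
    for line in body_lines:
        print(line, file=out)


buf = io.StringIO()
write_response(["<TITLE>CGI script output</TITLE>",
                "<H1>This is my first CGI script</H1>",
                "Hello, world!"], out=buf)
```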


Using the cgi module
--------------------

Begin by writing "import cgi".  Don't use "from cgi import *" -- the
module defines all sorts of names for its own use or for backward
compatibility that you don't want in your namespace.

It's best to use the FieldStorage class.  The other classes defined in
this module are provided mostly for backward compatibility.  Instantiate
it exactly once, without arguments.  This reads the form contents from
standard input or the environment (depending on the value of various
environment variables set according to the CGI standard).  Since it may
consume standard input, it should be instantiated only once.
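
The "only once" rule applies to any file object.  Once the request
body has been read, a second read finds nothing.  The Python 3 sketch
below uses io.StringIO as a stand-in for the standard input that the
server supplies.  read_body is a hypothetical helper and is not part of
this module.

```python
import io


def read_body(fp, content_length):
    # Hypothetical helper: read up to CONTENT_LENGTH characters.
    return fp.read(content_length)


body = "name=Joe+Blow&addr=At+Home"
fake_stdin = io.StringIO(body)             # stand-in for sys.stdin
first = read_body(fake_stdin, len(body))   # gets the whole body
second = read_body(fake_stdin, len(body))  # stream is exhausted
```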

The FieldStorage instance can be accessed as if it were a Python
dictionary.  For instance, the following code (which assumes that the
Content-type header and blank line have already been printed) checks that
the fields "name" and "addr" are both set to a non-empty string:

        form = cgi.FieldStorage()
        form_ok = 0
        if form.has_key("name") and form.has_key("addr"):
                if form["name"].value != "" and form["addr"].value != "":
                        form_ok = 1
        if not form_ok:
                print "<H1>Error</H1>"
                print "Please fill in the name and addr fields."
        else:
                ...further form processing here...

Here the fields, accessed through form[key], are themselves instances
of FieldStorage (or MiniFieldStorage, depending on the form encoding).

If the submitted form data contains more than one field with the same
name, the object retrieved by form[key] is not a (Mini)FieldStorage
instance but a list of such instances.  If you are expecting this
possibility (i.e., when your HTML form contains multiple fields with
the same name), use the type() function to determine whether you have
a single instance or a list of instances.  For example, here's code
that concatenates any number of username fields, separated by commas:

        username = form["username"]
        if type(username) is type([]):
                # Multiple username fields specified
                usernames = ""
                for item in username:
                        if usernames:
                                # Next item -- insert comma
                                usernames = usernames + "," + item.value
                        else:
                                # First item -- don't insert comma
                                usernames = item.value
        else:
                # Single username field specified
                usernames = username.value
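
You can get the same result more compactly by normalizing the entry to
a list first and then joining the values.  The sketch below is written
for Python 3.  Item is a hypothetical stand-in for a (Mini)FieldStorage
instance that models only its value attribute, and join_values is a
hypothetical helper.

```python
class Item:
    # Hypothetical stand-in for (Mini)FieldStorage; only .value is modelled.
    def __init__(self, value):
        self.value = value


def join_values(entry, sep=","):
    # Hypothetical helper.  form[key] is either a single item or a list
    # of items; wrap a single item in a list so both cases work alike.
    items = entry if isinstance(entry, list) else [entry]
    return sep.join(item.value for item in items)


print(join_values([Item("joe"), Item("ann")]))
print(join_values(Item("joe")))
```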

If a field represents an uploaded file, the value attribute reads the
entire file into memory as a string.  This may not be what you want.
You can test for an uploaded file by testing either the filename
attribute or the file attribute.  You can then read the data at leisure
from the file attribute:

        fileitem = form["userfile"]
        if fileitem.file:
                # It's an uploaded file; count lines
                linecount = 0
                while 1:
                        line = fileitem.file.readline()
                        if not line: break
                        linecount = linecount + 1
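
If the upload may be large, you can also copy it elsewhere in
fixed-size chunks so that it never has to sit in memory all at once.
The Python 3 sketch below assumes a binary file object like the one in
the file attribute.  count_lines and save_upload are hypothetical
helpers, and io.BytesIO stands in for both the real upload and the
destination file.  The 8192-byte chunk size is an arbitrary choice.
The standard shutil.copyfileobj() performs the same chunked copy if
you do not need the byte count.

```python
import io


def count_lines(src):
    # Hypothetical helper: iterating a file object yields one line at a time.
    return sum(1 for _ in src)


def save_upload(src, dst, chunk_size=8192):
    # Hypothetical helper: copy src to dst chunk_size bytes at a time, so
    # at most one chunk is held in memory; return the bytes copied.
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    return total


upload = io.BytesIO(b"x" * 20000)   # stand-in for fileitem.file
saved = io.BytesIO()                # stand-in for a file opened "wb"
copied = save_upload(upload, saved, chunk_size=4096)
```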

The file upload draft standard entertains the possibility of uploading
multiple files from one field (using a recursive multipart/*
encoding).  When this occurs, the item will be a dictionary-like
FieldStorage item.  This can be determined by testing its type
attribute, which should have the value "multipart/form-data" (or
perhaps another string beginning with "multipart/").  In this case, it
can be iterated over recursively just like the top-level form object.
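
A recursive traversal might look like the following Python 3 sketch.
Part is a hypothetical stand-in for FieldStorage.  It models only the
name, type, value and list attributes, where list holds the sub-parts
of a multipart item.  leaves is also a hypothetical helper.  With the
real class you would test item.type in the same way.

```python
class Part:
    # Hypothetical stand-in for FieldStorage: a multipart item keeps its
    # sub-parts in .list, while any other item keeps its data in .value.
    def __init__(self, name, ctype="text/plain", value=None, parts=None):
        self.name = name
        self.type = ctype
        self.value = value
        self.list = parts


def leaves(item, depth=0):
    # Hypothetical helper: yield (depth, name, value) for every
    # non-multipart part, depth first.
    if item.type and item.type.startswith("multipart/"):
        for sub in item.list or []:
            yield from leaves(sub, depth + 1)
    else:
        yield depth, item.name, item.value


form = Part(None, "multipart/form-data", parts=[
    Part("user", value="joe"),
    Part("files", "multipart/mixed", parts=[
        Part("f1", value="aaa"),
        Part("f2", value="bbb"),
    ]),
])
for depth, name, value in leaves(form):
    print("  " * depth + name, "=", value)
```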

When a form is submitted in the "old" format (as the query string or as a
single data part of type application/x-www-form-urlencoded), the items
will actually be instances of the class MiniFieldStorage.  In this case,
the list, file and filename attributes are always None.


Old classes
-----------

These classes, present in earlier versions of the cgi module, are still
supported for backward compatibility.  New applications should use the
FieldStorage class.

SvFormContentDict: single value form content as dictionary; assumes each
field name occurs in the form only once.

FormContentDict: multiple value form content as dictionary (the form
items are lists of values).  Useful if your form contains multiple
fields with the same name.

Other classes (FormContent, InterpFormContentDict) are present for
backwards compatibility with really old applications only.  If you still
use these and would be inconvenienced if they disappeared from a future
version of this module, drop me a note.


Functions
---------

These are useful if you want more control, or if you want to employ
some of the algorithms implemented in this module in other
circumstances.

parse(fp, [environ, [keep_blank_values, [strict_parsing]]]): parse a
form into a Python dictionary.

parse_qs(qs, [keep_blank_values, [strict_parsing]]): parse a query
string (data of type application/x-www-form-urlencoded).

parse_multipart(fp, pdict): parse input of type multipart/form-data (for
file uploads).

parse_header(string): parse a header like Content-type into a main
value and a dictionary of parameters.

test(): complete test program.

print_environ(): format the shell environment in HTML.

print_form(form): format a form in HTML.

print_environ_usage(): print a list of useful environment variables in
HTML.

escape(): convert the characters "&", "<" and ">" to HTML-safe
sequences.  Use this if you need to display text that might contain
such characters in HTML.  To translate URLs for inclusion in the HREF
attribute of an <A> tag, use urllib.quote().

log(fmt, ...): write a line to a log file; see docs for initlog().
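
To make the behaviour of parse_qs() and parse_header() concrete, here
is a Python 3 re-creation of the algorithms used by the code later in
this file.  It is a sketch and not a drop-in replacement.  The names
parse_qs_sketch and parse_header_sketch are hypothetical.
urllib.parse.unquote_plus() stands in for the module's step of turning
each plus sign into a space and then unquoting.

```python
from urllib.parse import unquote_plus


def parse_qs_sketch(qs, keep_blank_values=False, strict_parsing=False):
    # Hypothetical re-creation of parse_qs(): split on "&", then on "=".
    result = {}
    for name_value in qs.split("&"):
        nv = name_value.split("=")
        if len(nv) != 2:
            if strict_parsing:
                raise ValueError("bad query field: %r" % name_value)
            continue
        name, value = unquote_plus(nv[0]), unquote_plus(nv[1])
        if value or keep_blank_values:
            # Every field maps to a list, since names may repeat.
            result.setdefault(name, []).append(value)
    return result


def parse_header_sketch(line):
    # Hypothetical re-creation of parse_header(): the main value comes
    # first, followed by ";"-separated name=value options.
    parts = [p.strip() for p in line.split(";")]
    key = parts[0].lower()
    pdict = {}
    for p in parts[1:]:
        i = p.find("=")
        if i >= 0:
            name = p[:i].strip().lower()
            value = p[i + 1:].strip()
            # Strip one pair of surrounding double quotes, if present.
            if len(value) >= 2 and value[0] == value[-1] == '"':
                value = value[1:-1]
            pdict[name] = value
    return key, pdict


print(parse_qs_sketch("name=Joe+Blow&addr=At+Home"))
print(parse_header_sketch('form-data; name="userfile"; filename="a.txt"'))
```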


Caring about security
---------------------

There's one important rule: if you invoke an external program (e.g.
via the os.system() or os.popen() functions), make very sure you don't
pass arbitrary strings received from the client to the shell.  This is
a well-known security hole whereby clever hackers anywhere on the web
can exploit a gullible CGI script to invoke arbitrary shell commands.
Even parts of the URL or field names cannot be trusted, since the
request doesn't have to come from your form!

To be on the safe side, if you must pass a string obtained from a form
to a shell command, you should make sure the string contains only
alphanumeric characters, dashes, underscores, and periods.
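
That rule can be expressed as a whitelist check with a regular
expression.  The Python 3 sketch below shows one way to do it.
is_safe_token is a hypothetical helper, and rejecting the empty string
is an extra assumption.  Even a string that passes is best handed to
the program as a separate argument where possible (for example, by
giving the standard subprocess module a list of arguments), so that no
shell parses it at all.

```python
import re

# ASCII letters, digits, dashes, underscores and periods only.
# fullmatch() requires the whole string to match; "+" rejects "".
_SAFE = re.compile(r"[A-Za-z0-9._-]+")


def is_safe_token(s):
    # Hypothetical helper implementing the whitelist described above.
    return _SAFE.fullmatch(s) is not None


for candidate in ["report-2024_v1.txt", "x; rm -rf /", "$(id)", ""]:
    print(repr(candidate), is_safe_token(candidate))
```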


Installing your CGI script on a Unix system
-------------------------------------------

Read the documentation for your HTTP server and check with your local
system administrator to find the directory where CGI scripts should be
installed; usually this is in a directory cgi-bin in the server tree.

Make sure that your script is readable and executable by "others"; the
Unix file mode should be 755 (use "chmod 755 filename").  Make sure
that the first line of the script contains #! starting in column 1
followed by the pathname of the Python interpreter, for instance:

        #! /usr/local/bin/python

Make sure the Python interpreter exists and is executable by "others".

Note that it's probably not a good idea to use #! /usr/bin/env python
here, since the Python interpreter may not be on the default path
given to CGI scripts!!!

Make sure that any files your script needs to read or write are
readable or writable, respectively, by "others" -- their mode should
be 644 for readable and 666 for writable.  This is because, for
security reasons, the HTTP server executes your script as user
"nobody", without any special privileges.  It can only read (write,
execute) files that everybody can read (write, execute).  The current
directory at execution time is also different (it is usually the
server's cgi-bin directory) and the set of environment variables is
also different from what you get at login.  In particular, don't count
on the shell's search path for executables ($PATH) or the Python
module search path ($PYTHONPATH) to be set to anything interesting.

If you need to load modules from a directory which is not on Python's
default module search path, you can change the path in your script,
before importing other modules, e.g.:

        import sys
        sys.path.insert(0, "/usr/home/joe/lib/python")
        sys.path.insert(0, "/usr/local/lib/python")

This way, the directory inserted last will be searched first!
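
You can check the ordering without touching the real search path by
repeating the two insert(0, ...) calls on a copy of sys.path.  This is
a Python 3 sketch, and the directories are just the example names used
above.

```python
import sys

search_path = list(sys.path)  # a copy, so sys.path itself is unchanged
search_path.insert(0, "/usr/home/joe/lib/python")
search_path.insert(0, "/usr/local/lib/python")
print(search_path[:2])
# ['/usr/local/lib/python', '/usr/home/joe/lib/python']
```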

Instructions for non-Unix systems will vary; check your HTTP server's
documentation (it will usually have a section on CGI scripts).


Testing your CGI script
-----------------------

Unfortunately, a CGI script will generally not run when you try it
from the command line, and a script that works perfectly from the
command line may fail mysteriously when run from the server.  There's
one reason why you should still test your script from the command
line: if it contains a syntax error, the Python interpreter won't
execute it at all, and the HTTP server will most likely send a cryptic
error to the client.
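
You can also ask Python to compile a script without running any of
it, which reports syntax errors only.  The Python 3 sketch below uses
the built-in compile().  syntax_error is a hypothetical helper, and
reading the script from disk is left to the caller.  The standard
py_compile module performs the same check on a file.  The check uses
the grammar of whichever interpreter runs it, so use the same Python
version that your HTTP server will invoke.

```python
def syntax_error(source, filename="script.py"):
    # Hypothetical helper: compile, but do not execute, the source text.
    # Returns None if it parses, otherwise a short error description.
    try:
        compile(source, filename, "exec")
    except SyntaxError as exc:
        return "%s:%s: %s" % (exc.filename, exc.lineno, exc.msg)
    return None


print(syntax_error("x = 1"))     # None
print(syntax_error("def f(:"))   # a message naming script.py
```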

Assuming your script has no syntax errors, yet it does not work, you
have no choice but to read the next section:


Debugging CGI scripts
---------------------

First of all, check for trivial installation errors -- reading the
section above on installing your CGI script carefully can save you a
lot of time.  If you wonder whether you have understood the
installation procedure correctly, try installing a copy of this module
file (cgi.py) as a CGI script.  When invoked as a script, the file
will dump its environment and the contents of the form in HTML form.
Give it the right mode etc., and send it a request.  If it's installed
in the standard cgi-bin directory, it should be possible to send it a
request by entering a URL of the following form into your browser:

        http://yourhostname/cgi-bin/cgi.py?name=Joe+Blow&addr=At+Home

If this gives an error of type 404, the server cannot find the script
-- perhaps you need to install it in a different directory.  If it
gives another error (e.g. 500), there's an installation problem that
you should fix before trying to go any further.  If you get a nicely
formatted listing of the environment and form content (in this
example, the fields should be listed as "addr" with value "At Home"
and "name" with value "Joe Blow"), the cgi.py script has been
installed correctly.  If you follow the same procedure for your own
script, you should now be able to debug it.

The next step could be to call the cgi module's test() function from
your script: replace its main code with the single statement

        cgi.test()

This should produce the same results as those obtained from installing
the cgi.py file itself.

When an ordinary Python script raises an unhandled exception (e.g.,
because of a typo in a module name, a file that can't be opened,
etc.), the Python interpreter prints a nice traceback and exits.
While the Python interpreter will still do this when your CGI script
raises an exception, most likely the traceback will end up in one of
the HTTP server's log files, or be discarded altogether.

Fortunately, once you have managed to get your script to execute
*some* code, it is easy to catch exceptions and cause a traceback to
be printed.  The test() function below in this module is an example.
Here are the rules:

        1. Import the traceback module (before entering the
           try-except!)

        2. Make sure you finish printing the headers and the blank
           line early

        3. Assign sys.stderr to sys.stdout

        4. Wrap all remaining code in a try-except statement

        5. In the except clause, call traceback.print_exc()

For example:

        import sys
        import traceback
        print "Content-type: text/html"
        print
        sys.stderr = sys.stdout
        try:
                ...your code here...
        except:
                print "\n\n<PRE>"
                traceback.print_exc()

Notes: The assignment to sys.stderr is needed because the traceback
prints to sys.stderr.  The print "\n\n<PRE>" statement is necessary to
disable the word wrapping in HTML.
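
In Python 3 the same rules can be packaged as a small wrapper.  This is
a sketch and not part of this module.  run_and_report and broken_main
are hypothetical names.  The sketch uses traceback.format_exc() and
html.escape() so that characters such as "<" in the traceback cannot be
taken for HTML tags, which goes beyond the rules above.  Unlike the
bare except in the example, "except Exception" lets SystemExit pass
through.

```python
import html
import sys
import traceback


def run_and_report(main, out=None):
    # Hypothetical wrapper following the rules above, in Python 3 form.
    if out is None:
        out = sys.stdout
    # Finish the headers and the blank line before anything can fail.
    print("Content-type: text/html", file=out)
    print(file=out)
    try:
        main()
    except Exception:
        # format_exc() returns the traceback as a string, so stderr need
        # not be redirected; escaping keeps "<" from being read as a tag.
        print("<PRE>", file=out)
        print(html.escape(traceback.format_exc()), file=out)
        print("</PRE>", file=out)


def broken_main():
    # Hypothetical script body that fails.
    raise KeyError("<no such field>")


run_and_report(broken_main)
```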

If you suspect that there may be a problem in importing the traceback
module, you can use an even more robust approach (which only uses
built-in modules):

        import sys
        sys.stderr = sys.stdout
        print "Content-type: text/plain"
        print
        ...your code here...

This relies on the Python interpreter to print the traceback.  The
content type of the output is set to plain text, which disables all
HTML processing.  If your script works, the raw HTML will be displayed
by your client.  If it raises an exception, most likely after the
first two lines have been printed, a traceback will be displayed.
Because no HTML interpretation is going on, the traceback will be
readable.

When all else fails, you may want to insert calls to log() into your
program or even into a copy of the cgi.py file.  Note that this requires
you to set cgi.logfile to the name of a world-writable file before the
first call to log() is made!

Good luck!


Common problems and solutions
-----------------------------

- Most HTTP servers buffer the output from CGI scripts until the
script is completed.  This means that it is not possible to display a
progress report on the client's display while the script is running.

- Check the installation instructions above.

- Check the HTTP server's log files.  ("tail -f logfile" in a separate
window may be useful!)

- Always check a script for syntax errors first, by doing something
like "python script.py".

- When using any of the debugging techniques, don't forget to add
"import sys" to the top of the script.

- When invoking external programs, make sure they can be found.
Usually, this means using absolute path names -- $PATH is usually not
set to a very useful value in a CGI script.

- When reading or writing external files, make sure they can be read
or written by every user on the system.

- Don't try to give a CGI script a set-uid mode.  This doesn't work on
most systems, and is a security liability as well.


History
-------

Michael McLay started this module.  Steve Majewski changed the
interface to SvFormContentDict and FormContentDict.  The multipart
parsing was inspired by code submitted by Andreas Paepcke.  Guido van
Rossum rewrote, reformatted and documented the module and is currently
responsible for its maintenance.


XXX The module is getting pretty heavy with all those docstrings.
Perhaps there should be a slimmed version that doesn't contain all those
backwards compatible and debugging classes and functions?

"""

__version__ = "2.2"


# Imports
# =======

import string
import sys
import os
import urllib
import mimetools
import rfc822
from StringIO import StringIO


# Logging support
# ===============

logfile = ""            # Filename to log to, if not empty
logfp = None            # File object to log to, if not None

def initlog(*allargs):
    """Write a log message, if there is a log file.

    Even though this function is called initlog(), you should always
    use log(); log is a variable that is set either to initlog
    (initially), to dolog (once the log file has been opened), or to
    nolog (when logging is disabled).

    The first argument is a format string; the remaining arguments (if
    any) are arguments to the % operator, so e.g.
        log("%s: %s", "a", "b")
    will write "a: b" to the log file, followed by a newline.

    If the global logfp is not None, it should be a file object to
    which log data is written.

    If the global logfp is None, the global logfile may be a string
    giving a filename to open, in append mode.  This file should be
    world writable!!!  If the file can't be opened, logging is
    silently disabled (since there is no safe place where we could
    send an error message).

    """
    global logfp, log
    if logfile and not logfp:
        try:
            logfp = open(logfile, "a")
        except IOError:
            pass
    if not logfp:
        log = nolog
    else:
        log = dolog
    apply(log, allargs)

def dolog(fmt, *args):
    """Write a log message to the log file.  See initlog() for docs."""
    logfp.write(fmt%args + "\n")

def nolog(*allargs):
    """Dummy function, assigned to log when logging is disabled."""
    pass

log = initlog           # The current logging function


# Parsing functions
# =================

# Maximum input we will accept when REQUEST_METHOD is POST
# 0 ==> unlimited input
maxlen = 0

def parse(fp=None, environ=os.environ, keep_blank_values=0, strict_parsing=0):
    """Parse a query in the environment or from a file (default stdin)

        Arguments, all optional:

        fp              : file pointer; default: sys.stdin

        environ         : environment dictionary; default: os.environ

        keep_blank_values: flag indicating whether blank values in
            URL encoded forms should be treated as blank strings.
            A true value indicates that blanks should be retained as
            blank strings.  The default false value indicates that
            blank values are to be ignored and treated as if they were
            not included.

        strict_parsing: flag indicating what to do with parsing errors.
            If false (the default), errors are silently ignored.
            If true, errors raise a ValueError exception.
    """
    if not fp:
        fp = sys.stdin
    if not environ.has_key('REQUEST_METHOD'):
        environ['REQUEST_METHOD'] = 'GET'       # For testing stand-alone
    if environ['REQUEST_METHOD'] == 'POST':
        ctype, pdict = parse_header(environ['CONTENT_TYPE'])
        if ctype == 'multipart/form-data':
            return parse_multipart(fp, pdict)
        elif ctype == 'application/x-www-form-urlencoded':
            clength = string.atoi(environ['CONTENT_LENGTH'])
            if maxlen and clength > maxlen:
                raise ValueError, 'Maximum content length exceeded'
            qs = fp.read(clength)
        else:
            qs = ''                     # Unknown content-type
        if environ.has_key('QUERY_STRING'):
            if qs: qs = qs + '&'
            qs = qs + environ['QUERY_STRING']
        elif sys.argv[1:]:
            if qs: qs = qs + '&'
            qs = qs + sys.argv[1]
        environ['QUERY_STRING'] = qs    # XXX Shouldn't, really
    elif environ.has_key('QUERY_STRING'):
        qs = environ['QUERY_STRING']
    else:
        if sys.argv[1:]:
            qs = sys.argv[1]
        else:
            qs = ""
        environ['QUERY_STRING'] = qs    # XXX Shouldn't, really
    return parse_qs(qs, keep_blank_values, strict_parsing)


def parse_qs(qs, keep_blank_values=0, strict_parsing=0):
    """Parse a query given as a string argument.

        Arguments:

        qs: URL-encoded query string to be parsed

        keep_blank_values: flag indicating whether blank values in
            URL encoded queries should be treated as blank strings.
            A true value indicates that blanks should be retained as
            blank strings.  The default false value indicates that
            blank values are to be ignored and treated as if they were
            not included.

        strict_parsing: flag indicating what to do with parsing errors.
            If false (the default), errors are silently ignored.
            If true, errors raise a ValueError exception.
    """
    name_value_pairs = string.splitfields(qs, '&')
    dict = {}
    for name_value in name_value_pairs:
        nv = string.splitfields(name_value, '=')
        if len(nv) != 2:
            if strict_parsing:
                raise ValueError, "bad query field: %s" % `name_value`
            continue
        name = urllib.unquote(string.replace(nv[0], '+', ' '))
        value = urllib.unquote(string.replace(nv[1], '+', ' '))
        if len(value) or keep_blank_values:
            if dict.has_key(name):
                dict[name].append(value)
            else:
                dict[name] = [value]
    return dict


def parse_multipart(fp, pdict):
    """Parse multipart input.

    Arguments:
    fp   : input file
    pdict: dictionary containing other parameters of content-type header

    Returns a dictionary just like parse_qs(): keys are the field names, each
    value is a list of values for that field.  This is easy to use but not
    much good if you are expecting megabytes to be uploaded -- in that case,
    use the FieldStorage class instead which is much more flexible.  Note
    that content-type is the raw, unparsed contents of the content-type
    header.

    XXX This does not parse nested multipart parts -- use FieldStorage for
    that.

    XXX This should really be subsumed by FieldStorage altogether -- no
    point in having two implementations of the same parsing algorithm.

    """
    if pdict.has_key('boundary'):
        boundary = pdict['boundary']
    else:
        boundary = ""
    nextpart = "--" + boundary
    lastpart = "--" + boundary + "--"
    partdict = {}
    terminator = ""

    while terminator != lastpart:
        bytes = -1
        data = None
        if terminator:
            # At start of next part.  Read headers first.
            headers = mimetools.Message(fp)
            clength = headers.getheader('content-length')
            if clength:
                try:
                    bytes = string.atoi(clength)
                except string.atoi_error:
                    pass
            if bytes > 0:
                if maxlen and bytes > maxlen:
                    raise ValueError, 'Maximum content length exceeded'
                data = fp.read(bytes)
            else:
                data = ""
        # Read lines until end of part.
        lines = []
        while 1:
            line = fp.readline()
            if not line:
                terminator = lastpart # End outer loop
                break
            if line[:2] == "--":
                terminator = string.strip(line)
                if terminator in (nextpart, lastpart):
                    break
            lines.append(line)
        # Done with part.
        if data is None:
            continue
        if bytes < 0:
            if lines:
                # Strip final line terminator
                line = lines[-1]
                if line[-2:] == "\r\n":
                    line = line[:-2]
                elif line[-1:] == "\n":
                    line = line[:-1]
                lines[-1] = line
                data = string.joinfields(lines, "")
        line = headers['content-disposition']
        if not line:
            continue
        key, params = parse_header(line)
        if key != 'form-data':
            continue
        if params.has_key('name'):
            name = params['name']
        else:
            continue
        if partdict.has_key(name):
            partdict[name].append(data)
        else:
            partdict[name] = [data]

    return partdict


def parse_header(line):
    """Parse a Content-type like header.

    Return the main content-type and a dictionary of options.

    """
    plist = map(string.strip, string.splitfields(line, ';'))
    key = string.lower(plist[0])
    del plist[0]
    pdict = {}
    for p in plist:
        i = string.find(p, '=')
        if i >= 0:
            name = string.lower(string.strip(p[:i]))
            value = string.strip(p[i+1:])
            if len(value) >= 2 and value[0] == value[-1] == '"':
                value = value[1:-1]
            pdict[name] = value
    return key, pdict



# Classes for field storage
# =========================

class MiniFieldStorage:

    """Like FieldStorage, for use when no file uploads are possible."""

    # Dummy attributes
    filename = None
    list = None
    type = None
    file = None
    type_options = {}
    disposition = None
    disposition_options = {}
    headers = {}

    def __init__(self, name, value):
        """Constructor from field name and value."""
        self.name = name
        self.value = value
        # self.file = StringIO(value)

    def __repr__(self):
        """Return printable representation."""
        return "MiniFieldStorage(%s, %s)" % (`self.name`, `self.value`)


 716 class FieldStorage:
 717
 718     """Store a sequence of fields, reading multipart/form-data.
 719
 720     This class provides naming, typing, files stored on disk, and
 721     more.  At the top level, it is accessible like a dictionary, whose
 722     keys are the field names.  (Note: None can occur as a field name.)
 723     The items are either a Python list (if there's multiple values) or
 724     another FieldStorage or MiniFieldStorage object.  If it's a single
 725     object, it has the following attributes:
 726
 727     name: the field name, if specified; otherwise None
 728
 729     filename: the filename, if specified; otherwise None; this is the
 730         client side filename, *not* the file name on which it is
 731         stored (that's a temporary file you don't deal with)
 732
 733     value: the value as a *string*; for file uploads, this
 734         transparently reads the file every time you request the value
 735
 736     file: the file(-like) object from which you can read the data;
 737         None if the data is stored a simple string
 738
 739     type: the content-type, or None if not specified
 740
 741     type_options: dictionary of options specified on the content-type
 742         line
 743
 744     disposition: content-disposition, or None if not specified
 745
 746     disposition_options: dictionary of corresponding options
 747
 748     headers: a dictionary(-like) object (sometimes rfc822.Message or a
 749         subclass thereof) containing *all* headers
 750
 751     The class is subclassable, mostly for the purpose of overriding
 752     the make_file() method, which is called internally to come up with
 753     a file open for reading and writing.  This makes it possible to
 754     override the default choice of storing all files in a temporary
 755     directory and unlinking them as soon as they have been opened.
 756
 757     """
 758
 759     def __init__(self, fp=None, headers=None, outerboundary="",
 760                  environ=os.environ, keep_blank_values=0, strict_parsing=0):
 761         """Constructor.  Read multipart/* until last part.
 762
 763         Arguments, all optional:
 764
 765         fp              : file pointer; default: sys.stdin
 766             (not used when the request method is GET)
 767
 768         headers         : header dictionary-like object; default:
 769             taken from environ as per CGI spec
 770
 771         outerboundary   : terminating multipart boundary
 772             (for internal use only)
 773
 774         environ         : environment dictionary; default: os.environ
 775
 776         keep_blank_values: flag indicating whether blank values in
 777             URL encoded forms should be treated as blank strings.
 778             A true value indicates that blanks should be retained as
 779             blank strings.  The default false value indicates that
 780             blank values are to be ignored and treated as if they were
 781             not included.
 782
 783         strict_parsing: flag indicating what to do with parsing errors.
 784             If false (the default), errors are silently ignored.
 785             If true, errors raise a ValueError exception.
 786
 787         """
 788         method = 'GET'
 789         self.keep_blank_values = keep_blank_values
 790         self.strict_parsing = strict_parsing
 791         if environ.has_key('REQUEST_METHOD'):
 792             method = string.upper(environ['REQUEST_METHOD'])
 793         if method == 'GET' or method == 'HEAD':
 794             if environ.has_key('QUERY_STRING'):
 795                 qs = environ['QUERY_STRING']
 796             elif sys.argv[1:]:
 797                 qs = sys.argv[1]
 798             else:
 799                 qs = ""
 800             fp = StringIO(qs)
 801             if headers is None:
 802                 headers = {'content-type':
 803                            "application/x-www-form-urlencoded"}
 804         if headers is None:
 805             headers = {}
 806             if method == 'POST':
 807                 # Set default content-type for POST to what's traditional
 808                 headers['content-type'] = "application/x-www-form-urlencoded"
 809             if environ.has_key('CONTENT_TYPE'):
 810                 headers['content-type'] = environ['CONTENT_TYPE']
 811             if environ.has_key('CONTENT_LENGTH'):
 812                 headers['content-length'] = environ['CONTENT_LENGTH']
 813         self.fp = fp or sys.stdin
 814         self.headers = headers
 815         self.outerboundary = outerboundary
 816
 817         # Process content-disposition header
 818         cdisp, pdict = "", {}
 819         if self.headers.has_key('content-disposition'):
 820             cdisp, pdict = parse_header(self.headers['content-disposition'])
 821         self.disposition = cdisp
 822         self.disposition_options = pdict
 823         self.name = None
 824         if pdict.has_key('name'):
 825             self.name = pdict['name']
 826         self.filename = None
 827         if pdict.has_key('filename'):
 828             self.filename = pdict['filename']
 829
 830         # Process content-type header
 831         ctype, pdict = "text/plain", {}
 832         if self.headers.has_key('content-type'):
 833             ctype, pdict = parse_header(self.headers['content-type'])
 834         self.type = ctype
 835         self.type_options = pdict
 836         self.innerboundary = ""
 837         if pdict.has_key('boundary'):
 838             self.innerboundary = pdict['boundary']
 839         clen = -1
 840         if self.headers.has_key('content-length'):
 841             try:
 842                 clen = string.atoi(self.headers['content-length'])
 843             except ValueError:
 844                 pass
 845             if maxlen and clen > maxlen:
 846                 raise ValueError, 'Maximum content length exceeded'
 847         self.length = clen
 848
 849         self.list = self.file = None
 850         self.done = 0
 851         self.lines = []
 852         if ctype == 'application/x-www-form-urlencoded':
 853             self.read_urlencoded()
 854         elif ctype[:10] == 'multipart/':
 855             self.read_multi()
 856         else:
 857             self.read_single()
 858
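The constructor's handling of the CGI environment can be sketched in modern Python 3 as two standalone functions. `default_headers` and `parse_content_length` are hypothetical names, not part of this module; the sketch assumes no explicit `headers` argument is passed, and it takes `maxlen` as a parameter instead of reading a module-level global.

```python
def default_headers(environ, argv=()):
    """Mirror the defaulting in FieldStorage.__init__ (hypothetical helper).

    Returns (query, headers).  query is the query string for GET/HEAD
    requests, or None when the body should be read from the input stream.
    """
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method in ("GET", "HEAD"):
        # Form data arrives in the URL; fall back to argv[1], then "".
        if "QUERY_STRING" in environ:
            query = environ["QUERY_STRING"]
        elif len(argv) > 1:
            query = argv[1]
        else:
            query = ""
        return query, {"content-type": "application/x-www-form-urlencoded"}
    headers = {}
    if method == "POST":
        # Traditional default for POST; CONTENT_TYPE below overrides it.
        headers["content-type"] = "application/x-www-form-urlencoded"
    if "CONTENT_TYPE" in environ:
        headers["content-type"] = environ["CONTENT_TYPE"]
    if "CONTENT_LENGTH" in environ:
        headers["content-length"] = environ["CONTENT_LENGTH"]
    return None, headers


def parse_content_length(headers, maxlen=0):
    """Return the declared length, or -1 if it is missing or malformed.

    Raises ValueError when maxlen is nonzero and the length exceeds it.
    """
    clen = -1
    if "content-length" in headers:
        try:
            clen = int(headers["content-length"])
        except ValueError:
            pass  # unparsable header: treat the length as unknown
        if maxlen and clen > maxlen:
            raise ValueError("Maximum content length exceeded")
    return clen
```

For a GET or HEAD request the CONTENT_TYPE variable is ignored, as in the constructor above, because the urlencoded default is installed before the environment is consulted.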
 859     def __repr__(self):
 860         """Return a printable representation."""
 861         return "FieldStorage(%s, %s, %s)" % (
 862                 `self.name`, `self.filename`, `self.value`)
 863
 864     def __getattr__(self, name):
 865         if name != 'value':
 866             raise AttributeError, name
 867         if self.file:
 868             self.file.seek(0)
 869             value = self.file.read()
 870             self.file.seek(0)
 871         elif self.list is not None:
 872             value = self.list
 873         else:
 874             value = None
 875         return value
 876
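Because `__getattr__` is only consulted when normal attribute lookup fails, `value` can be computed on demand while `file` and `list` stay ordinary instance attributes. The class below is a hypothetical Python 3 reduction of that pattern, with an in-memory `io.BytesIO` standing in for the temporary file.

```python
import io


class LazyValue:
    """Hypothetical reduction of FieldStorage's computed 'value' attribute."""

    def __init__(self, data=None, items=None):
        self.file = io.BytesIO(data) if data is not None else None
        self.list = items

    def __getattr__(self, name):
        # Reached only for attributes that normal lookup did not find.
        if name != "value":
            raise AttributeError(name)
        if self.file:
            self.file.seek(0)          # rewind so every access sees all data
            value = self.file.read()
            self.file.seek(0)          # leave the file rewound for other readers
        elif self.list is not None:
            value = self.list
        else:
            value = None
        return value
```

The trade-off is that every access rereads the whole upload, so a script that needs the value several times should keep its own reference to it.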
 877     def __getitem__(self, key):
 878         """Dictionary style indexing."""
 879         if self.list is None:
 880             raise TypeError, "not indexable"
 881         found = []
 882         for item in self.list:
 883             if item.name == key: found.append(item)
 884         if not found:
 885             raise KeyError, key
 886         if len(found) == 1:
 887             return found[0]
 888         else:
 889             return found
 890
 891     def keys(self):
 892         """Dictionary style keys() method."""
 893         if self.list is None:
 894             raise TypeError, "not indexable"
 895         keys = []
 896         for item in self.list:
 897             if item.name not in keys: keys.append(item.name)
 898         return keys
 899
 900     def has_key(self, key):
 901         """Dictionary style has_key() method."""
 902         if self.list is None:
 903             raise TypeError, "not indexable"
 904         for item in self.list:
 905             if item.name == key: return 1
 906         return 0
 907
 908     def __len__(self):
 909         """Dictionary style len(x) support."""
 910         return len(self.keys())
 911
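The dictionary-style methods all scan `self.list` and match on each item's `name`, so a repeated field name yields a list while a unique one yields a single item. A hypothetical Python 3 sketch follows; `Item` stands in for `MiniFieldStorage`, and `__contains__` replaces `has_key`, which Python 3 dictionaries no longer provide.

```python
class Item:
    """Stand-in for MiniFieldStorage: just a name and a value."""

    def __init__(self, name, value):
        self.name = name
        self.value = value


class FieldList:
    """Hypothetical sketch of FieldStorage's dictionary-style access."""

    def __init__(self, items):
        self.list = items

    def __getitem__(self, key):
        found = [item for item in self.list if item.name == key]
        if not found:
            raise KeyError(key)
        # One match returns the item itself; several return a list.
        return found[0] if len(found) == 1 else found

    def keys(self):
        keys = []
        for item in self.list:
            if item.name not in keys:   # first-seen order, no duplicates
                keys.append(item.name)
        return keys

    def __contains__(self, key):
        return any(item.name == key for item in self.list)

    def __len__(self):
        return len(self.keys())         # counts distinct names, not items
```

Code that accepts repeated fields therefore has to check whether it received a single item or a list.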
 912     def read_urlencoded(self):
 913         """Internal: read data in query string format."""
 914         qs = self.fp.read(self.length)
 915         dict = parse_qs(qs, self.keep_blank_values, self.strict_parsing)
 916         self.list = []
 917         for key, valuelist in dict.items():
 918             for value in valuelist:
 919                 self.list.append(MiniFieldStorage(key, value))
 920         self.skip_lines()
 921
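`read_urlencoded` delegates parsing to `parse_qs` and then flattens the name-to-list mapping into one entry per value. In modern Python the corresponding parser is `urllib.parse.parse_qs`, which takes the same `keep_blank_values` and `strict_parsing` flags. The helper name `urlencoded_items` is hypothetical, and it returns `(name, value)` tuples rather than `MiniFieldStorage` objects.

```python
from urllib.parse import parse_qs


def urlencoded_items(qs, keep_blank_values=False, strict_parsing=False):
    """Flatten a query string into (name, value) pairs, one per value."""
    parsed = parse_qs(qs, keep_blank_values=keep_blank_values,
                      strict_parsing=strict_parsing)
    items = []
    for key, values in parsed.items():
        for value in values:
            items.append((key, value))
    return items
```

Because values are grouped by name before flattening, the original interleaving of fields is lost: both values of `a` come before `b` even when `b` appeared between them in the query string.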
 922     def read_multi(self):
 923         """Internal: read a part that is itself multipart."""
 924         self.list = []
 925         # Throw the first part (the preamble before the first boundary) away
 926         part = self.__class__(self.fp, {}, self.innerboundary)
 927         while not part.done:
 928             headers = rfc822.Message(self.fp)
 929             part = self.__class__(self.fp, headers, self.innerboundary)
 930             self.list.append(part)
 931         self.skip_lines()
 932
 933     def read_single(self):
 934         """Internal: read an atomic part."""
 935         if self.length >= 0:
 936             self.read_binary()
 937             self.skip_lines()
 938         else:
 939             self.read_lines()
 940         self.file.seek(0)
 941
 942     bufsize = 8*1024            # I/O buffering size for copy to file
 943
 944     def read_binary(self):
 945         """Internal: read binary data."""
 946         self.file = self.make_file('b')
 947         todo = self.length
 948         if todo >= 0:
 949             while todo > 0:
 950                 data = self.fp.read(min(todo, self.bufsize))
 951                 if not data:
 952                     self.done = -1
 953                     break
 954                 self.file.write(data)
 955                 todo = todo - len(data)
 956
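`read_binary` copies exactly `self.length` bytes in chunks of at most `bufsize`, so a large upload never has to fit in memory at once, and it records a short read as `done = -1`. The function below is a hypothetical standalone Python 3 version that reports the short read as a boolean instead.

```python
import io


def copy_exact(src, dst, length, bufsize=8 * 1024):
    """Copy exactly `length` bytes from src to dst in bufsize chunks.

    Returns True if all bytes arrived, False if src hit EOF early
    (the condition FieldStorage records as done = -1).
    """
    todo = length
    while todo > 0:
        data = src.read(min(todo, bufsize))
        if not data:
            return False               # input ended before `length` bytes
        dst.write(data)
        todo -= len(data)
    return True
```

Reading no more than the declared length matters because anything after it (for example the next multipart boundary) must stay in the input stream for the caller.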
 957     def read_lines(self):
 958         """Internal: read lines until EOF or outerboundary."""
 959         self.file = self.make_file('')
 960         if self.outerboundary:
 961             self.read_lines_to_outerboundary()
 962         else:
 963             self.read_lines_to_eof()
 964
 965     def read_lines_to_eof(self):
 966         """Internal: read lines until EOF."""
 967         while 1:
 968             line = self.fp.readline()
 969             if not line:
 970                 self.done = -1
 971                 break
 972             self.lines.append(line)
 973             self.file.write(line)
 974
 975     def read_lines_to_outerboundary(self):
 976         """Internal: read lines until outerboundary."""
 977         next = "--" + self.outerboundary
 978         last = next + "--"
 979         delim = ""
 980         while 1:
 981             line = self.fp.readline()
 982             if not line:
 983                 self.done = -1
 984                 break
 985             self.lines.append(line)
 986             if line[:2] == "--":
 987                 strippedline = string.strip(line)
 988                 if strippedline == next:
 989                     break
 990                 if strippedline == last:
 991                     self.done = 1
 992                     break
 993             odelim = delim
 994             if line[-2:] == "\r\n":
 995                 delim = "\r\n"
 996                 line = line[:-2]
 997             elif line[-1] == "\n":
 998                 delim = "\n"
 999                 line = line[:-1]
1000             else:
1001                 delim = ""
1002             self.file.write(odelim + line)
1003
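The subtle step in `read_lines_to_outerboundary` is the delayed line terminator: each line's CRLF or LF is written only when another content line follows, because the line break immediately before a boundary belongs to the MIME framing (RFC 2046), not to the field's data. A hypothetical Python 3 sketch of that loop, returning the collected text and a status code in place of `self.done`:

```python
import io


def read_until_boundary(fp, boundary):
    """Hypothetical sketch of FieldStorage.read_lines_to_outerboundary().

    Returns (body, status): status is 0 when a "--boundary" line ends the
    part, 1 for the closing "--boundary--" line, and -1 on premature EOF.
    """
    next_marker = "--" + boundary
    last_marker = next_marker + "--"
    out = []
    delim = ""
    status = -1
    while True:
        line = fp.readline()
        if not line:
            break                       # EOF before any boundary
        if line[:2] == "--":
            stripped = line.strip()
            if stripped == next_marker:
                status = 0
                break
            if stripped == last_marker:
                status = 1
                break
        # Emit the previous line's terminator now that another content
        # line follows; the terminator before the boundary is never written.
        odelim = delim
        if line.endswith("\r\n"):
            delim, line = "\r\n", line[:-2]
        elif line.endswith("\n"):
            delim, line = "\n", line[:-1]
        else:
            delim = ""
        out.append(odelim + line)
    return "".join(out), status
```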
1004     def skip_lines(self):
1005         """Internal: skip lines until outer boundary if defined."""
1006         if not self.outerboundary or self.done:
1007             return
1008         next = "--" + self.outerboundary
1009         last = next + "--"
1010         while 1:
1011             line = self.fp.readline()
1012             if not line:
1013                 self.done = -1
1014                 break
1015             self.lines.append(line)
1016             if line[:2] == "--":
1017                 strippedline = string.strip(line)
1018                 if strippedline == next:
1019                     break
1020                 if strippedline == last:
1021                     self.done = 1
1022                     break
1023
1024     def make_file(self, binary=None):
1025         """Overridable: return a readable & writable file.
1026
1027         The file will be used as follows:
1028         - data is written to it
1029         - seek(0)
1030         - data is read from it
1031
1032         The 'binary' argument is unused -- the file is always opened
1033         in binary mode.
1034
1035         This version opens a temporary file for reading and writing,
1036         and immediately deletes (unlinks) it.  The trick (on Unix!) is
1037         that the file can still be used, but it can't be opened by
1038         another process, and it will automatically be deleted when it
1039         is closed or when the current process terminates.
1040
1041         If you want a more permanent file, derive a class which
1042         overrides this method.  If you want a visible temporary file
1043         that is nevertheless automatically deleted when the script
1044         terminates, try defining a __del__ method in a derived class
1045         which unlinks the temporary files you have created.
1046
1047         """
1048         import tempfile
1049         return tempfile.TemporaryFile("w+b")
1050
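As the docstring suggests, a subclass can override `make_file` to keep uploads on disk. The sketch below is hypothetical Python 3 code, not this module's API: `Storage` stands in for `FieldStorage` and reproduces only the write, `seek(0)`, read protocol, while `KeepingStorage` returns a named file created with `delete=False` in a caller-chosen directory so that it survives `close()`.

```python
import os
import tempfile


class Storage:
    """Hypothetical stand-in for FieldStorage's use of make_file()."""

    def make_file(self, binary=None):
        # Anonymous temporary file, removed automatically when closed.
        return tempfile.TemporaryFile("w+b")

    def store(self, data):
        # Same protocol FieldStorage follows: write, seek(0), read later.
        self.file = self.make_file("b")
        self.file.write(data)
        self.file.seek(0)
        return self.file


class KeepingStorage(Storage):
    """Subclass that keeps uploads as named files in a chosen directory."""

    def __init__(self, directory):
        self.directory = directory

    def make_file(self, binary=None):
        # delete=False leaves the file on disk after close().
        return tempfile.NamedTemporaryFile("w+b", dir=self.directory,
                                           delete=False)
```

With this override the caller owns the files and is responsible for removing them when they are no longer needed.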
1051
1052
1053 # Backwards Compatibility Classes
1054 # ===============================
1055
1056 class FormContentDict:
1057     """Basic (multiple values per field) form content as dictionary.
1058
1059     form = FormContentDict()
1060
1061     form[key] -> [value, value, ...]
1062     form.has_key(key) -> Boolean
1063     form.keys() -> [key, key, ...]
1064     form.values() -> [[val, val, ...], [val, val, ...], ...]
1065     form.items() ->  [(key, [val, val, ...]), (key, [val, val, ...]), ...]
1066     form.dict == {key: [val, val, ...], ...}
1067
1068     """
1069     def __init__(self, environ=os.environ):
1070         self.dict = parse(environ=environ)
1071         self.query_string = environ['QUERY_STRING']
1072     def __getitem__(self,key):
1073         return self.dict[key]
1074     def keys(self):
1075         return self.dict.keys()
1076     def has_key(self, key):
1077         return self.dict.has_key(key)
1078     def values(self):
1079         return self.dict.values()
1080     def items(self):
1081         return self.dict.items()
1082     def __len__( self ):
1083         return len(self.dict)
1084
1085
1086 class SvFormContentDict(FormContentDict):
1087     """Strict single-value expecting form content as dictionary.
1088
1089     If you only expect a single value for each field, then form[key]
1090     will return that single value.  It will raise an IndexError if
1091     that expectation is not true.  If you expect a field to have
1092     possibly multiple values, then you can use form.getlist(key) to
1093     get all of the values.  values() and items() are a compromise:
1094     they return single strings where there is a single value, and
1095     lists of strings otherwise.
1096
1097     """
1098     def __getitem__(self, key):
1099         if len(self.dict[key]) > 1:
1100             raise IndexError, 'expecting a single value'
1101         return self.dict[key][0]
1102     def getlist(self, key):
1103         return self.dict[key]
1104     def values(self):
1105         lis = []
1106         for each in self.dict.values():
1107             if len(each) == 1:
1108                 lis.append(each[0])
1109             else: lis.append(each)
1110         return lis
1111     def items(self):
1112         lis = []
1113         for key,value in self.dict.items():
1114             if len(value) == 1:
1115                 lis.append((key, value[0]))
1116             else: lis.append((key, value))
1117         return lis
1118
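The single-value contract is easy to misread, so here is a hypothetical Python 3 reduction of `SvFormContentDict`, built on a plain dictionary that maps each field name to its list of values (the shape of `form.dict` described in `FormContentDict`):

```python
class SingleValueForm:
    """Hypothetical sketch of SvFormContentDict's lookup rules."""

    def __init__(self, data):
        self.dict = data                # maps name -> list of values

    def __getitem__(self, key):
        values = self.dict[key]
        if len(values) > 1:
            raise IndexError("expecting a single value")
        return values[0]

    def getlist(self, key):
        return self.dict[key]           # always the full list

    def values(self):
        # Compromise: bare value for single entries, list otherwise.
        return [v[0] if len(v) == 1 else v for v in self.dict.values()]
```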
1119
1120 class InterpFormContentDict(SvFormContentDict):
1121     """This class is present for backwards compatibility only."""
1122     def __getitem__( self, key ):
1123         v = SvFormContentDict.__getitem__( self, key )
1124         if v[0] in string.digits+'+-.' :
1125             try:  return  string.atoi( v )
1126             except ValueError:
1127                 try:    return string.atof( v )
1128                 except ValueError: pass
1129         return string.strip(v)
1130     def values( self ):
1131         lis = []
1132         for key in self.keys():
1133             try:
1134                 lis.append( self[key] )
1135             except IndexError:
1136                 lis.append( self.dict[key] )
1137         return lis
1138     def items( self ):
1139         lis = []
1140         for key in self.keys():
1141             try:
1142                 lis.append( (key, self[key]) )
1143             except IndexError:
1144                 lis.append( (key, self.dict[key]) )
1145         return lis
1146
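`InterpFormContentDict` additionally tries to turn values that look numeric into numbers, deciding by the first character only. A hypothetical Python 3 sketch of that conversion, using `int` and `float` in place of `string.atoi` and `string.atof`; unlike the original, it guards against an empty string, where `v[0]` would raise `IndexError`:

```python
def interpret(v):
    """Hypothetical sketch of InterpFormContentDict's value conversion."""
    # Only the first character decides whether a conversion is attempted.
    if v and v[0] in "0123456789+-.":
        try:
            return int(v)
        except ValueError:
            try:
                return float(v)
            except ValueError:
                pass                    # looked numeric but was not
    return v.strip()
```

Note that a leading space disables conversion entirely: `" 42"` comes back as the string `"42"`, not the integer.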
1147
1148 class FormContent(FormContentDict):
1149     """This class is present for backwards compatibility only."""
1150     def values(self, key):
1151         if self.dict.has_key(key) :return self.dict[key]
1152         else: return None
1153     def indexed_value(self, key, location):
1154         if self.dict.has_key(key):
1155             if len (self.dict[key]) > location:
1156                 return self.dict[key][location]
1157             else: return None
1158         else: return None
1159     def value(self, key):
1160         if self.dict.has_key(key): return self.dict[key][0]
1161         else: return None
1162     def length(self, key):
1163         return len(self.dict[key])
1164     def stripped(self, key):
1165         if self.dict.has_key(key): return string.strip(self.dict[key][0])
1166         else: return None
1167     def pars(self):
1168         return self.dict
1169
1170
1171 # Test/debug code
1172 # ===============
1173
1174 def test(environ=os.environ):
1175     """Robust test CGI script, usable as main program.
1176
1177     Write minimal HTTP headers and dump all information provided to
1178     the script in HTML form.
1179
1180     """
1181     import traceback
1182     print "Content-type: text/html"
1183     print
1184     sys.stderr = sys.stdout
1185     try:
1186         form = FieldStorage()   # Replace with other classes to test those
1187         print_form(form)
1188         print_environ(environ)
1189         print_directory()
1190         print_arguments()
1191         print_environ_usage()
1192         def f():
1193             exec "testing print_exception() -- <I>italics?</I>"
1194         def g(f=f):
1195             f()
1196         print "<H3>What follows is a test, not an actual exception:</H3>"
1197         g()
1198     except:
1199         print_exception()
1200
1201     # Second try with a small maxlen...
1202     global maxlen
1203     maxlen = 50
1204     try:
1205         form = FieldStorage()   # Replace with other classes to test those
1206         print_form(form)
1207         print_environ(environ)
1208         print_directory()
1209         print_arguments()
1210         print_environ_usage()
1211     except:
1212         print_exception()
1213
1214 def print_exception(type=None, value=None, tb=None, limit=None):
1215     if type is None:
1216         type, value, tb = sys.exc_info()
1217     import traceback
1218     print
1219     print "<H3>Traceback (innermost last):</H3>"
1220     list = traceback.format_tb(tb, limit) + \
1221            traceback.format_exception_only(type, value)
1222     print "<PRE>%s<B>%s</B></PRE>" % (
1223         escape(string.join(list[:-1], "")),
1224         escape(list[-1]),
1225         )
1226     del tb
1227
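`print_exception` formats the traceback with the `traceback` module and escapes it before embedding it in HTML, so markup inside an exception message is shown literally rather than rendered. A hypothetical Python 3 variant that returns the HTML string instead of printing it, using `html.escape` with `quote=False` to match the default of this module's `escape()`:

```python
import html
import sys
import traceback


def format_exception_html(exc_type=None, value=None, tb=None, limit=None):
    """Return the current (or given) exception as an HTML fragment."""
    if exc_type is None:
        exc_type, value, tb = sys.exc_info()
    lines = (traceback.format_tb(tb, limit)
             + traceback.format_exception_only(exc_type, value))
    # Stack entries go in plain <PRE>; the final "Type: message" line is bold.
    return "<H3>Traceback (innermost last):</H3>\n<PRE>%s<B>%s</B></PRE>" % (
        html.escape("".join(lines[:-1]), quote=False),
        html.escape(lines[-1], quote=False),
    )
```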
1228 def print_environ(environ=os.environ):
1229     """Dump the shell environment as HTML."""
1230     keys = environ.keys()
1231     keys.sort()
1232     print
1233     print "<H3>Shell Environment:</H3>"
1234     print "<DL>"
1235     for key in keys:
1236         print "<DT>", escape(key), "<DD>", escape(environ[key])
1237     print "</DL>"
1238     print
1239
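`print_environ` (and `print_form`, written in the same style) sorts the keys and escapes every key and value before writing them into a `<DL>`. A hypothetical Python 3 sketch that builds essentially the same markup as a string, which is easier to test than printed output:

```python
import html


def environ_html(environ):
    """Render a mapping as an HTML definition list with sorted keys."""
    parts = ["<H3>Shell Environment:</H3>", "<DL>"]
    for key in sorted(environ):
        # Escape both sides: environment values can contain client data.
        parts.append("<DT> %s <DD> %s" % (html.escape(key, quote=False),
                                          html.escape(environ[key], quote=False)))
    parts.append("</DL>")
    return "\n".join(parts)
```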
1240 def print_form(form):
1241     """Dump the contents of a form as HTML."""
1242     keys = form.keys()
1243     keys.sort()
1244     print
1245     print "<H3>Form Contents:</H3>"
1246     print "<DL>"
1247     for key in keys:
1248         print "<DT>" + escape(key) + ":",
1249         value = form[key]
1250         print "<i>" + escape(`type(value)`) + "</i>"
1251         print "<DD>" + escape(`value`)
1252     print "</DL>"
1253     print
1254
1255 def print_directory():
1256     """Dump the current directory as HTML."""
1257     print
1258     print "<H3>Current Working Directory:</H3>"
1259     try:
1260         pwd = os.getcwd()
1261     except os.error, msg:
1262         print "os.error:", escape(str(msg))
1263     else:
1264         print escape(pwd)
1265     print
1266
1267 def print_arguments():
1268     print
1269     print "<H3>Command Line Arguments:</H3>"
1270     print
1271     print escape(`sys.argv`)
1272     print
1273
1274 def print_environ_usage():
1275     """Dump a list of environment variables used by CGI as HTML."""
1276     print """
1277 <H3>These environment variables could have been set:</H3>
1278 <UL>
1279 <LI>AUTH_TYPE
1280 <LI>CONTENT_LENGTH
1281 <LI>CONTENT_TYPE
1282 <LI>DATE_GMT
1283 <LI>DATE_LOCAL
1284 <LI>DOCUMENT_NAME
1285 <LI>DOCUMENT_ROOT
1286 <LI>DOCUMENT_URI
1287 <LI>GATEWAY_INTERFACE
1288 <LI>LAST_MODIFIED
1289 <LI>PATH
1290 <LI>PATH_INFO
1291 <LI>PATH_TRANSLATED
1292 <LI>QUERY_STRING
1293 <LI>REMOTE_ADDR
1294 <LI>REMOTE_HOST
1295 <LI>REMOTE_IDENT
1296 <LI>REMOTE_USER
1297 <LI>REQUEST_METHOD
1298 <LI>SCRIPT_NAME
1299 <LI>SERVER_NAME
1300 <LI>SERVER_PORT
1301 <LI>SERVER_PROTOCOL
1302 <LI>SERVER_ROOT
1303 <LI>SERVER_SOFTWARE
1304 </UL>
1305 In addition, HTTP headers sent by the server may be passed in the
1306 environment as well.  Here are some common variable names:
1307 <UL>
1308 <LI>HTTP_ACCEPT
1309 <LI>HTTP_CONNECTION
1310 <LI>HTTP_HOST
1311 <LI>HTTP_PRAGMA
1312 <LI>HTTP_REFERER
1313 <LI>HTTP_USER_AGENT
1314 </UL>
1315 """
1316
1317
1318 # Utilities
1319 # =========
1320
1321 def escape(s, quote=None):
1322     """Replace '&', '<' and '>' by SGML entities, and '"' too if quote is true."""
1323     s = string.replace(s, "&", "&amp;") # Must be done first!
1324     s = string.replace(s, "<", "&lt;")
1325     s = string.replace(s, ">", "&gt;")
1326     if quote:
1327         s = string.replace(s, '"', "&quot;")
1328     return s
1329
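The order of replacements in `escape()` matters: '&' must be handled first, otherwise the '&' inside a freshly inserted '&lt;' would itself be turned into '&amp;lt;'. A Python 3 sketch of the same function using `str.replace`:

```python
def escape(s, quote=False):
    """Sketch of this module's escape(): '&' is replaced before '<' and '>'."""
    s = s.replace("&", "&amp;")   # first, so later entities are not re-escaped
    s = s.replace("<", "&lt;")
    s = s.replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")  # needed inside double-quoted attributes
    return s
```

The function is not idempotent: escaping `"&lt;"` again yields `"&amp;lt;"`. In modern Python, `html.escape` covers the same ground, but its `quote` flag defaults to true and also escapes single quotes (as `&#x27;`).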
1330
1331 # Invoke mainline
1332 # ===============
1333
1334 # Call test() when this file is run as a script (not imported as a module)
1335 if __name__ == '__main__':
1336     test()