Doc/tut.tex

   1 \documentstyle[twoside,11pt,myformat]{report}
   2
   3 \title{Python Tutorial}
   4
   5 \input{boilerplate}
   6
   7 \begin{document}
   8
   9 \pagenumbering{roman}
  10
  11 \maketitle
  12
  13 \input{copyright}
  14
  15 \begin{abstract}
  16
  17 \noindent
  18 Python is a simple, yet powerful programming language that bridges the
  19 gap between C and shell programming, and is thus ideally suited for
  20 ``throw-away programming''
  21 and rapid prototyping.  Its syntax is put
  22 together from constructs borrowed from a variety of other languages;
  23 most prominent are influences from ABC, C, Modula-3 and Icon.
  24
  25 The Python interpreter is easily extended with new functions and data
  26 types implemented in C.  Python is also suitable as an extension
  27 language for highly customizable C applications such as editors or
  28 window managers.
  29
  30 Python is available for various operating systems, among them
  31 several flavors of {\UNIX}, Amoeba, the Apple Macintosh O.S.,
  32 and MS-DOS.
  33
  34 This tutorial introduces the reader informally to the basic concepts
  35 and features of the Python language and system.  It helps to have a
  36 Python interpreter handy for hands-on experience, but as the examples
  37 are self-contained, the tutorial can be read off-line as well.
  38
  39 For a description of standard objects and modules, see the {\em Python
  40 Library Reference} document.  The {\em Python Reference Manual} gives
  41 a more formal definition of the language.
  42
  43 \end{abstract}
  44
  45 \pagebreak
  46 {
  47 \parskip = 0mm
  48 \tableofcontents
  49 }
  50
  51 \pagebreak
  52
  53 \pagenumbering{arabic}
  54
  55
  56 \chapter{Whetting Your Appetite}
  57
  58 If you ever wrote a large shell script, you probably know this
  59 feeling: you'd love to add yet another feature, but it's already so
  60 slow, and so big, and so complicated; or the feature involves a system
  61 call or other function that is only accessible from C \ldots  Usually
  62 the problem at hand isn't serious enough to warrant rewriting the
  63 script in C; perhaps because the problem requires variable-length
  64 strings or other data types (like sorted lists of file names) that are
  65 easy in the shell but lots of work to implement in C; or perhaps just
  66 because you're not sufficiently familiar with C.
  67
  68 In such cases, Python may be just the language for you.  Python is
  69 simple to use, but it is a real programming language, offering much
  70 more structure and support for large programs than the shell has.  On
  71 the other hand, it also offers much more error checking than C, and,
  72 being a {\em very-high-level language}, it has high-level data types
  73 built in, such as flexible arrays and dictionaries that would cost you
  74 days to implement efficiently in C.  Because of its more general data
  75 types Python is applicable to a much larger problem domain than {\em
  76 Awk} or even {\em Perl}, yet many things are at least as easy in
  77 Python as in those languages.
  78
  79 Python allows you to split up your program into modules that can be
  80 reused in other Python programs.  It comes with a large collection of
  81 standard modules that you can use as the basis of your programs --- or
  82 as examples to start learning to program in Python.  There are also
  83 built-in modules that provide things like file I/O, system calls,
  84 sockets, and even a generic interface to window systems (STDWIN).
  85
  86 Python is an interpreted language, which can save you considerable time
  87 during program development because no compilation and linking is
  88 necessary.  The interpreter can be used interactively, which makes it
  89 easy to experiment with features of the language, to write throw-away
  90 programs, or to test functions during bottom-up program development.
  91 It is also a handy desk calculator.
  92
  93 Python allows writing very compact and readable programs.  Programs
  94 written in Python are typically much shorter than equivalent C
  95 programs, for several reasons:
  96 \begin{itemize}
  97 \item
  98 the high-level data types allow you to express complex operations in a
  99 single statement;
 100 \item
 101 statement grouping is done by indentation instead of begin/end
 102 brackets;
 103 \item
 104 no variable or argument declarations are necessary.
 105 \end{itemize}
 106
 107 Python is {\em extensible}: if you know how to program in C it is easy
 108 to add a new built-in
 109 function or
 110 module to the interpreter, either to
 111 perform critical operations at maximum speed, or to link Python
 112 programs to libraries that may only be available in binary form (such
 113 as a vendor-specific graphics library).  Once you are really hooked,
 114 you can link the Python interpreter into an application written in C
 115 and use it as an extension or command language for that application.
 116
 117 By the way, the language is named after the BBC show ``Monty
 118 Python's Flying Circus'' and has nothing to do with nasty reptiles...
 119
 120 \section{Where From Here}
 121
 122 Now that you are all excited about Python, you'll want to examine it
 123 in some more detail.  Since the best way to learn a language is
 124 to use it, you are invited to do so here.
 125
 126 In the next chapter, the mechanics of using the interpreter are
 127 explained.  This is rather mundane information, but essential for
 128 trying out the examples shown later.
 129
 130 The rest of the tutorial introduces various features of the Python
 131 language and system through examples, beginning with simple
 132 expressions, statements and data types, through functions and modules,
 133 and finally touching upon advanced concepts like exceptions
 134 and user-defined classes.
 135
 136 When you're through with the tutorial (or just getting bored), you
 137 should read the Library Reference, which gives complete (though terse)
 138 reference material about built-in and standard types, functions and
 139 modules that can save you a lot of time when writing Python programs.
 140
 141
 142 \chapter{Using the Python Interpreter}
 143
 144 \section{Invoking the Interpreter}
 145
 146 The Python interpreter is usually installed as {\tt /usr/local/bin/python}
 147 on those machines where it is available; putting {\tt /usr/local/bin} in
 148 your {\UNIX} shell's search path makes it possible to start it by
 149 typing the command
 150
 151 \bcode\begin{verbatim}
 152 python
 153 \end{verbatim}\ecode
 154 %
 155 to the shell.  Since the choice of the directory where the interpreter
 156 lives is an installation option, other places are possible; check with
 157 your local Python guru or system administrator.  (E.g., {\tt
 158 /usr/local/python} is a popular alternative location.)
 159
 160 The interpreter operates somewhat like the {\UNIX} shell: when called
 161 with standard input connected to a tty device, it reads and executes
 162 commands interactively; when called with a file name argument or with
 163 a file as standard input, it reads and executes a {\em script} from
 164 that file.
 165
 166 A third way of starting the interpreter is
 167 ``{\tt python -c command [arg] ...}'', which
 168 executes the statement(s) in {\tt command}, analogous to the shell's
 169 {\tt -c} option.  Since Python statements often contain spaces or other
 170 characters that are special to the shell, it is best to quote {\tt
 171 command} in its entirety with double quotes.
 172
 173 Note that there is a difference between ``{\tt python file}'' and
 174 ``{\tt python $<$file}''.  In the latter case, input requests from the
 175 program, such as calls to {\tt input()} and {\tt raw_input()}, are
 176 satisfied from {\em file}.  Since this file has already been read
 177 until the end by the parser before the program starts executing, the
 178 program will encounter EOF immediately.  In the former case (which is
 179 usually what you want) they are satisfied from whatever file or device
 180 is connected to standard input of the Python interpreter.
 181
 182 When a script file is used, it is sometimes useful to be able to run
 183 the script and enter interactive mode afterwards.  This can be done by
 184 passing {\tt -i} before the script.  (This does not work if the script
 185 is read from standard input, for the same reason as explained in the
 186 previous paragraph.)
 187
 188 \subsection{Argument Passing}
 189
 190 When known to the interpreter, the script name and additional
 191 arguments thereafter are passed to the script in the variable {\tt
 192 sys.argv}, which is a list of strings.  Its length is at least one;
 193 when no script and no arguments are given, {\tt sys.argv[0]} is an
 194 empty string.  When the script name is given as {\tt '-'} (meaning
 195 standard input), {\tt sys.argv[0]} is set to {\tt '-'}.  When {\tt -c
 196 command} is used, {\tt sys.argv[0]} is set to {\tt '-c'}.  Options
 197 found after {\tt -c command} are not consumed by the Python
 198 interpreter's option processing but left in {\tt sys.argv} for the
 199 command to handle.
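
As a minimal sketch of how a script might inspect its arguments (the
variable names {\tt script} and {\tt rest} are only illustrative, and
the single parenthesized argument to {\tt print} is an assumption made
so that the fragment reads the same to older and newer interpreters):

\bcode\begin{verbatim}
import sys

script = sys.argv[0]    # '', '-', '-c' or the script name
rest = sys.argv[1:]     # the additional arguments, possibly empty

print(script)
print(len(rest))
\end{verbatim}\ecode
%
Saved in a (hypothetical) file {\tt args.py} and run as ``{\tt python
args.py spam eggs}'', this should print {\tt args.py} followed by
{\tt 2}.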
 200
 201 \subsection{Interactive Mode}
 202
 203 When commands are read from a tty, the interpreter is said to be in
 204 {\em interactive\ mode}.  In this mode it prompts for the next command
 205 with the {\em primary\ prompt}, usually three greater-than signs ({\tt
 206 >>>}); for continuation lines it prompts with the
 207 {\em secondary\ prompt},
 208 by default three dots ({\tt ...}).  Typing an EOF (Control-D)
 209 at the primary prompt causes the interpreter to exit with a zero exit
 210 status.
 211
 212 The interpreter prints a welcome message stating its version number
 213 and a copyright notice before printing the first prompt, e.g.:
 214
 215 \bcode\begin{verbatim}
 216 python
 217 Python 1.3 (Oct 13 1995)
 218 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
 219 >>>
 220 \end{verbatim}\ecode
 221
 222 \section{The Interpreter and its Environment}
 223
 224 \subsection{Error Handling}
 225
 226 When an error occurs, the interpreter prints an error
 227 message and a stack trace.  In interactive mode, it then returns to
 228 the primary prompt; when input came from a file, it exits with a
 229 nonzero exit status after printing
 230 the stack trace.  (Exceptions handled by an {\tt except} clause in a
 231 {\tt try} statement are not errors in this context.)  Some errors are
 232 unconditionally fatal and cause an exit with a nonzero exit status; this
 233 applies to internal inconsistencies and some cases of running out of
 234 memory.  All error messages are written to the standard error stream;
 235 normal output from the executed commands is written to standard
 236 output.
 237
 238 Typing the interrupt character (usually Control-C or DEL) to the
 239 primary or secondary prompt cancels the input and returns to the
 240 primary prompt.%
 241 \footnote{
 242         A problem with the GNU Readline package may prevent this.
 243 }
 244 Typing an interrupt while a command is executing raises the {\tt
 245 KeyboardInterrupt} exception, which may be handled by a {\tt try}
 246 statement.
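
A minimal sketch of such a handler, assuming a loop that stands in for
some long-running computation (the names {\tt total} and {\tt
interrupted} are illustrative; exceptions are treated later in this
tutorial):

\bcode\begin{verbatim}
interrupted = 0
total = 0
try:
    for n in range(1000):       # stand-in for a long computation
        total = total + n
except KeyboardInterrupt:
    # Reached only if the interrupt character is typed during the loop
    interrupted = 1
    print('Interrupted')
\end{verbatim}\ecode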
 247
 248 \subsection{The Module Search Path}
 249
 250 When a module named {\tt spam} is imported, the interpreter searches
 251 for a file named {\tt spam.py} in the list of directories specified by
 252 the environment variable {\tt PYTHONPATH}.  It has the same syntax as
 253 the {\UNIX} shell variable {\tt PATH}, i.e., a list of colon-separated
 254 directory names.  When {\tt PYTHONPATH} is not set, or when the file
 255 is not found there, the search continues in an installation-dependent
 256 default path, usually {\tt .:/usr/local/lib/python}.
 257
 258 Actually, modules are searched for in the list of directories given by the
 259 variable {\tt sys.path} which is initialized from {\tt PYTHONPATH} and
 260 the installation-dependent default.  This allows Python programs that
 261 know what they're doing to modify or replace the module search path.
 262 See the section on Standard Modules later.
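
For instance, a program might extend the search path as in the
following sketch; the directory name is purely hypothetical:

\bcode\begin{verbatim}
import sys

extra = '/home/guido/lib/python'    # hypothetical directory
if extra not in sys.path:
    sys.path.append(extra)          # searched after the existing entries
\end{verbatim}\ecode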
 263
 264 \subsection{``Compiled'' Python files}
 265
 266 As an important speed-up of the start-up time for short programs that
 267 use a lot of standard modules, if a file called {\tt spam.pyc} exists
 268 in the directory where {\tt spam.py} is found, this is assumed to
 269 contain an already-``compiled'' version of the module {\tt spam}.  The
 270 modification time of the version of {\tt spam.py} used to create {\tt
 271 spam.pyc} is recorded in {\tt spam.pyc}, and the file is ignored if
 272 these don't match.
 273
 274 Whenever {\tt spam.py} is successfully compiled, an attempt is made to
 275 write the compiled version to {\tt spam.pyc}.  It is not an error if
 276 this attempt fails; if for any reason the file is not written
 277 completely, the resulting {\tt spam.pyc} file will be recognized as
 278 invalid and thus ignored later.
 279
 280 \subsection{Executable Python scripts}
 281
 282 On BSD'ish {\UNIX} systems, Python scripts can be made directly
 283 executable, like shell scripts, by putting the line
 284
 285 \bcode\begin{verbatim}
 286 #! /usr/local/bin/python
 287 \end{verbatim}\ecode
 288 %
 289 (assuming that's the name of the interpreter) at the beginning of the
 290 script and giving the file an executable mode.  The {\tt \#!} must be
 291 the first two characters of the file.
 292
 293 \subsection{The Interactive Startup File}
 294
 295 When you use Python interactively, it is frequently handy to have some
 296 standard commands executed every time the interpreter is started.  You
 297 can do this by setting an environment variable named {\tt
 298 PYTHONSTARTUP} to the name of a file containing your start-up
 299 commands.  This is similar to the {\tt .profile} feature of the {\UNIX}
 300 shells.
 301
 302 This file is only read in interactive sessions, not when Python reads
 303 commands from a script, and not when {\tt /dev/tty} is given as the
 304 explicit source of commands (which otherwise behaves like an
 305 interactive session).  It is executed in the same name space where
 306 interactive commands are executed, so that objects that it defines or
 307 imports can be used without qualification in the interactive session.
 308 You can also change the prompts {\tt sys.ps1} and {\tt sys.ps2} in
 309 this file.
 310
 311 If you want to read an additional start-up file from the current
 312 directory, you can program this in the global start-up file, e.g.
 313 \verb\execfile('.pythonrc')\.  If you want to use the startup file
 314 in a script, you must write this explicitly in the script, e.g.
 315 \verb\import os;\ \verb\execfile(os.environ['PYTHONSTARTUP'])\.
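
As an illustration, a start-up file might look like the following
sketch; the prompt strings and the choice of imported modules are
assumptions, not recommendations:

\bcode\begin{verbatim}
# Hypothetical file named by PYTHONSTARTUP
import os           # usable without qualification in the session
import sys

sys.ps1 = 'py> '    # primary prompt
sys.ps2 = '  | '    # secondary prompt
\end{verbatim}\ecode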
 316
 317 \section{Interactive Input Editing and History Substitution}
 318
 319 Some versions of the Python interpreter support editing of the current
 320 input line and history substitution, similar to facilities found in
 321 the Korn shell and the GNU Bash shell.  This is implemented using the
 322 {\em GNU\ Readline} library, which supports Emacs-style and vi-style
 323 editing.  This library has its own documentation which I won't
 324 duplicate here; however, the basics are easily explained.
 325
 326 Perhaps the quickest check to see whether command line editing is
 327 supported is typing Control-P to the first Python prompt you get.  If
 328 it beeps, you have command line editing.  If nothing appears to
 329 happen, or if \verb/^P/ is echoed, you can skip the rest of this
 330 section.
 331
 332 \subsection{Line Editing}
 333
 334 If supported, input line editing is active whenever the interpreter
 335 prints a primary or secondary prompt.  The current line can be edited
 336 using the conventional Emacs control characters.  The most important
 337 of these are: C-A (Control-A) moves the cursor to the beginning of the
 338 line, C-E to the end, C-B moves it one position to the left, C-F to
 339 the right.  Backspace erases the character to the left of the cursor,
 340 C-D the character to its right.  C-K kills (erases) the rest of the
 341 line to the right of the cursor, C-Y yanks back the last killed
 342 string.  C-underscore undoes the last change you made; it can be
 343 repeated for cumulative effect.
 344
 345 \subsection{History Substitution}
 346
 347 History substitution works as follows.  All non-empty input lines
 348 issued are saved in a history buffer, and when a new prompt is given
 349 you are positioned on a new line at the bottom of this buffer.  C-P
 350 moves one line up (back) in the history buffer, C-N moves one down.
 351 Any line in the history buffer can be edited; an asterisk appears in
 352 front of the prompt to mark a line as modified.  Pressing the Return
 353 key passes the current line to the interpreter.  C-R starts an
 354 incremental reverse search; C-S starts a forward search.
 355
 356 \subsection{Key Bindings}
 357
 358 The key bindings and some other parameters of the Readline library can
 359 be customized by placing commands in an initialization file called
 360 {\tt \$HOME/.inputrc}.  Key bindings have the form
 361
 362 \bcode\begin{verbatim}
 363 key-name: function-name
 364 \end{verbatim}\ecode
 365 %
 366 or
 367
 368 \bcode\begin{verbatim}
 369 "string": function-name
 370 \end{verbatim}\ecode
 371 %
 372 and options can be set with
 373
 374 \bcode\begin{verbatim}
 375 set option-name value
 376 \end{verbatim}\ecode
 377 %
 378 For example:
 379
 380 \bcode\begin{verbatim}
 381 # I prefer vi-style editing:
 382 set editing-mode vi
 383 # Edit using a single line:
 384 set horizontal-scroll-mode On
 385 # Rebind some keys:
 386 Meta-h: backward-kill-word
 387 "\C-u": universal-argument
 388 "\C-x\C-r": re-read-init-file
 389 \end{verbatim}\ecode
 390 %
 391 Note that the default binding for TAB in Python is to insert a TAB
 392 instead of Readline's default filename completion function.  If you
 393 insist, you can override this by putting
 394
 395 \bcode\begin{verbatim}
 396 TAB: complete
 397 \end{verbatim}\ecode
 398 %
 399 in your {\tt \$HOME/.inputrc}.  (Of course, this makes it hard to type
 400 indented continuation lines...)
 401
 402 \subsection{Commentary}
 403
 404 This facility is an enormous step forward compared to previous
 405 versions of the interpreter; however, some wishes are left: It would
 406 be nice if the proper indentation were suggested on continuation lines
 407 (the parser knows if an indent token is required next).  The
 408 completion mechanism might use the interpreter's symbol table.  A
 409 command to check (or even suggest) matching parentheses, quotes etc.
 410 would also be useful.
 411
 412
 413 \chapter{An Informal Introduction to Python}
 414
 415 In the following examples, input and output are distinguished by the
 416 presence or absence of prompts ({\tt >>>} and {\tt ...}): to repeat
 417 the example, you must type everything after the prompt, when the
 418 prompt appears; lines that do not begin with a prompt are output from
 419 the interpreter.%
 420 \footnote{
 421         I'd prefer to use different fonts to distinguish input
 422         from output, but the amount of LaTeX hacking that would require
 423         is currently beyond my ability.
 424 }
 425 Note that a secondary prompt on a line by itself in an example means
 426 you must type a blank line; this is used to end a multi-line command.
 427
 428 \section{Using Python as a Calculator}
 429
 430 Let's try some simple Python commands.  Start the interpreter and wait
 431 for the primary prompt, {\tt >>>}.  (It shouldn't take long.)
 432
 433 \subsection{Numbers}
 434
 435 The interpreter acts as a simple calculator: you can type an
 436 expression at it and it will write the value.  Expression syntax is
 437 straightforward: the operators {\tt +}, {\tt -}, {\tt *} and {\tt /}
 438 work just like in most other languages (e.g., Pascal or C); parentheses
 439 can be used for grouping.  For example:
 440
 441 \bcode\begin{verbatim}
 442 >>> 2+2
 443 4
 444 >>> # This is a comment
 445 ... 2+2
 446 4
 447 >>> 2+2  # and a comment on the same line as code
 448 4
 449 >>> (50-5*6)/4
 450 5
 451 >>> # Integer division returns the floor:
 452 ... 7/3
 453 2
 454 >>> 7/-3
 455 -3
 456 >>>
 457 \end{verbatim}\ecode
 458 %
 459 Like in C, the equal sign ({\tt =}) is used to assign a value to a
 460 variable.  The value of an assignment is not written:
 461
 462 \bcode\begin{verbatim}
 463 >>> width = 20
 464 >>> height = 5*9
 465 >>> width * height
 466 900
 467 >>>
 468 \end{verbatim}\ecode
 469 %
 470 A value can be assigned to several variables simultaneously:
 471
 472 \bcode\begin{verbatim}
 473 >>> x = y = z = 0  # Zero x, y and z
 474 >>> x
 475 0
 476 >>> y
 477 0
 478 >>> z
 479 0
 480 >>>
 481 \end{verbatim}\ecode
 482 %
 483 There is full support for floating point; operators with mixed type
 484 operands convert the integer operand to floating point:
 485
 486 \bcode\begin{verbatim}
 487 >>> 4 * 2.5 / 3.3
 488 3.0303030303
 489 >>> 7.0 / 2
 490 3.5
 491 >>>
 492 \end{verbatim}\ecode
 493
 494 \subsection{Strings}
 495
 496 Besides numbers, Python can also manipulate strings, enclosed in
 497 single quotes or double quotes:
 498
 499 \bcode\begin{verbatim}
 500 >>> 'spam eggs'
 501 'spam eggs'
 502 >>> 'doesn\'t'
 503 "doesn't"
 504 >>> "doesn't"
 505 "doesn't"
 506 >>> '"Yes," he said.'
 507 '"Yes," he said.'
 508 >>> "\"Yes,\" he said."
 509 '"Yes," he said.'
 510 >>> '"Isn\'t," she said.'
 511 '"Isn\'t," she said.'
 512 >>>
 513 \end{verbatim}\ecode
 514 %
 515 Strings are written the same way as they are typed for input: inside
 516 quotes and with quotes and other funny characters escaped by backslashes,
 517 to show the precise value.  The string is enclosed in double quotes if
 518 the string contains a single quote and no double quotes, else it's
 519 enclosed in single quotes.  (The {\tt print} statement, described later,
 520 can be used to write strings without quotes or escapes.)
 521
 522 Strings can be concatenated (glued together) with the {\tt +}
 523 operator, and repeated with {\tt *}:
 524
 525 \bcode\begin{verbatim}
 526 >>> word = 'Help' + 'A'
 527 >>> word
 528 'HelpA'
 529 >>> '<' + word*5 + '>'
 530 '<HelpAHelpAHelpAHelpAHelpA>'
 531 >>>
 532 \end{verbatim}\ecode
 533 %
 534 Strings can be subscripted (indexed); like in C, the first character of
 535 a string has subscript (index) 0.
 536
 537 There is no separate character type; a character is simply a string of
 538 size one.  Like in Icon, substrings can be specified with the {\em
 539 slice} notation: two indices separated by a colon.
 540
 541 \bcode\begin{verbatim}
 542 >>> word[4]
 543 'A'
 544 >>> word[0:2]
 545 'He'
 546 >>> word[2:4]
 547 'lp'
 548 >>>
 549 \end{verbatim}\ecode
 550 %
 551 Slice indices have useful defaults; an omitted first index defaults to
 552 zero, an omitted second index defaults to the size of the string being
 553 sliced.
 554
 555 \bcode\begin{verbatim}
 556 >>> word[:2]    # The first two characters
 557 'He'
 558 >>> word[2:]    # All but the first two characters
 559 'lpA'
 560 >>>
 561 \end{verbatim}\ecode
 562 %
 563 Here's a useful invariant of slice operations: \verb\s[:i] + s[i:]\
 564 equals \verb\s\.
 565
 566 \bcode\begin{verbatim}
 567 >>> word[:2] + word[2:]
 568 'HelpA'
 569 >>> word[:3] + word[3:]
 570 'HelpA'
 571 >>>
 572 \end{verbatim}\ecode
 573 %
 574 Degenerate slice indices are handled gracefully: an index that is too
 575 large is replaced by the string size, an upper bound smaller than the
 576 lower bound returns an empty string.
 577
 578 \bcode\begin{verbatim}
 579 >>> word[1:100]
 580 'elpA'
 581 >>> word[10:]
 582 ''
 583 >>> word[2:1]
 584 ''
 585 >>>
 586 \end{verbatim}\ecode
 587 %
 588 Indices may be negative numbers, to start counting from the right.
 589 For example:
 590
 591 \bcode\begin{verbatim}
 592 >>> word[-1]     # The last character
 593 'A'
 594 >>> word[-2]     # The last-but-one character
 595 'p'
 596 >>> word[-2:]    # The last two characters
 597 'pA'
 598 >>> word[:-2]    # All but the last two characters
 599 'Hel'
 600 >>>
 601 \end{verbatim}\ecode
 602 %
 603 But note that -0 is really the same as 0, so it does not count from
 604 the right!
 605
 606 \bcode\begin{verbatim}
 607 >>> word[-0]     # (since -0 equals 0)
 608 'H'
 609 >>>
 610 \end{verbatim}\ecode
 611 %
 612 Out-of-range negative slice indices are truncated, but don't try this
 613 for single-element (non-slice) indices:
 614
 615 \bcode\begin{verbatim}
 616 >>> word[-100:]
 617 'HelpA'
 618 >>> word[-10]    # error
 619 Traceback (innermost last):
 620   File "<stdin>", line 1
 621 IndexError: string index out of range
 622 >>>
 623 \end{verbatim}\ecode
 624 %
 625 The best way to remember how slices work is to think of the indices as
 626 pointing {\em between} characters, with the left edge of the first
 627 character numbered 0.  Then the right edge of the last character of a
 628 string of {\tt n} characters has index {\tt n}, for example:
 629
 630 \bcode\begin{verbatim}
 631  +---+---+---+---+---+
 632  | H | e | l | p | A |
 633  +---+---+---+---+---+
 634  0   1   2   3   4   5
 635 -5  -4  -3  -2  -1
 636 \end{verbatim}\ecode
 637 %
 638 The first row of numbers gives the position of the indices 0...5 in
 639 the string; the second row gives the corresponding negative indices.
 640 The slice from \verb\i\ to \verb\j\ consists of all characters between
 641 the edges labeled \verb\i\ and \verb\j\, respectively.
 642
 643 For nonnegative indices, the length of a slice is the difference of
 644 the indices, if both are within bounds, e.g., the length of
 645 \verb\word[1:3]\ is 2.
 646
 647 The built-in function {\tt len()} returns the length of a string:
 648
 649 \bcode\begin{verbatim}
 650 >>> s = 'supercalifragilisticexpialidocious'
 651 >>> len(s)
 652 34
 653 >>>
 654 \end{verbatim}\ecode
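%
The rule about slice lengths stated earlier can be checked with {\tt
len()}.  The following sketch uses the example string from above; the
names {\tt i}, {\tt j} and {\tt piece} are illustrative:

\bcode\begin{verbatim}
word = 'HelpA'
i = 1
j = 3
piece = word[i:j]       # the characters between edges 1 and 3
print(piece)            # el
print(len(piece))       # 2, the same as j - i
\end{verbatim}\ecode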
 655
 656 \subsection{Lists}
 657
 658 Python knows a number of {\em compound} data types, used to group
 659 together other values.  The most versatile is the {\em list}, which
 660 can be written as a list of comma-separated values (items) between
 661 square brackets.  List items need not all have the same type.
 662
 663 \bcode\begin{verbatim}
 664 >>> a = ['spam', 'eggs', 100, 1234]
 665 >>> a
 666 ['spam', 'eggs', 100, 1234]
 667 >>>
 668 \end{verbatim}\ecode
 669 %
 670 Like string indices, list indices start at 0, and lists can be sliced,
 671 concatenated and so on:
 672
 673 \bcode\begin{verbatim}
 674 >>> a[0]
 675 'spam'
 676 >>> a[3]
 677 1234
 678 >>> a[-2]
 679 100
 680 >>> a[1:-1]
 681 ['eggs', 100]
 682 >>> a[:2] + ['bacon', 2*2]
 683 ['spam', 'eggs', 'bacon', 4]
 684 >>> 3*a[:3] + ['Boe!']
 685 ['spam', 'eggs', 100, 'spam', 'eggs', 100, 'spam', 'eggs', 100, 'Boe!']
 686 >>>
 687 \end{verbatim}\ecode
 688 %
 689 Unlike strings, which are {\em immutable}, it is possible to change
 690 individual elements of a list:
 691
 692 \bcode\begin{verbatim}
 693 >>> a
 694 ['spam', 'eggs', 100, 1234]
 695 >>> a[2] = a[2] + 23
 696 >>> a
 697 ['spam', 'eggs', 123, 1234]
 698 >>>
 699 \end{verbatim}\ecode
 700 %
 701 Assignment to slices is also possible, and this can even change the size
 702 of the list:
 703
 704 \bcode\begin{verbatim}
 705 >>> # Replace some items:
 706 ... a[0:2] = [1, 12]
 707 >>> a
 708 [1, 12, 123, 1234]
 709 >>> # Remove some:
 710 ... a[0:2] = []
 711 >>> a
 712 [123, 1234]
 713 >>> # Insert some:
 714 ... a[1:1] = ['bletch', 'xyzzy']
 715 >>> a
 716 [123, 'bletch', 'xyzzy', 1234]
 717 >>> a[:0] = a     # Insert (a copy of) itself at the beginning
 718 >>> a
 719 [123, 'bletch', 'xyzzy', 1234, 123, 'bletch', 'xyzzy', 1234]
 720 >>>
 721 \end{verbatim}\ecode
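%
None of these assignments is available for strings, which are
immutable; a modified string has to be built as a new value.  A
minimal sketch using the slice notation (the name {\tt changed} is
illustrative):

\bcode\begin{verbatim}
word = 'HelpA'
changed = word[:1] + 'a' + word[2:]     # a new string
print(changed)                          # HalpA
print(word)                             # HelpA, the original is intact
\end{verbatim}\ecode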
 722 %
 723 The built-in function {\tt len()} also applies to lists:
 724
 725 \bcode\begin{verbatim}
 726 >>> len(a)
 727 8
 728 >>>
 729 \end{verbatim}\ecode
 730 %
 731 It is possible to nest lists (create lists containing other lists),
 732 for example:
 733
 734 \bcode\begin{verbatim}
 735 >>> q = [2, 3]
 736 >>> p = [1, q, 4]
 737 >>> len(p)
 738 3
 739 >>> p[1]
 740 [2, 3]
 741 >>> p[1][0]
 742 2
 743 >>> p[1].append('xtra')     # See section 5.1
 744 >>> p
 745 [1, [2, 3, 'xtra'], 4]
 746 >>> q
 747 [2, 3, 'xtra']
 748 >>>
 749 \end{verbatim}\ecode
 750 %
 751 Note that in the last example, {\tt p[1]} and {\tt q} really refer to
 752 the same object!  We'll come back to {\em object semantics} later.
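
Until then, the following sketch shows one way to avoid the sharing,
assuming a copy is wanted: the slice {\tt q[:]} makes a new list with
the same items (the name {\tt r} is illustrative):

\bcode\begin{verbatim}
q = [2, 3]
p = [1, q, 4]           # p[1] is the list q itself
r = [1, q[:], 4]        # r[1] is a copy of q
q.append('xtra')
print(p[1])             # [2, 3, 'xtra']
print(r[1])             # [2, 3]
\end{verbatim}\ecode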
 753
 754 \section{First Steps Towards Programming}
 755
 756 Of course, we can use Python for more complicated tasks than adding
 757 two and two together.  For instance, we can write an initial
 758 subsequence of the {\em Fibonacci} series as follows:
 759
 760 \bcode\begin{verbatim}
 761 >>> # Fibonacci series:
 762 ... # the sum of two elements defines the next
 763 ... a, b = 0, 1
 764 >>> while b < 10:
 765 ...       print b
 766 ...       a, b = b, a+b
 767 ...
 768 1
 769 1
 770 2
 771 3
 772 5
 773 8
 774 >>>
 775 \end{verbatim}\ecode
 776 %
 777 This example introduces several new features.
 778
 779 \begin{itemize}
 780
 781 \item
 782 The first line contains a {\em multiple assignment}: the variables
 783 {\tt a} and {\tt b} simultaneously get the new values 0 and 1.  On the
 784 last line this is used again, demonstrating that the expressions on
 785 the right-hand side are all evaluated first before any of the
 786 assignments take place.
 787
 788 \item
 789 The {\tt while} loop executes as long as the condition (here: {\tt b <
 790 10}) remains true.  In Python, like in C, any non-zero integer value is
 791 true; zero is false.  The condition may also be a string or list value,
 792 in fact any sequence; anything with a non-zero length is true, empty
 793 sequences are false.  The test used in the example is a simple
 794 comparison.  The standard comparison operators are written the same as
 795 in C: {\tt <}, {\tt >}, {\tt ==}, {\tt <=}, {\tt >=} and {\tt !=}.
 796
 797 \item
 798 The {\em body} of the loop is {\em indented}: indentation is Python's
 799 way of grouping statements.  The interpreter does not (yet!) indent
 800 continuation lines for you, so you have to type a tab or
 801 space(s) for each indented line.  In practice you will prepare more
 802 complicated input for Python with a text editor; most text editors have
 803 an auto-indent facility.  When a compound statement is entered
 804 interactively, it must be followed by a blank line to indicate
 805 completion (since the parser cannot guess when you have typed the last
 806 line).
 807
 808 \item
 809 The {\tt print} statement writes the value of the expression(s) it is
 810 given.  It differs from just writing the expression you want to write
 811 (as we did earlier in the calculator examples) in the way it handles
 812 multiple expressions and strings.  Strings are printed without quotes,
 813 and a space is inserted between items, so you can format things nicely,
 814 like this:
 815
 816 \bcode\begin{verbatim}
 817 >>> i = 256*256
 818 >>> print 'The value of i is', i
 819 The value of i is 65536
 820 >>>
 821 \end{verbatim}\ecode
 822 %
 823 A trailing comma avoids the newline after the output:
 824
 825 \bcode\begin{verbatim}
 826 >>> a, b = 0, 1
 827 >>> while b < 1000:
 828 ...     print b,
 829 ...     a, b = b, a+b
 830 ...
 831 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987
 832 >>>
 833 \end{verbatim}\ecode
 834 %
 835 Note that the interpreter inserts a newline before it prints the next
 836 prompt if the last line was not completed.
 837
 838 \end{itemize}
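
Two of the points in the list above can be tried in a short sketch;
the list contents and the names {\tt items} and {\tt count} are
illustrative:

\bcode\begin{verbatim}
a, b = 1, 2
a, b = b, a             # right-hand side is evaluated first: a swap
print(a)                # 2
print(b)                # 1

items = ['spam', 'eggs', 'bacon']
count = 0
while items:            # a non-empty list is true
    items[:1] = []      # remove the first item
    count = count + 1
print(count)            # 3
\end{verbatim}\ecode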
 839
 840
 841 \chapter{More Control Flow Tools}
 842
 843 Besides the {\tt while} statement just introduced, Python knows the
 844 usual control flow statements known from other languages, with some
 845 twists.
 846
 847 \section{If Statements}
 848
 849 Perhaps the best-known statement type is the {\tt if} statement.
 850 For example:
 851
 852 \bcode\begin{verbatim}
 853 >>> if x < 0:
 854 ...      x = 0
 855 ...      print 'Negative changed to zero'
 856 ... elif x == 0:
 857 ...      print 'Zero'
 858 ... elif x == 1:
 859 ...      print 'Single'
 860 ... else:
 861 ...      print 'More'
 862 ...
 863 \end{verbatim}\ecode
 864 %
 865 There can be zero or more {\tt elif} parts, and the {\tt else} part is
 866 optional.  The keyword `{\tt elif}' is short for `{\tt else if}', and is
 867 useful to avoid excessive indentation.  An {\tt if...elif...elif...}
 868 sequence is a substitute for the {\em switch} or {\em case} statements
 869 found in other languages.
 870
 871 \section{For Statements}
 872
 873 The {\tt for} statement in Python differs a bit from what you may be
 874 used to in C or Pascal.  Rather than always iterating over an
 875 arithmetic progression of numbers (as in Pascal), or leaving the user
 876 completely free to define the iteration test and step (as in C), Python's {\tt
 877 for} statement iterates over the items of any sequence (e.g., a list
 878 or a string), in the order that they appear in the sequence.  For
 879 example (no pun intended):
 880
 881 \bcode\begin{verbatim}
 882 >>> # Measure some strings:
 883 ... a = ['cat', 'window', 'defenestrate']
 884 >>> for x in a:
 885 ...     print x, len(x)
 886 ...
 887 cat 3
 888 window 6
 889 defenestrate 12
 890 >>>
 891 \end{verbatim}\ecode
 892 %
 893 It is not safe to modify the sequence being iterated over in the loop
 894 (this can only happen for mutable sequence types, i.e., lists).  If
 895 you need to modify the list you are iterating over, e.g., duplicate
 896 selected items, you must iterate over a copy.  The slice notation
 897 makes this particularly convenient:
 898
 899 \bcode\begin{verbatim}
 900 >>> for x in a[:]: # make a slice copy of the entire list
 901 ...    if len(x) > 6: a.insert(0, x)
 902 ...
 903 >>> a
 904 ['defenestrate', 'cat', 'window', 'defenestrate']
 905 >>>
 906 \end{verbatim}\ecode
 907
 908 \section{The {\tt range()} Function}
 909
 910 If you do need to iterate over a sequence of numbers, the built-in
 911 function {\tt range()} comes in handy.  It generates lists containing
 912 arithmetic progressions, e.g.:
 913
 914 \bcode\begin{verbatim}
 915 >>> range(10)
 916 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
 917 >>>
 918 \end{verbatim}\ecode
 919 %
 920 The given end point is never part of the generated list; {\tt range(10)}
 921 generates a list of 10 values, exactly the legal indices for items of a
 922 sequence of length 10.  It is possible to let the range start at another
 923 number, or to specify a different increment (even negative):
 924
 925 \bcode\begin{verbatim}
 926 >>> range(5, 10)
 927 [5, 6, 7, 8, 9]
 928 >>> range(0, 10, 3)
 929 [0, 3, 6, 9]
 930 >>> range(-10, -100, -30)
 931 [-10, -40, -70]
 932 >>>
 933 \end{verbatim}\ecode
 934 %
 935 To iterate over the indices of a sequence, combine {\tt range()} and
 936 {\tt len()} as follows:
 937
 938 \bcode\begin{verbatim}
 939 >>> a = ['Mary', 'had', 'a', 'little', 'lamb']
 940 >>> for i in range(len(a)):
 941 ...     print i, a[i]
 942 ...
 943 0 Mary
 944 1 had
 945 2 a
 946 3 little
 947 4 lamb
 948 >>>
 949 \end{verbatim}\ecode
 950
 951 \section{Break and Continue Statements, and Else Clauses on Loops}
 952
 953 The {\tt break} statement, like in C, breaks out of the smallest
 954 enclosing {\tt for} or {\tt while} loop.
 955
 956 The {\tt continue} statement, also borrowed from C, continues with the
 957 next iteration of the loop.
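
As a small sketch of {\tt continue} (the function name {\tt odds} is
made up for this illustration), the following function skips the even
numbers in a list and collects the rest:

\bcode\begin{verbatim}
```python
def odds(numbers):    # return the odd items of a list
    result = []
    for n in numbers:
        if n % 2 == 0:
            continue    # skip the rest of the body for even numbers
        result.append(n)
    return result

found = odds([1, 2, 3, 4, 5, 6, 7])    # found is now [1, 3, 5, 7]
```
\end{verbatim}\ecode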
 958
 959 Loop statements may have an {\tt else} clause; it is executed when the
 960 loop terminates through exhaustion of the list (with {\tt for}) or when
 961 the condition becomes false (with {\tt while}), but not when the loop is
 962 terminated by a {\tt break} statement.  This is exemplified by the
 963 following loop, which searches for prime numbers:
 964
 965 \bcode\begin{verbatim}
 966 >>> for n in range(2, 10):
 967 ...     for x in range(2, n):
 968 ...         if n % x == 0:
 969 ...            print n, 'equals', x, '*', n/x
 970 ...            break
 971 ...     else:
 972 ...          print n, 'is a prime number'
 973 ...
 974 2 is a prime number
 975 3 is a prime number
 976 4 equals 2 * 2
 977 5 is a prime number
 978 6 equals 2 * 3
 979 7 is a prime number
 980 8 equals 2 * 4
 981 9 equals 3 * 3
 982 >>>
 983 \end{verbatim}\ecode
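%
The {\tt else} clause behaves the same way on a {\tt while} loop.  A
minimal sketch, using a hypothetical function \verb/find_index()/ that
returns the position of an item, or -1 when the loop ends without
reaching {\tt break}:

\bcode\begin{verbatim}
```python
def find_index(seq, target):    # position of target in seq, or -1
    i = 0
    while i < len(seq):
        if seq[i] == target:
            break       # found: the else clause is skipped
        i = i + 1
    else:
        i = -1          # the condition became false: nothing was found
    return i

pos = find_index(['cat', 'window', 'defenestrate'], 'window')    # pos is 1
```
\end{verbatim}\ecode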
 984
 985 \section{Pass Statements}
 986
 987 The {\tt pass} statement does nothing.
 988 It can be used when a statement is required syntactically but the
 989 program requires no action.
 990 For example:
 991
 992 \bcode\begin{verbatim}
 993 >>> while 1:
 994 ...       pass # Busy-wait for keyboard interrupt
 995 ...
 996 \end{verbatim}\ecode
 997
 998 \section{Defining Functions}
 999
1000 We can create a function that writes the Fibonacci series to an
1001 arbitrary boundary:
1002
1003 \bcode\begin{verbatim}
1004 >>> def fib(n):    # write Fibonacci series up to n
1005 ...     a, b = 0, 1
1006 ...     while b < n:
1007 ...           print b,
1008 ...           a, b = b, a+b
1009 ...
1010 >>> # Now call the function we just defined:
1011 ... fib(2000)
1012 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597
1013 >>>
1014 \end{verbatim}\ecode
1015 %
1016 The keyword {\tt def} introduces a function {\em definition}.  It must
1017 be followed by the function name and the parenthesized list of formal
1018 parameters.  The statements that form the body of the function start at
1019 the next line, indented by a tab stop.
1020
1021 The {\em execution} of a function introduces a new symbol table used
1022 for the local variables of the function.  More precisely, all variable
1023 assignments in a function store the value in the local symbol table;
1024 whereas
1025 variable references first look in the local symbol table, then
1026 in the global symbol table, and then in the table of built-in names.
1027 Thus,
1028 global variables cannot be directly assigned a value within a
1029 function (unless named in a {\tt global} statement), although
1030 they may be referenced.
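
The difference can be sketched as follows; the names {\tt counter},
{\tt localassign} and {\tt globalassign} are invented for this
illustration:

\bcode\begin{verbatim}
```python
counter = 0    # a global variable

def localassign():
    counter = 100     # creates a local variable; the global is untouched
    return counter

def globalassign():
    global counter    # from here on, counter means the global variable
    counter = counter + 1
    return counter

first = localassign()      # first is 100
unchanged = counter        # the global counter is still 0
second = globalassign()    # second is 1, and so is the global counter
```
\end{verbatim}\ecode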
1031
1032 The actual parameters (arguments) to a function call are introduced in
1033 the local symbol table of the called function when it is called; thus,
1034 arguments are passed using {\em call\ by\ value}.%
1035 \footnote{
1036          Actually, {\em call  by  object reference} would be a better
1037          description, since if a mutable object is passed, the caller
1038          will see any changes the callee makes to it (e.g., items
1039          inserted into a list).
1040 }
1041 When a function calls another function, a new local symbol table is
1042 created for that call.
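
The point made in the footnote can be illustrated with a short sketch
(the function names {\tt addone} and {\tt rebind} are made up here):
changing a list that was passed in is visible to the caller, while
assigning to the parameter name is not.

\bcode\begin{verbatim}
```python
def addone(lst):
    lst.append(1)      # changes the list object the caller passed in

def rebind(lst):
    lst = [1, 2, 3]    # only rebinds the local name lst

a = []
addone(a)    # a is now [1]
rebind(a)    # a is still [1]
```
\end{verbatim}\ecode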
1043
1044 A function definition introduces the function name in the
1045 current
1046 symbol table.  The value
1047 of the function name
1048 has a type that is recognized by the interpreter as a user-defined
1049 function.  This value can be assigned to another name which can then
1050 also be used as a function.  This serves as a general renaming
1051 mechanism:
1052
1053 \bcode\begin{verbatim}
1054 >>> fib
1055 <function object at 10042ed0>
1056 >>> f = fib
1057 >>> f(100)
1058 1 1 2 3 5 8 13 21 34 55 89
1059 >>>
1060 \end{verbatim}\ecode
1061 %
1062 You might object that {\tt fib} is not a function but a procedure.  In
1063 Python, like in C, procedures are just functions that don't return a
1064 value.  In fact, technically speaking, procedures do return a value,
1065 albeit a rather boring one.  This value is called {\tt None} (it's a
1066 built-in name).  Writing the value {\tt None} is normally suppressed by
1067 the interpreter if it would be the only value written.  You can see it
1068 if you really want to:
1069
1070 \bcode\begin{verbatim}
1071 >>> print fib(0)
1072 None
1073 >>>
1074 \end{verbatim}\ecode
1075 %
1076 It is simple to write a function that returns a list of the numbers of
1077 the Fibonacci series, instead of printing it:
1078
1079 \bcode\begin{verbatim}
1080 >>> def fib2(n): # return Fibonacci series up to n
1081 ...     result = []
1082 ...     a, b = 0, 1
1083 ...     while b < n:
1084 ...           result.append(b)    # see below
1085 ...           a, b = b, a+b
1086 ...     return result
1087 ...
1088 >>> f100 = fib2(100)    # call it
1089 >>> f100                # write the result
1090 [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
1091 >>>
1092 \end{verbatim}\ecode
1093 %
1094 This example, as usual, demonstrates some new Python features:
1095
1096 \begin{itemize}
1097
1098 \item
1099 The {\tt return} statement returns with a value from a function.  {\tt
1100 return} without an expression argument is used to return from the middle
1101 of a procedure (falling off the end also returns from a procedure), in
1102 which case the {\tt None} value is returned.
1103
1104 \item
1105 The statement {\tt result.append(b)} calls a {\em method} of the list
1106 object {\tt result}.  A method is a function that `belongs' to an
1107 object and is named {\tt obj.methodname}, where {\tt obj} is some
1108 object (this may be an expression), and {\tt methodname} is the name
1109 of a method that is defined by the object's type.  Different types
1110 define different methods.  Methods of different types may have the
1111 same name without causing ambiguity.  (It is possible to define your
1112 own object types and methods, using {\em classes}, as discussed later
1113 in this tutorial.)
1114 The method {\tt append}, shown in the example, is defined for
1115 list objects; it adds a new element at the end of the list.  In this
1116 example
1117 it is equivalent to {\tt result = result + [b]}, but more efficient.
1118
1119 \end{itemize}
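
Both forms of {\tt return} can be seen in the following sketch; the
function {\tt firstneg} is a hypothetical example, not part of Python:

\bcode\begin{verbatim}
```python
def firstneg(numbers):    # return the first negative item, if any
    for n in numbers:
        if n < 0:
            return n    # return a value from the middle of the loop
    return              # no expression: the value None is returned

x = firstneg([3, -1, -4])    # x is -1
y = firstneg([3, 1, 4])      # y is None
```
\end{verbatim}\ecode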
1120
1121
1122 \chapter{Odds and Ends}
1123
1124 This chapter describes some things you've learned about already in
1125 more detail, and adds some new things as well.
1126
1127 \section{More on Lists}
1128
1129 The list data type has some more methods.  Here are all of the methods
1130 of list objects:
1131
1132 \begin{description}
1133
1134 \item[{\tt insert(i, x)}]
1135 Insert an item at a given position.  The first argument is the index of
1136 the element before which to insert, so {\tt a.insert(0, x)} inserts at
1137 the front of the list, and {\tt a.insert(len(a), x)} is equivalent to
1138 {\tt a.append(x)}.
1139
1140 \item[{\tt append(x)}]
1141 Equivalent to {\tt a.insert(len(a), x)}.
1142
1143 \item[{\tt index(x)}]
1144 Return the index in the list of the first item whose value is {\tt x}.
1145 It is an error if there is no such item.
1146
1147 \item[{\tt remove(x)}]
1148 Remove the first item from the list whose value is {\tt x}.
1149 It is an error if there is no such item.
1150
1151 \item[{\tt sort()}]
1152 Sort the items of the list, in place.
1153
1154 \item[{\tt reverse()}]
1155 Reverse the elements of the list, in place.
1156
1157 \item[{\tt count(x)}]
1158 Return the number of times {\tt x} appears in the list.
1159
1160 \end{description}
1161
1162 An example that uses all list methods:
1163
1164 \bcode\begin{verbatim}
1165 >>> a = [66.6, 333, 333, 1, 1234.5]
1166 >>> print a.count(333), a.count(66.6), a.count('x')
1167 2 1 0
1168 >>> a.insert(2, -1)
1169 >>> a.append(333)
1170 >>> a
1171 [66.6, 333, -1, 333, 1, 1234.5, 333]
1172 >>> a.index(333)
1173 1
1174 >>> a.remove(333)
1175 >>> a
1176 [66.6, -1, 333, 1, 1234.5, 333]
1177 >>> a.reverse()
1178 >>> a
1179 [333, 1234.5, 1, 333, -1, 66.6]
1180 >>> a.sort()
1181 >>> a
1182 [-1, 1, 66.6, 333, 333, 1234.5]
1183 >>>
1184 \end{verbatim}\ecode
1185
1186 \section{The {\tt del} statement}
1187
1188 There is a way to remove an item from a list given its index instead
1189 of its value: the {\tt del} statement.  This can also be used to
1190 remove slices from a list (which we did earlier by assignment of an
1191 empty list to the slice).  For example:
1192
1193 \bcode\begin{verbatim}
1194 >>> a
1195 [-1, 1, 66.6, 333, 333, 1234.5]
1196 >>> del a[0]
1197 >>> a
1198 [1, 66.6, 333, 333, 1234.5]
1199 >>> del a[2:4]
1200 >>> a
1201 [1, 66.6, 1234.5]
1202 >>>
1203 \end{verbatim}\ecode
1204 %
1205 {\tt del} can also be used to delete entire variables:
1206
1207 \bcode\begin{verbatim}
1208 >>> del a
1209 >>>
1210 \end{verbatim}\ecode
1211 %
1212 Referencing the name {\tt a} hereafter is an error (at least until
1213 another value is assigned to it).  We'll find other uses for {\tt del}
1214 later.
1215
1216 \section{Tuples and Sequences}
1217
1218 We saw that lists and strings have many common properties, e.g.,
1219 indexing and slicing operations.  They are two examples of {\em
1220 sequence} data types.  Since Python is an evolving language, other
1221 sequence data types may be added.  There is also another standard
1222 sequence data type: the {\em tuple}.
1223
1224 A tuple consists of a number of values separated by commas, for
1225 instance:
1226
1227 \bcode\begin{verbatim}
1228 >>> t = 12345, 54321, 'hello!'
1229 >>> t[0]
1230 12345
1231 >>> t
1232 (12345, 54321, 'hello!')
1233 >>> # Tuples may be nested:
1234 ... u = t, (1, 2, 3, 4, 5)
1235 >>> u
1236 ((12345, 54321, 'hello!'), (1, 2, 3, 4, 5))
1237 >>>
1238 \end{verbatim}\ecode
1239 %
1240 As you see, on output tuples are always enclosed in parentheses, so
1241 that nested tuples are interpreted correctly; they may be input with
1242 or without surrounding parentheses, although often parentheses are
1243 necessary anyway (if the tuple is part of a larger expression).
1244
1245 Tuples have many uses, e.g., (x, y) coordinate pairs, employee records
1246 from a database, etc.  Tuples, like strings, are immutable: it is not
1247 possible to assign to the individual items of a tuple (you can
1248 simulate much of the same effect with slicing and concatenation,
1249 though).
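
A minimal sketch of that workaround: instead of assigning to an item,
build a new tuple from slices of the old one.  (The one-item tuple
{\tt (0,)} uses the trailing-comma syntax explained in the next
paragraph; the name {\tt t2} is made up for this example.)

\bcode\begin{verbatim}
```python
t = 12345, 54321, 'hello!'
# Assigning to t[1] is not possible; concatenate slices instead:
t2 = t[:1] + (0,) + t[2:]    # t2 is (12345, 0, 'hello!')
```
\end{verbatim}\ecode
%
The original tuple {\tt t} is left unchanged.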
1250
1251 A special problem is the construction of tuples containing 0 or 1
1252 items: the syntax has some extra quirks to accommodate these.  Empty
1253 tuples are constructed by an empty pair of parentheses; a tuple with
1254 one item is constructed by following a value with a comma
1255 (it is not sufficient to enclose a single value in parentheses).
1256 Ugly, but effective.  For example:
1257
1258 \bcode\begin{verbatim}
1259 >>> empty = ()
1260 >>> singleton = 'hello',    # <-- note trailing comma
1261 >>> len(empty)
1262 0
1263 >>> len(singleton)
1264 1
1265 >>> singleton
1266 ('hello',)
1267 >>>
1268 \end{verbatim}\ecode
1269 %
1270 The statement {\tt t = 12345, 54321, 'hello!'} is an example of {\em
1271 tuple packing}: the values {\tt 12345}, {\tt 54321} and {\tt 'hello!'}
1272 are packed together in a tuple.  The reverse operation is also
1273 possible, e.g.:
1274
1275 \bcode\begin{verbatim}
1276 >>> x, y, z = t
1277 >>>
1278 \end{verbatim}\ecode
1279 %
1280 This is called, appropriately enough, {\em tuple unpacking}.  Tuple
1281 unpacking requires that the list of variables on the left has the same
1282 number of elements as the length of the tuple.  Note that multiple
1283 assignment is really just a combination of tuple packing and tuple
1284 unpacking!
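
One familiar consequence, shown here as a short sketch, is that two
variables can be swapped without a temporary:

\bcode\begin{verbatim}
```python
a, b = 1, 2
a, b = b, a    # pack (b, a) on the right, then unpack into a and b
# a is now 2 and b is now 1
```
\end{verbatim}\ecode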
1285
1286 Occasionally, the corresponding operation on lists is useful: {\em list
1287 unpacking}.  This is supported by enclosing the list of variables in
1288 square brackets:
1289
1290 \bcode\begin{verbatim}
1291 >>> a = ['spam', 'eggs', 100, 1234]
1292 >>> [a1, a2, a3, a4] = a
1293 >>>
1294 \end{verbatim}\ecode
1295
1296 \section{Dictionaries}
1297
1298 Another useful data type built into Python is the {\em dictionary}.
1299 Dictionaries are sometimes found in other languages as ``associative
1300 memories'' or ``associative arrays''.  Unlike sequences, which are
1301 indexed by a range of numbers, dictionaries are indexed by {\em keys},
1302 which are strings (the use of non-string values as keys
1303 is supported, but beyond the scope of this tutorial).
1304 It is best to think of a dictionary as an unordered set of
1305 {\em key:value} pairs, with the requirement that the keys are unique
1306 (within one dictionary).
1307 A pair of braces creates an empty dictionary: \verb/{}/.
1308 Placing a comma-separated list of key:value pairs within the
1309 braces adds initial key:value pairs to the dictionary; this is also the
1310 way dictionaries are written on output.
1311
1312 The main operations on a dictionary are storing a value with some key
1313 and extracting the value given the key.  It is also possible to delete
1314 a key:value pair
1315 with {\tt del}.
1316 If you store using a key that is already in use, the old value
1317 associated with that key is forgotten.  It is an error to extract a
1318 value using a non-existent key.
1319
1320 The {\tt keys()} method of a dictionary object returns a list of all the
1321 keys used in the dictionary, in arbitrary order (if you want it sorted,
1322 just apply the {\tt sort()} method to the list of keys).  To check
1323 whether a single key is in the dictionary, use the \verb/has_key()/
1324 method of the dictionary.
1325
1326 Here is a small example using a dictionary:
1327
1328 \bcode\begin{verbatim}
1329 >>> tel = {'jack': 4098, 'sape': 4139}
1330 >>> tel['guido'] = 4127
1331 >>> tel
1332 {'sape': 4139, 'guido': 4127, 'jack': 4098}
1333 >>> tel['jack']
1334 4098
1335 >>> del tel['sape']
1336 >>> tel['irv'] = 4127
1337 >>> tel
1338 {'guido': 4127, 'irv': 4127, 'jack': 4098}
1339 >>> tel.keys()
1340 ['guido', 'irv', 'jack']
1341 >>> tel.has_key('guido')
1342 1
1343 >>>
1344 \end{verbatim}\ecode
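%
Getting the keys in sorted order can be sketched as below.  The
function name {\tt sortedkeys} is invented for this illustration, and
copying the keys into a fresh list before sorting is an assumption made
so that the sketch does not depend on exactly what kind of object
{\tt keys()} returns:

\bcode\begin{verbatim}
```python
tel = {'jack': 4098, 'sape': 4139, 'guido': 4127}

def sortedkeys(d):    # return the keys of dictionary d as a sorted list
    keys = []
    for k in d.keys():
        keys.append(k)
    keys.sort()       # sorts the copy in place
    return keys

names = sortedkeys(tel)    # names is ['guido', 'jack', 'sape']
```
\end{verbatim}\ecode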
1345
1346 \section{More on Conditions}
1347
1348 The conditions used in {\tt while} and {\tt if} statements above can
1349 contain other operators besides comparisons.
1350
1351 The comparison operators {\tt in} and {\tt not in} check whether a value
1352 occurs (does not occur) in a sequence.  The operators {\tt is} and {\tt
1353 is not} compare whether two objects are really the same object; this
1354 only matters for mutable objects like lists.  All comparison operators
1355 have the same priority, which is lower than that of all numerical
1356 operators.
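
A short sketch of these four operators; all variable names are chosen
only for this example:

\bcode\begin{verbatim}
```python
a = [1, 2, 3]
b = a       # b is another name for the same list object
c = a[:]    # c is a copy: equal contents, but a different object

same = b is a          # true: one object, two names
copied = c is not a    # true: the copy is a separate object
equal = c == a         # true: the contents compare equal
member = 2 in a        # true: 2 occurs in the list
absent = 5 not in a    # true: 5 does not occur in the list
```
\end{verbatim}\ecode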
1357
1358 Comparisons can be chained: e.g., {\tt a < b == c} tests whether {\tt a}
1359 is less than {\tt b} and moreover {\tt b} equals {\tt c}.
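
Chaining is convenient for range tests.  A minimal sketch, with a
hypothetical function {\tt inrange}:

\bcode\begin{verbatim}
```python
def inrange(lo, x, hi):    # test whether lo <= x < hi
    return lo <= x < hi    # same outcome as (lo <= x) and (x < hi)

inside = inrange(0, 5, 10)      # true
outside = inrange(0, 10, 10)    # false: the upper bound is excluded
```
\end{verbatim}\ecode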
1360
1361 Comparisons may be combined by the Boolean operators {\tt and} and {\tt
1362 or}, and the outcome of a comparison (or of any other Boolean
1363 expression) may be negated with {\tt not}.  These all have lower
1364 priorities than comparison operators again; between them, {\tt not} has
1365 the highest priority, and {\tt or} the lowest, so that
1366 {\tt A and not B or C} is equivalent to {\tt (A and (not B)) or C}.  Of
1367 course, parentheses can be used to express the desired composition.
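
The stated equivalence can be checked by trying every combination of
truth values.  This is only a sketch; the function name {\tt agree} is
made up for the occasion:

\bcode\begin{verbatim}
```python
def agree():    # compare both spellings for every combination
    for A in [0, 1]:
        for B in [0, 1]:
            for C in [0, 1]:
                if (A and not B or C) != ((A and (not B)) or C):
                    return 0
    return 1

ok = agree()    # ok is 1: both expressions always give the same result
```
\end{verbatim}\ecode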
1368
1369 The Boolean operators {\tt and} and {\tt or} are so-called {\em
1370 shortcut} operators: their arguments are evaluated from left to right,
1371 and evaluation stops as soon as the outcome is determined.  E.g., if
1372 {\tt A} and {\tt C} are true but {\tt B} is false, {\tt A and B and C}
1373 does not evaluate the expression {\tt C}.  In general, the return value of a
1374 shortcut operator, when used as a general value and not as a Boolean, is
1375 the last evaluated argument.
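
To watch the shortcut behavior happen, the following sketch records
which arguments are actually evaluated.  The helper {\tt check} and the
list {\tt calls} are invented for this illustration:

\bcode\begin{verbatim}
```python
calls = []    # records which arguments were actually evaluated

def check(name, value):
    calls.append(name)
    return value

result = check('A', 1) and check('B', 0) and check('C', 1)
# result is 0 (the last evaluated argument); calls is ['A', 'B']
```
\end{verbatim}\ecode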
1376
1377 It is possible to assign the result of a comparison or other Boolean
1378 expression to a variable.  For example,
1379
1380 \bcode\begin{verbatim}
1381 >>> string1, string2, string3 = '', 'Trondheim', 'Hammer Dance'
1382 >>> non_null = string1 or string2 or string3
1383 >>> non_null
1384 'Trondheim'
1385 >>>
1386 \end{verbatim}\ecode
1387 %
1388 Note that in Python, unlike C, assignment cannot occur inside expressions.
1389
1390 \section{Comparing Sequences and Other Types}
1391
1392 Sequence objects may be compared to other objects with the same
1393 sequence type.  The comparison uses {\em lexicographical} ordering:
1394 first the first two items are compared, and if they differ this
1395 determines the outcome of the comparison; if they are equal, the next
1396 two items are compared, and so on, until either sequence is exhausted.
1397 If two items to be compared are themselves sequences of the same type,
1398 the lexicographical comparison is carried out recursively.  If all
1399 items of two sequences compare equal, the sequences are considered
1400 equal.  If one sequence is an initial subsequence of the other, the
1401 shorter sequence is the smaller one.  Lexicographical ordering for
1402 strings uses the \ASCII{} ordering for individual characters.  Some
1403 examples of comparisons between sequences with the same types:
1404
1405 \bcode\begin{verbatim}
1406 (1, 2, 3)              < (1, 2, 4)
1407 [1, 2, 3]              < [1, 2, 4]
1408 'ABC' < 'C' < 'Pascal' < 'Python'
1409 (1, 2, 3, 4)           < (1, 2, 4)
1410 (1, 2)                 < (1, 2, -1)
1411 (1, 2, 3)              == (1.0, 2.0, 3.0)
1412 (1, 2, ('aa', 'ab'))   < (1, 2, ('abc', 'a'), 4)
1413 \end{verbatim}\ecode
1414 %
1415 Note that comparing objects of different types is legal.  The outcome
1416 is deterministic but arbitrary: the types are ordered by their name.
1417 Thus, a list is always smaller than a string, a string is always
1418 smaller than a tuple, etc.  Mixed numeric types are compared according
1419 to their numeric value, so 0 equals 0.0, etc.%
1420 \footnote{
1421         The rules for comparing objects of different types should
1422         not be relied upon; they may change in a future version of
1423         the language.
1424 }
1425
1426
1427 \chapter{Modules}
1428
1429 If you quit from the Python interpreter and enter it again, the
1430 definitions you have made (functions and variables) are lost.
1431 Therefore, if you want to write a somewhat longer program, you are
1432 better off using a text editor to prepare the input for the interpreter
1433 and running it with that file as input instead.  This is known as creating a
1434 {\em script}.  As your program gets longer, you may want to split it
1435 into several files for easier maintenance.  You may also want to use a
1436 handy function that you've written in several programs without copying
1437 its definition into each program.
1438
1439 To support this, Python has a way to put definitions in a file and use
1440 them in a script or in an interactive instance of the interpreter.
1441 Such a file is called a {\em module}; definitions from a module can be
1442 {\em imported} into other modules or into the {\em main} module (the
1443 collection of variables that you have access to in a script
1444 executed at the top level
1445 and in calculator mode).
1446
1447 A module is a file containing Python definitions and statements.  The
1448 file name is the module name with the suffix {\tt .py} appended.  Within
1449 a module, the module's name (as a string) is available as the value of
1450 the global variable {\tt __name__}.  For instance, use your favorite text
1451 editor to create a file called {\tt fibo.py} in the current directory
1452 with the following contents:
1453
1454 \bcode\begin{verbatim}
1455 # Fibonacci numbers module
1456
1457 def fib(n):    # write Fibonacci series up to n
1458     a, b = 0, 1
1459     while b < n:
1460           print b,
1461           a, b = b, a+b
1462
1463 def fib2(n): # return Fibonacci series up to n
1464     result = []
1465     a, b = 0, 1
1466     while b < n:
1467           result.append(b)
1468           a, b = b, a+b
1469     return result
1470 \end{verbatim}\ecode
1471 %
1472 Now enter the Python interpreter and import this module with the
1473 following command:
1474
1475 \bcode\begin{verbatim}
1476 >>> import fibo
1477 >>>
1478 \end{verbatim}\ecode
1479 %
1480 This does not enter the names of the functions defined in
1481 {\tt fibo}
1482 directly in the current symbol table; it only enters the module name
1483 {\tt fibo}
1484 there.
1485 Using the module name you can access the functions:
1486
1487 \bcode\begin{verbatim}
1488 >>> fibo.fib(1000)
1489 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987
1490 >>> fibo.fib2(100)
1491 [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
1492 >>> fibo.__name__
1493 'fibo'
1494 >>>
1495 \end{verbatim}\ecode
1496 %
1497 If you intend to use a function often you can assign it to a local name:
1498
1499 \bcode\begin{verbatim}
1500 >>> fib = fibo.fib
1501 >>> fib(500)
1502 1 1 2 3 5 8 13 21 34 55 89 144 233 377
1503 >>>
1504 \end{verbatim}\ecode
1505
1506 \section{More on Modules}
1507
1508 A module can contain executable statements as well as function
1509 definitions.
1510 These statements are intended to initialize the module.
1511 They are executed only the
1512 {\em first}
1513 time the module is imported somewhere.%
1514 \footnote{
1515         In fact function definitions are also `statements' that are
1516         `executed'; the execution enters the function name in the
1517         module's global symbol table.
1518 }
1519
1520 Each module has its own private symbol table, which is used as the
1521 global symbol table by all functions defined in the module.
1522 Thus, the author of a module can use global variables in the module
1523 without worrying about accidental clashes with a user's global
1524 variables.
1525 On the other hand, if you know what you are doing you can touch a
1526 module's global variables with the same notation used to refer to its
1527 functions,
1528 {\tt modname.itemname}.
1529
1530 Modules can import other modules.
1531 It is customary but not required to place all
1532 {\tt import}
1533 statements at the beginning of a module (or script, for that matter).
1534 The imported module names are placed in the importing module's global
1535 symbol table.
1536
1537 There is a variant of the
1538 {\tt import}
1539 statement that imports names from a module directly into the importing
1540 module's symbol table.
1541 For example:
1542
1543 \bcode\begin{verbatim}
1544 >>> from fibo import fib, fib2
1545 >>> fib(500)
1546 1 1 2 3 5 8 13 21 34 55 89 144 233 377
1547 >>>
1548 \end{verbatim}\ecode
1549 %
1550 This does not introduce the module name from which the imports are taken
1551 in the local symbol table (so in the example, {\tt fibo} is not
1552 defined).
1553
1554 There is even a variant to import all names that a module defines:
1555
1556 \bcode\begin{verbatim}
1557 >>> from fibo import *
1558 >>> fib(500)
1559 1 1 2 3 5 8 13 21 34 55 89 144 233 377
1560 >>>
1561 \end{verbatim}\ecode
1562 %
1563 This imports all names except those beginning with an underscore
1564 ({\tt _}).
1565
1566 \section{Standard Modules}
1567
1568 Python comes with a library of standard modules, described in a separate
1569 document (Python Library Reference).  Some modules are built into the
1570 interpreter; these provide access to operations that are not part of the
1571 core of the language but are nevertheless built in, either for
1572 efficiency or to provide access to operating system primitives such as
1573 system calls.  The set of such modules is a configuration option; e.g.,
1574 the {\tt amoeba} module is only provided on systems that somehow support
1575 Amoeba primitives.  One particular module deserves some attention: {\tt
1576 sys}, which is built into every Python interpreter.  The variables {\tt
1577 sys.ps1} and {\tt sys.ps2} define the strings used as primary and
1578 secondary prompts:
1579
1580 \bcode\begin{verbatim}
1581 >>> import sys
1582 >>> sys.ps1
1583 '>>> '
1584 >>> sys.ps2
1585 '... '
1586 >>> sys.ps1 = 'C> '
1587 C> print 'Yuck!'
1588 Yuck!
1589 C>
1590 \end{verbatim}\ecode
1591 %
1592 These two variables are only defined if the interpreter is in
1593 interactive mode.
1594
1595 The variable
1596 {\tt sys.path}
1597 is a list of strings that determine the interpreter's search path for
1598 modules.
1599 It is initialized to a default path taken from the environment variable
1600 {\tt PYTHONPATH},
1601 or from a built-in default if
1602 {\tt PYTHONPATH}
1603 is not set.
1604 You can modify it using standard list operations, e.g.:
1605
1606 \bcode\begin{verbatim}
1607 >>> import sys
1608 >>> sys.path.append('/ufs/guido/lib/python')
1609 >>>
1610 \end{verbatim}\ecode
1611
1612 \section{The {\tt dir()} function}
1613
1614 The built-in function {\tt dir} is used to find out which names a module
1615 defines.  It returns a sorted list of strings:
1616
1617 \bcode\begin{verbatim}
1618 >>> import fibo, sys
1619 >>> dir(fibo)
1620 ['__name__', 'fib', 'fib2']
1621 >>> dir(sys)
1622 ['__name__', 'argv', 'builtin_module_names', 'copyright', 'exit',
1623 'maxint', 'modules', 'path', 'ps1', 'ps2', 'setprofile', 'settrace',
1624 'stderr', 'stdin', 'stdout', 'version']
1625 >>>
1626 \end{verbatim}\ecode
1627 %
1628 Without arguments, {\tt dir()} lists the names you have currently defined:
1629
1630 \bcode\begin{verbatim}
1631 >>> a = [1, 2, 3, 4, 5]
1632 >>> import fibo, sys
1633 >>> fib = fibo.fib
1634 >>> dir()
1635 ['__name__', 'a', 'fib', 'fibo', 'sys']
1636 >>>
1637 \end{verbatim}\ecode
1638 %
1639 Note that it lists all types of names: variables, modules, functions, etc.
1640
1641 {\tt dir()} does not list the names of built-in functions and variables.
1642 If you want a list of those, they are defined in the standard module
1643 {\tt __builtin__}:
1644
1645 \bcode\begin{verbatim}
1646 >>> import __builtin__
1647 >>> dir(__builtin__)
1648 ['AccessError', 'AttributeError', 'ConflictError', 'EOFError', 'IOError',
1649 'ImportError', 'IndexError', 'KeyError', 'KeyboardInterrupt',
1650 'MemoryError', 'NameError', 'None', 'OverflowError', 'RuntimeError',
1651 'SyntaxError', 'SystemError', 'SystemExit', 'TypeError', 'ValueError',
1652 'ZeroDivisionError', '__name__', 'abs', 'apply', 'chr', 'cmp', 'coerce',
1653 'compile', 'dir', 'divmod', 'eval', 'execfile', 'filter', 'float',
1654 'getattr', 'hasattr', 'hash', 'hex', 'id', 'input', 'int', 'len', 'long',
1655 'map', 'max', 'min', 'oct', 'open', 'ord', 'pow', 'range', 'raw_input',
1656 'reduce', 'reload', 'repr', 'round', 'setattr', 'str', 'type', 'xrange']
1657 >>>
1658 \end{verbatim}\ecode
1659
1660
1661 \chapter{Output Formatting}
1662
1663 So far we've encountered two ways of writing values: {\em expression
1664 statements} and the {\tt print} statement.  (A third way is using the
1665 {\tt write} method of file objects; the standard output file can be
1666 referenced as {\tt sys.stdout}.  See the Library Reference for more
1667 information on this.)
1668
1669 Often you'll want more control over the formatting of your output than
1670 simply printing space-separated values.  The key to nice formatting in
1671 Python is to do all the string handling yourself; using string slicing
1672 and concatenation operations you can create any lay-out you can imagine.
1673 The standard module {\tt string} contains some useful operations for
1674 padding strings to a given column width; these will be discussed shortly.
1675 Finally, the \code{\%} operator (modulo) with a string left argument
1676 interprets this string as a C sprintf format string to be applied to the
1677 right argument, and returns the string resulting from this formatting
1678 operation.
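
As a rough sketch of both approaches, the hypothetical function
{\tt padleft} below pads a string by hand using concatenation, and the
last line produces the same lay-out with the {\tt \%} operator:

\bcode\begin{verbatim}
```python
def padleft(s, width):    # pad string s with spaces on the left
    pad = width - len(s)
    if pad > 0:
        s = ' ' * pad + s
    return s              # strings that are too long are left alone

line1 = padleft('7', 2) + ' ' + padleft('49', 3)    # ' 7  49'
line2 = '%2d %3d' % (7, 49)                         # ' 7  49'
```
\end{verbatim}\ecode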
1679
1680 One question remains, of course: how do you convert values to strings?
1681 Luckily, Python has a way to convert any value to a string: just write
1682 the value between reverse quotes (\verb/``/).  Some examples:
1683
1684 \bcode\begin{verbatim}
1685 >>> x = 10 * 3.14
1686 >>> y = 200*200
1687 >>> s = 'The value of x is ' + `x` + ', and y is ' + `y` + '...'
1688 >>> print s
1689 The value of x is 31.4, and y is 40000...
1690 >>> # Reverse quotes work on other types besides numbers:
1691 ... p = [x, y]
1692 >>> ps = `p`
1693 >>> ps
1694 '[31.4, 40000]'
1695 >>> # Converting a string adds string quotes and backslashes:
1696 ... hello = 'hello, world\n'
1697 >>> hellos = `hello`
1698 >>> print hellos
1699 'hello, world\012'
1700 >>> # The argument of reverse quotes may be a tuple:
1701 ... `x, y, ('spam', 'eggs')`
1702 "(31.4, 40000, ('spam', 'eggs'))"
1703 >>>
1704 \end{verbatim}\ecode
1705 %
1706 Here are two ways to write a table of squares and cubes:
1707
1708 \bcode\begin{verbatim}
1709 >>> import string
1710 >>> for x in range(1, 11):
1711 ...     print string.rjust(`x`, 2), string.rjust(`x*x`, 3),
1712 ...     # Note trailing comma on previous line
1713 ...     print string.rjust(`x*x*x`, 4)
1714 ...
1715  1   1    1
1716  2   4    8
1717  3   9   27
1718  4  16   64
1719  5  25  125
1720  6  36  216
1721  7  49  343
1722  8  64  512
1723  9  81  729
1724 10 100 1000
1725 >>> for x in range(1,11):
1726 ...     print '%2d %3d %4d' % (x, x*x, x*x*x)
1727 ...
1728  1   1    1
1729  2   4    8
1730  3   9   27
1731  4  16   64
1732  5  25  125
1733  6  36  216
1734  7  49  343
1735  8  64  512
1736  9  81  729
1737 10 100 1000
1738 >>>
1739 \end{verbatim}\ecode
1740 %
1741 (Note that one space between each column was added by the way {\tt print}
1742 works: it always adds spaces between its arguments.)
1743
1744 This example demonstrates the function {\tt string.rjust()}, which
1745 right-justifies a string in a field of a given width by padding it with
1746 spaces on the left.  There are similar functions {\tt string.ljust()}
1747 and {\tt string.center()}.  These functions do not write anything; they
1748 just return a new string.  If the input string is too long, they don't
1749 truncate it, but return it unchanged; this will mess up your column
1750 lay-out but that's usually better than the alternative, which would be
1751 lying about a value.  (If you really want truncation you can always add
1752 a slice operation, as in {\tt string.ljust(x,~n)[0:n]}.)
1753
1754 There is another function, {\tt string.zfill()}, which pads a numeric
1755 string on the left with zeros.  It understands plus and minus
1756 signs:
1757
1758 \bcode\begin{verbatim}
1759 >>> string.zfill('12', 5)
1760 '00012'
1761 >>> string.zfill('-3.14', 7)
1762 '-003.14'
1763 >>> string.zfill('3.14159265359', 5)
1764 '3.14159265359'
1765 >>>
1766 \end{verbatim}\ecode
1767
1768
1769 \chapter{Errors and Exceptions}
1770
1771 Until now error messages haven't been more than mentioned, but if you
1772 have tried out the examples you have probably seen some.  There are
1773 (at least) two distinguishable kinds of errors: {\em syntax\ errors}
1774 and {\em exceptions}.
1775
1776 \section{Syntax Errors}
1777
1778 Syntax errors, also known as parsing errors, are perhaps the most common
1779 kind of complaint you get while you are still learning Python:
1780
1781 \bcode\begin{verbatim}
1782 >>> while 1 print 'Hello world'
1783   File "<stdin>", line 1
1784     while 1 print 'Hello world'
1785                 ^
1786 SyntaxError: invalid syntax
1787 >>>
1788 \end{verbatim}\ecode
1789 %
1790 The parser repeats the offending line and displays a little `arrow'
1791 pointing at the earliest point in the line where the error was detected.
1792 The error is caused by (or at least detected at) the token
1793 {\em preceding}
1794 the arrow: in the example, the error is detected at the keyword
1795 {\tt print}, since a colon ({\tt :}) is missing before it.
1796 File name and line number are printed so you know where to look in case
1797 the input came from a script.
1798
1799 \section{Exceptions}
1800
1801 Even if a statement or expression is syntactically correct, it may
1802 cause an error when an attempt is made to execute it.
1803 Errors detected during execution are called {\em exceptions} and are
1804 not unconditionally fatal: you will soon learn how to handle them in
1805 Python programs.  Most exceptions are not handled by programs,
1806 however, and result in error messages as shown here:
1807
1808 \bcode\small\begin{verbatim}
1809 >>> 10 * (1/0)
1810 Traceback (innermost last):
1811   File "<stdin>", line 1
1812 ZeroDivisionError: integer division or modulo
1813 >>> 4 + spam*3
1814 Traceback (innermost last):
1815   File "<stdin>", line 1
1816 NameError: spam
1817 >>> '2' + 2
1818 Traceback (innermost last):
1819   File "<stdin>", line 1
1820 TypeError: illegal argument type for built-in operation
1821 >>>
1822 \end{verbatim}\ecode
1823 %
1824 The last line of the error message indicates what happened.
1825 Exceptions come in different types, and the type is printed as part of
1826 the message: the types in the example are
1827 {\tt ZeroDivisionError},
1828 {\tt NameError}
1829 and
1830 {\tt TypeError}.
1831 The string printed as the exception type is the name of the built-in
1832 exception that occurred.  This is true for all built-in
1833 exceptions, but need not be true for user-defined exceptions (although
1834 it is a useful convention).
1835 Standard exception names are built-in identifiers (not reserved
1836 keywords).
1837
1838 The rest of the line is a detail whose interpretation depends on the
1839 exception type.
1840
1841 The preceding part of the error message shows the context where the
1842 exception happened, in the form of a stack backtrace.
1843 In general the backtrace lists source lines; however,
1844 it will not display lines read from standard input.
1845
1846 The Python library reference manual lists the built-in exceptions and
1847 their meanings.
1848
1849 \section{Handling Exceptions}
1850
1851 It is possible to write programs that handle selected exceptions.
1852 Look at the following example, which prints a table of inverses of
1853 some floating point numbers:
1854
1855 \bcode\begin{verbatim}
1856 >>> numbers = [0.3333, 2.5, 0, 10]
1857 >>> for x in numbers:
1858 ...     print x,
1859 ...     try:
1860 ...         print 1.0 / x
1861 ...     except ZeroDivisionError:
1862 ...         print '*** has no inverse ***'
1863 ...
1864 0.3333 3.00030003
1865 2.5 0.4
1866 0 *** has no inverse ***
1867 10 0.1
1868 >>>
1869 \end{verbatim}\ecode
1870 %
1871 The {\tt try} statement works as follows.
1872 \begin{itemize}
1873 \item
1874 First, the
1875 {\em try\ clause}
1876 (the statement(s) between the {\tt try} and {\tt except} keywords) is
1877 executed.
1878 \item
1879 If no exception occurs, the
1880 {\em except\ clause}
1881 is skipped and execution of the {\tt try} statement is finished.
1882 \item
1883 If an exception occurs during execution of the try clause,
1884 the rest of the clause is skipped.  Then if
1885 its type matches the exception named after the {\tt except} keyword,
1886 the except clause is executed,
1887 and then execution continues after the {\tt try} statement.
1888 \item
1889 If an exception occurs which does not match the exception named in the
1890 except clause, it is passed on to outer try statements; if no handler is
1891 found, it is an
1892 {\em unhandled\ exception}
1893 and execution stops with a message as shown above.
1894 \end{itemize}
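The last case, an exception that is passed on to an outer {\tt try}
statement, can be sketched as follows.  The function names and the
returned strings are invented for this illustration.  The inner
handler only knows about division by zero, so an out-of-range index
travels on to the handler in the caller:

\bcode\begin{verbatim}
def inverse_of_item(seq, i):
    try:
        return 1.0 / seq[i]
    except ZeroDivisionError:
        # Only division by zero is handled here.
        return 'no inverse'

def safe_inverse(seq, i):
    try:
        return inverse_of_item(seq, i)
    except IndexError:
        # The inner try statement did not match, so the
        # exception arrives at this outer handler.
        return 'no such item'

numbers = [4, 0]
a = safe_inverse(numbers, 0)    # 0.25
b = safe_inverse(numbers, 1)    # 'no inverse'
c = safe_inverse(numbers, 5)    # 'no such item'
\end{verbatim}\ecode
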
1895 A {\tt try} statement may have more than one except clause, to specify
1896 handlers for different exceptions.
1897 At most one handler will be executed.
1898 Handlers only handle exceptions that occur in the corresponding try
1899 clause, not in other handlers of the same {\tt try} statement.
1900 An except clause may name multiple exceptions as a parenthesized list,
1901 e.g.:
1902
1903 \bcode\begin{verbatim}
1904 ... except (RuntimeError, TypeError, NameError):
1905 ...     pass
1906 \end{verbatim}\ecode
1907 %
1908 The last except clause may omit the exception name(s), to serve as a
1909 wildcard.
1910 Use this with extreme caution, since it is easy to mask a real
1911 programming error in this way!
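
A sketch of a {\tt try} statement with several handlers follows; the
function and its messages are hypothetical.  For any one exception at
most one of the handlers runs, and the second handler names three
exceptions in a parenthesized list:

\bcode\begin{verbatim}
def describe(container, key):
    try:
        return 10.0 / container[key]
    except ZeroDivisionError:
        return 'division by zero'
    except (IndexError, KeyError, TypeError):
        return 'bad subscript'

r1 = describe([5], 0)           # 2.0
r2 = describe([0], 0)           # 'division by zero'
r3 = describe([5], 3)           # 'bad subscript' (IndexError)
r4 = describe({'a': 5}, 'b')    # 'bad subscript' (KeyError)
\end{verbatim}\ecode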
1912
1913 When an exception occurs, it may have an associated value, also known as
1914 the exception's
1915 {\em argument}.
1916 The presence and type of the argument depend on the exception type.
1917 For exception types which have an argument, the except clause may
1918 specify a variable after the exception name (or list) to receive the
1919 argument's value, as follows:
1920
1921 \bcode\begin{verbatim}
1922 >>> try:
1923 ...     spam()
1924 ... except NameError, x:
1925 ...     print 'name', x, 'undefined'
1926 ...
1927 name spam undefined
1928 >>>
1929 \end{verbatim}\ecode
1930 %
1931 If an exception has an argument, it is printed as the last part
1932 (`detail') of the message for unhandled exceptions.
1933
1934 Exception handlers don't just handle exceptions if they occur
1935 immediately in the try clause, but also if they occur inside functions
1936 that are called (even indirectly) in the try clause.
1937 For example:
1938
1939 \bcode\begin{verbatim}
1940 >>> def this_fails():
1941 ...     x = 1/0
1942 ...
1943 >>> try:
1944 ...     this_fails()
1945 ... except ZeroDivisionError, detail:
1946 ...     print 'Handling run-time error:', detail
1947 ...
1948 Handling run-time error: integer division or modulo
1949 >>>
1950 \end{verbatim}\ecode
1951
1952 \section{Raising Exceptions}
1953
1954 The {\tt raise} statement allows the programmer to force a specified
1955 exception to occur.
1956 For example:
1957
1958 \bcode\begin{verbatim}
1959 >>> raise NameError, 'HiThere'
1960 Traceback (innermost last):
1961   File "<stdin>", line 1
1962 NameError: HiThere
1963 >>>
1964 \end{verbatim}\ecode
1965 %
1966 The first argument to {\tt raise} names the exception to be raised.
1967 The optional second argument specifies the exception's argument.
1968
1969 \section{User-defined Exceptions}
1970
1971 Programs may name their own exceptions by assigning a string to a
1972 variable.
1973 For example:
1974
1975 \bcode\begin{verbatim}
1976 >>> my_exc = 'my_exc'
1977 >>> try:
1978 ...     raise my_exc, 2*2
1979 ... except my_exc, val:
1980 ...     print 'My exception occurred, value:', val
1981 ...
1982 My exception occurred, value: 4
1983 >>> raise my_exc, 1
1984 Traceback (innermost last):
1985   File "<stdin>", line 1
1986 my_exc: 1
1987 >>>
1988 \end{verbatim}\ecode
1989 %
1990 Many standard modules use this to report errors that may occur in
1991 functions they define.
1992
1993 \section{Defining Clean-up Actions}
1994
1995 The {\tt try} statement has another optional clause which is intended to
1996 define clean-up actions that must be executed under all circumstances.
1997 For example:
1998
1999 \bcode\begin{verbatim}
2000 >>> try:
2001 ...     raise KeyboardInterrupt
2002 ... finally:
2003 ...     print 'Goodbye, world!'
2004 ...
2005 Goodbye, world!
2006 Traceback (innermost last):
2007   File "<stdin>", line 2
2008 KeyboardInterrupt
2009 >>>
2010 \end{verbatim}\ecode
2011 %
2012 A {\tt finally} clause is executed whether or not an exception has
2013 occurred in the {\tt try} clause.  When an exception has occurred, it
2014 is re-raised after the {\tt finally} clause is executed.  The
2015 {\tt finally} clause is also executed ``on the way out'' when the
2016 {\tt try} statement is left via a {\tt break} or {\tt return}
2017 statement.
2018
2019 A {\tt try} statement must have either one or more {\tt except}
2020 clauses or one {\tt finally} clause, but not both.
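
The following sketch (all names are invented for the illustration)
shows a {\tt finally} clause running ``on the way out'' of a
{\tt return} statement, and again when an exception passes through it
on its way to a handler in the caller.  The clean-up actions only
record a message in a list so that the order of events can be
inspected afterwards:

\bcode\begin{verbatim}
log = []

def read_value():
    try:
        return 'value'
    finally:
        # Runs before the caller receives the result.
        log.append('cleaned up')

def fail():
    try:
        return 1 / 0
    finally:
        # Runs first; then the exception is re-raised.
        log.append('cleaned up after error')

result = read_value()
try:
    fail()
except ZeroDivisionError:
    log.append('handled by caller')
# log is now ['cleaned up', 'cleaned up after error', 'handled by caller']
\end{verbatim}\ecode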
2021
2022
2023 \chapter{Classes}
2024
2025 Python's class mechanism adds classes to the language with a minimum
2026 of new syntax and semantics.  It is a mixture of the class mechanisms
2027 found in \Cpp{} and Modula-3.  As is true for modules, classes in Python
2028 do not put an absolute barrier between definition and user, but rather
2029 rely on the politeness of the user not to ``break into the
2030 definition.''  The most important features of classes are retained
2031 with full power, however: the class inheritance mechanism allows
2032 multiple base classes, a derived class can override any methods of its
2033 base class(es), and a method can call the method of a base class with the
2034 same name.  Objects can contain an arbitrary amount of private data.
2035
2036 In \Cpp{} terminology, all class members (including the data members) are
2037 {\em public}, and all member functions are {\em virtual}.  There are
2038 no special constructors or destructors.  As in Modula-3, there are no
2039 shorthands for referencing the object's members from its methods: the
2040 method function is declared with an explicit first argument
2041 representing the object, which is provided implicitly by the call.  As
2042 in Smalltalk, classes themselves are objects, albeit in the wider
2043 sense of the word: in Python, all data types are objects.  This
2044 provides semantics for importing and renaming.  But, just like in \Cpp{}
2045 or Modula-3, built-in types cannot be used as base classes for
2046 extension by the user.  Also, like in \Cpp{} but unlike in Modula-3, most
2047 built-in operators with special syntax (arithmetic operators,
2048 subscripting etc.) can be redefined for class instances.
2049
2050
2051 \section{A word about terminology}
2052
2053 Lacking universally accepted terminology to talk about classes, I'll
2054 make occasional use of Smalltalk and \Cpp{} terms.  (I'd use Modula-3
2055 terms, since its object-oriented semantics are closer to those of
2056 Python than those of \Cpp{} are, but I expect that few readers have heard of it...)
2057
2058 I also have to warn you that there's a terminological pitfall for
2059 object-oriented readers: the word ``object'' in Python does not
2060 necessarily mean a class instance.  Like \Cpp{} and Modula-3, and unlike
2061 Smalltalk, not all types in Python are classes: the basic built-in
2062 types like integers and lists aren't, and even somewhat more exotic
2063 types like files aren't.  However, {\em all} Python types share a little
2064 bit of common semantics that is best described by using the word
2065 object.
2066
2067 Objects have individuality, and multiple names (in multiple scopes)
2068 can be bound to the same object.  This is known as aliasing in other
2069 languages.  This is usually not appreciated on a first glance at
2070 Python, and can be safely ignored when dealing with immutable basic
2071 types (numbers, strings, tuples).  However, aliasing has an
2072 (intended!) effect on the semantics of Python code involving mutable
2073 objects such as lists, dictionaries, and most types representing
2074 entities outside the program (files, windows, etc.).  This is usually
2075 used to the benefit of the program, since aliases behave like pointers
2076 in some respects.  For example, passing an object is cheap since only
2077 a pointer is passed by the implementation; and if a function modifies
2078 an object passed as an argument, the caller will see the change --- this
2079 obviates the need for two different argument passing mechanisms as in
2080 Pascal.
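
A minimal sketch of aliasing with a mutable object follows; the names
are invented for the illustration.  Both the second name and the
function argument refer to the very same list, so a change made
through one of them is visible through all of them:

\begin{verbatim}
a = [1, 2, 3]
b = a               # b is a second name for the same list object
b.append(4)         # the change is visible through both names

def add_item(seq, item):
    # seq is yet another alias for the caller's list
    seq.append(item)

add_item(a, 5)
# a and b both refer to [1, 2, 3, 4, 5]
\end{verbatim}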
2081
2082
2083 \section{Python scopes and name spaces}
2084
2085 Before introducing classes, I first have to tell you something about
2086 Python's scope rules.  Class definitions play some neat tricks with
2087 name spaces, and you need to know how scopes and name spaces work to
2088 fully understand what's going on.  Incidentally, knowledge about this
2089 subject is useful for any advanced Python programmer.
2090
2091 Let's begin with some definitions.
2092
2093 A {\em name space} is a mapping from names to objects.  Most name
2094 spaces are currently implemented as Python dictionaries, but that's
2095 normally not noticeable in any way (except for performance), and it
2096 may change in the future.  Examples of name spaces are: the set of
2097 built-in names (functions such as \verb\abs()\, and built-in exception
2098 names); the global names in a module; and the local names in a
2099 function invocation.  In a sense the set of attributes of an object
2100 also forms a name space.  The important thing to know about name
2101 spaces is that there is absolutely no relation between names in
2102 different name spaces; for instance, two different modules may both
2103 define a function ``maximize'' without confusion --- users of the
2104 modules must prefix it with the module name.
2105
2106 By the way, I use the word {\em attribute} for any name following a
2107 dot --- for example, in the expression \verb\z.real\, \verb\real\ is
2108 an attribute of the object \verb\z\.  Strictly speaking, references to
2109 names in modules are attribute references: in the expression
2110 \verb\modname.funcname\, \verb\modname\ is a module object and
2111 \verb\funcname\ is an attribute of it.  In this case there happens to
2112 be a straightforward mapping between the module's attributes and the
2113 global names defined in the module: they share the same name space!%
2114 \footnote{
2115         Except for one thing.  Module objects have a secret read-only
2116         attribute called {\tt __dict__} which returns the dictionary
2117         used to implement the module's name space; the name
2118         {\tt __dict__} is an attribute but not a global name.
2119         Obviously, using this violates the abstraction of name space
2120         implementation, and should be restricted to things like
2121         post-mortem debuggers...
2122 }
2123
2124 Attributes may be read-only or writable.  In the latter case,
2125 assignment to attributes is possible.  Module attributes are writable:
2126 you can write \verb\modname.the_answer = 42\.  Writable attributes may
2127 also be deleted with the \verb\del\ statement, e.g.
2128 \verb\del modname.the_answer\.
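
The following sketch tries this out on the standard module
{\tt string}, which merely stands in for \verb\modname\ here; adding
an attribute to a library module is done only for illustration and is
not something the library expects:

\begin{verbatim}
import string

string.the_answer = 42          # creates a new module attribute
answer = string.the_answer      # reads it back
del string.the_answer           # removes it again
still_there = hasattr(string, 'the_answer')     # false by now
\end{verbatim}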
2129
2130 Name spaces are created at different moments and have different
2131 lifetimes.  The name space containing the built-in names is created
2132 when the Python interpreter starts up, and is never deleted.  The
2133 global name space for a module is created when the module definition
2134 is read in; normally, module name spaces also last until the
2135 interpreter quits.  The statements executed by the top-level
2136 invocation of the interpreter, either read from a script file or
2137 interactively, are considered part of a module called \verb\__main__\,
2138 so they have their own global name space.  (The built-in names
2139 actually also live in a module; this is called \verb\__builtin__\.)
2140
2141 The local name space for a function is created when the function is
2142 called, and deleted when the function returns or raises an exception
2143 that is not handled within the function.  (Actually, forgetting would
2144 be a better way to describe what happens.)  Of course,
2145 recursive invocations each have their own local name space.
2146
2147 A {\em scope} is a textual region of a Python program where a name space
2148 is directly accessible.  ``Directly accessible'' here means that an
2149 unqualified reference to a name attempts to find the name in the name
2150 space.
2151
2152 Although scopes are determined statically, they are used dynamically.
2153 At any time during execution, exactly three nested scopes are in use
2154 (i.e., exactly three name spaces are directly accessible): the
2155 innermost scope, which is searched first, contains the local names;
2156 the middle scope, searched next, contains the current module's global
2157 names; and the outermost scope (searched last) is the name space
2158 containing built-in names.
2159
2160 Usually, the local scope references the local names of the (textually)
2161 current function.  Outside of functions, the local scope references
2162 the same name space as the global scope: the module's name space.
2163 Class definitions place yet another name space in the local scope.
2164
2165 It is important to realize that scopes are determined textually: the
2166 global scope of a function defined in a module is that module's name
2167 space, no matter from where or by what alias the function is called.
2168 On the other hand, the actual search for names is done dynamically, at
2169 run time --- however, the language definition is evolving towards
2170 static name resolution, at ``compile'' time, so don't rely on dynamic
2171 name resolution!  (In fact, local variables are already determined
2172 statically.)
2173
2174 A special quirk of Python is that assignments always go into the
2175 innermost scope.  Assignments do not copy data --- they just
2176 bind names to objects.  The same is true for deletions: the statement
2177 \verb\del x\ removes the binding of \verb\x\ from the name space referenced by the
2178 local scope.  In fact, all operations that introduce new names use the
2179 local scope: in particular, import statements and function definitions
2180 bind the module or function name in the local scope.  (The
2181 \verb\global\ statement can be used to indicate that particular
2182 variables live in the global scope.)
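
To make these rules concrete, here is a small sketch with names
invented for the illustration.  An assignment inside a function binds
a local name unless the name is listed in a \verb\global\ statement,
and a name that is neither local nor global is looked up among the
built-in names:

\begin{verbatim}
counter = 0                 # a global name in this module

def set_local():
    counter = 100           # binds a new name in the local scope
    return counter

def set_global():
    global counter          # assignments to counter now go to
    counter = counter + 1   # the global scope
    return counter

def use_builtin(x):
    return abs(x)           # abs is found in the built-in name space

first = set_local()         # 100
unchanged = counter         # still 0: only a local name was bound
second = set_global()       # 1, and the global counter is now 1
\end{verbatim}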
2183
2184
2185 \section{A first look at classes}
2186
2187 Classes introduce a little bit of new syntax, three new object types,
2188 and some new semantics.
2189
2190
2191 \subsection{Class definition syntax}
2192
2193 The simplest form of class definition looks like this:
2194
2195 \begin{verbatim}
2196         class ClassName:
2197                 <statement-1>
2198                 .
2199                 .
2200                 .
2201                 <statement-N>
2202 \end{verbatim}
2203
2204 Class definitions, like function definitions (\verb\def\ statements),
2205 must be executed before they have any effect.  (You could conceivably
2206 place a class definition in a branch of an \verb\if\ statement, or
2207 inside a function.)
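
Both possibilities are sketched below with invented names.  Only the
class definition in the branch that is actually executed takes effect,
and a class definition inside a function is executed anew on every
call.  (The sketch reads a class attribute, using the attribute
reference notation described under ``Class objects''.)

\begin{verbatim}
use_long_form = 1

if use_long_form:
    class Settings:
        greeting = 'hello, world'
else:
    class Settings:
        greeting = 'hi'

def make_class():
    # This class definition runs each time make_class is called.
    class Local:
        count = 0
    return Local

chosen = Settings.greeting      # 'hello, world'
c1 = make_class()
c2 = make_class()               # a different class object than c1
\end{verbatim}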
2208
2209 In practice, the statements inside a class definition will usually be
2210 function definitions, but other statements are allowed, and sometimes
2211 useful --- we'll come back to this later.  The function definitions
2212 inside a class normally have a peculiar form of argument list,
2213 dictated by the calling conventions for methods --- again, this is
2214 explained later.
2215
2216 When a class definition is entered, a new name space is created, and
2217 used as the local scope --- thus, all assignments to local variables
2218 go into this new name space.  In particular, function definitions bind
2219 the name of the new function here.
2220
2221 When a class definition is left normally (via the end), a {\em class
2222 object} is created.  This is basically a wrapper around the contents
2223 of the name space created by the class definition; we'll learn more
2224 about class objects in the next section.  The original local scope
2225 (the one in effect just before the class definition was entered) is
2226 reinstated, and the class object is bound here to the class name given in
2227 the class definition header (\verb\ClassName\ in the example).
2228
2229
2230 \subsection{Class objects}
2231
2232 Class objects support two kinds of operations: attribute references
2233 and instantiation.
2234
2235 {\em Attribute references} use the standard syntax used for all
2236 attribute references in Python: \verb\obj.name\.  Valid attribute
2237 names are all the names that were in the class's name space when the
2238 class object was created.  So, if the class definition looked like
2239 this:
2240
2241 \begin{verbatim}
2242         class MyClass:
2243                 i = 12345
2244                 def f(x):
2245                         return 'hello world'
2246 \end{verbatim}
2247
2248 then \verb\MyClass.i\ and \verb\MyClass.f\ are valid attribute
2249 references, returning an integer and a function object, respectively.
2250 Class attributes can also be assigned to, so you can change the
2251 value of \verb\MyClass.i\ by assignment.
2252
2253 Class {\em instantiation} uses function notation.  Just pretend that
2254 the class object is a parameterless function that returns a new
2255 instance of the class.  For example (assuming the above class):
2256
2257 \begin{verbatim}
2258         x = MyClass()
2259 \end{verbatim}
2260
2261 creates a new {\em instance} of the class and assigns this object to
2262 the local variable \verb\x\.
2263
2264
2265 \subsection{Instance objects}
2266
2267 Now what can we do with instance objects?  The only operations
2268 understood by instance objects are attribute references.  There are
2269 two kinds of valid attribute names.
2270
2271 The first I'll call {\em data attributes}.  These correspond to
2272 ``instance variables'' in Smalltalk, and to ``data members'' in \Cpp{}.
2273 Data attributes need not be declared; like local variables, they
2274 spring into existence when they are first assigned to.  For example,
2275 if \verb\x\ is the instance of \verb\MyClass\ created above, the
2276 following piece of code will print the value 16, without leaving a
2277 trace:
2278
2279 \begin{verbatim}
2280         x.counter = 1
2281         while x.counter < 10:
2282                 x.counter = x.counter * 2
2283         print x.counter
2284         del x.counter
2285 \end{verbatim}
2286
2287 The second kind of attribute understood by instance objects
2288 is the {\em method}.  A method is a function that ``belongs to'' an
2289 object.  (In Python, the term method is not unique to class instances:
2290 other object types can have methods as well, e.g., list objects have
2291 methods called append, insert, remove, sort, and so on.  However,
2292 below, we'll use the term method exclusively to mean methods of class
2293 instance objects, unless explicitly stated otherwise.)
2294
2295 Valid method names of an instance object depend on its class.  By
2296 definition, all attributes of a class that are (user-defined) function
2297 objects define corresponding methods of its instances.  So in our
2298 example, \verb\x.f\ is a valid method reference, since
2299 \verb\MyClass.f\ is a function, but \verb\x.i\ is not, since
2300 \verb\MyClass.i\ is not.  But \verb\x.f\ is not the
2301 same thing as \verb\MyClass.f\ --- it is a {\em method object}, not a
2302 function object.
2303
2304
2305 \subsection{Method objects}
2306
2307 Usually, a method is called immediately, e.g.:
2308
2309 \begin{verbatim}
2310         x.f()
2311 \end{verbatim}
2312
2313 In our example, this will return the string \verb\'hello world'\.
2314 However, it is not necessary to call a method right away: \verb\x.f\
2315 is a method object, and can be stored away and called at a later
2316 moment, for example:
2317
2318 \begin{verbatim}
2319         xf = x.f
2320         while 1:
2321                 print xf()
2322 \end{verbatim}
2323
2324 will continue to print \verb\hello world\ until the end of time.
2325
2326 What exactly happens when a method is called?  You may have noticed
2327 that \verb\x.f()\ was called without an argument above, even though
2328 the function definition for \verb\f\ specified an argument.  What
2329 happened to the argument?  Surely Python raises an exception when a
2330 function that requires an argument is called without any --- even if
2331 the argument isn't actually used...
2332
2333 Actually, you may have guessed the answer: the special thing about
2334 methods is that the object is passed as the first argument of the
2335 function.  In our example, the call \verb\x.f()\ is exactly equivalent
2336 to \verb\MyClass.f(x)\.  In general, calling a method with a list of
2337 {\em n} arguments is equivalent to calling the corresponding function
2338 with an argument list that is created by inserting the method's object
2339 before the first argument.
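
The equivalence can be checked with a short sketch.  The class is the
\verb\MyClass\ example from above; the method \verb\scaled\ is a
hypothetical addition that takes one explicit argument:

\begin{verbatim}
class MyClass:
    i = 12345
    def f(x):
        return 'hello world'
    def scaled(x, factor):
        # Hypothetical second method with one explicit argument.
        return x.i * factor

x = MyClass()
via_method = x.f()              # the instance is supplied implicitly
via_function = MyClass.f(x)     # the instance is passed explicitly
n1 = x.scaled(2)                # one argument in the method call ...
n2 = MyClass.scaled(x, 2)       # ... is the second of two here
\end{verbatim}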
2340
2341 If you still don't understand how methods work, a look at the
2342 implementation can perhaps clarify matters.  When an instance
2343 attribute is referenced that isn't a data attribute, its class is
2344 searched.  If the name denotes a valid class attribute that is a
2345 function object, a method object is created by packing (pointers to)
2346 the instance object and the function object just found together in an
2347 abstract object: this is the method object.  When the method object is
2348 called with an argument list, it is unpacked again, a new argument
2349 list is constructed from the instance object and the original argument
2350 list, and the function object is called with this new argument list.
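
The packing and unpacking can be imitated in Python itself.  The
sketch below is only a model with invented names, not the way the
interpreter is actually written, and to keep it short it handles
exactly one argument: a pair stands in for the method object, and
calling it places the instance in front of the caller's argument.

\begin{verbatim}
def make_method(instance, function):
    # Pack the instance and the function object together.
    return (instance, function)

def call_method(method, arg):
    # Unpack again, then call the function with the instance
    # placed in front of the caller's argument.
    instance, function = method
    return function(instance, arg)

class Account:
    balance = 0
    def deposit(account, amount):
        account.balance = account.balance + amount
        return account.balance

acct = Account()
m = make_method(acct, Account.deposit)
total = call_method(m, 25)      # same effect as acct.deposit(25)
\end{verbatim}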
2351
2352
2353 \section{Random remarks}
2354
2355
2356 [These should perhaps be placed more carefully...]
2357
2358
2359 Data attributes override method attributes with the same name; to
2360 avoid accidental name conflicts, which may cause hard-to-find bugs in
2361 large programs, it is wise to use some kind of convention that
2362 minimizes the chance of conflicts, e.g., capitalize method names,
2363 prefix data attribute names with a small unique string (perhaps just
2364 an underscore), or use verbs for methods and nouns for data attributes.
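
A sketch of such a conflict, with invented names: once a data
attribute called \verb\title\ is assigned to the instance, the method
of the same name can no longer be reached through that instance until
the data attribute is deleted again.

\begin{verbatim}
class Report:
    def title(self):
        return 'from the method'

r = Report()
before = r.title()              # 'from the method'
r.title = 'from the data attribute'
after = r.title                 # the string; the method is hidden
del r.title                     # removing the data attribute ...
restored = r.title()            # ... makes the method visible again
\end{verbatim}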
2365
2366
2367 Data attributes may be referenced by methods as well as by ordinary
2368 users (``clients'') of an object.  In other words, classes are not
2369 usable to implement pure abstract data types.  In fact, nothing in
2370 Python makes it possible to enforce data hiding --- it is all based
2371 upon convention.  (On the other hand, the Python implementation,
2372 written in C, can completely hide implementation details and control
2373 access to an object if necessary; this can be used by extensions to
2374 Python written in C.)
2375
2376
2377 Clients should use data attributes with care --- clients may mess up
2378 invariants maintained by the methods by stamping on their data
2379 attributes.  Note that clients may add data attributes of their own to
2380 an instance object without affecting the validity of the methods, as
2381 long as name conflicts are avoided --- again, a naming convention can
2382 save a lot of headaches here.
2383
2384
2385 There is no shorthand for referencing data attributes (or other
2386 methods!) from within methods.  I find that this actually increases
2387 the readability of methods: there is no chance of confusing local
2388 variables and instance variables when glancing through a method.
2389
2390
2391 Conventionally, the first argument of a method is called
2392 \verb\self\.  This is nothing more than a convention: the name
2393 \verb\self\ has absolutely no special meaning to Python.  (Note,
2394 however, that by not following the convention your code may be less
2395 readable by other Python programmers, and it is also conceivable that
2396 a {\em class browser} program be written which relies upon such a
2397 convention.)
2398
2399
2400 Any function object that is a class attribute defines a method for
2401 instances of that class.  It is not necessary that the function
2402 definition is textually enclosed in the class definition: assigning a
2403 function object to a local variable in the class is also ok.  For
2404 example:
2405
2406 \begin{verbatim}
2407         # Function defined outside the class
2408         def f1(self, x, y):
2409                 return min(x, x+y)
2410
2411         class C:
2412                 f = f1
2413                 def g(self):
2414                         return 'hello world'
2415                 h = g
2416 \end{verbatim}
2417
2418 Now \verb\f\, \verb\g\ and \verb\h\ are all attributes of class
2419 \verb\C\ that refer to function objects, and consequently they are all
2420 methods of instances of \verb\C\ --- \verb\h\ being exactly equivalent
2421 to \verb\g\.  Note that this practice usually only serves to confuse
2422 the reader of a program.
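
As a usage sketch (the class is repeated so that the fragment stands
on its own, and the argument values are arbitrary), all three names
can be called as methods of an instance:

\begin{verbatim}
def f1(self, x, y):
    return min(x, x+y)

class C:
    f = f1
    def g(self):
        return 'hello world'
    h = g

c = C()
smallest = c.f(3, 4)        # calls f1(c, 3, 4), giving min(3, 7)
same = c.g() == c.h()       # g and h name the same function
\end{verbatim}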
2423
2424
2425 Methods may call other methods by using method attributes of the
2426 \verb\self\ argument, e.g.:
2427
2428 \begin{verbatim}
2429         class Bag:
2430                 def empty(self):
2431                         self.data = []
2432                 def add(self, x):
2433                         self.data.append(x)
2434                 def addtwice(self, x):
2435                         self.add(x)
2436                         self.add(x)
2437 \end{verbatim}
2438
2439
2440 The instantiation operation (``calling'' a class object) creates an
2441 empty object.  Many classes like to create objects in a known initial
2442 state.  Therefore a class may define a special method named
2443 \verb\__init__\, like this:
2444
2445 \begin{verbatim}
2446                 def __init__(self):
2447                         self.empty()
2448 \end{verbatim}
2449
2450 When a class defines an \verb\__init__\ method, class instantiation
2451 automatically invokes \verb\__init__\ for the newly-created class
2452 instance.  So in the \verb\Bag\ example, a new and initialized instance
2453 can be obtained by:
2454
2455 \begin{verbatim}
2456         x = Bag()
2457 \end{verbatim}
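
Putting the pieces together, a complete \code{Bag} class might look
like the sketch below.  The sample items and the variable
\code{contents} are illustrative only.

```python
class Bag:
    def __init__(self):
        self.empty()        # start in a known initial state
    def empty(self):
        self.data = []
    def add(self, x):
        self.data.append(x)
    def addtwice(self, x):
        self.add(x)         # methods call each other through self
        self.add(x)

x = Bag()                   # instantiation invokes __init__
x.add('spam')
x.addtwice('eggs')
contents = x.data           # ['spam', 'eggs', 'eggs']
```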
2458
2459 Of course, the \verb\__init__\ method may have arguments for greater
2460 flexibility.  In that case, arguments given to the class instantiation
2461 operator are passed on to \verb\__init__\.  For example,
2462
2463 \bcode\begin{verbatim}
2464 >>> class Complex:
2465 ...     def __init__(self, realpart, imagpart):
2466 ...         self.r = realpart
2467 ...         self.i = imagpart
2468 ...
2469 >>> x = Complex(3.0,-4.5)
2470 >>> x.r, x.i
2471 (3.0, -4.5)
2472 >>>
2473 \end{verbatim}\ecode
2474 %
2475 Methods may reference global names in the same way as ordinary
2476 functions.  The global scope associated with a method is the module
2477 containing the class definition.  (The class itself is never used as a
2478 global scope!)  While one rarely encounters a good reason for using
2479 global data in a method, there are many legitimate uses of the global
2480 scope: for one thing, functions and modules imported into the global
2481 scope can be used by methods, as well as functions and classes defined
2482 in it.  Usually, the class containing the method is itself defined in
2483 this global scope, and in the next section we'll find some good
2484 reasons why a method would want to reference its own class!
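
As a small illustration of a method using names from the global scope
of its module, consider the following sketch.  The helper
\code{hypotenuse} and the class \code{Point} are hypothetical names
chosen for this example.

```python
import math                     # imported into the module's global scope

def hypotenuse(a, b):           # hypothetical module-level helper
    return math.sqrt(a*a + b*b)

class Point:                    # hypothetical class for this sketch
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def distance(self):
        # hypotenuse is found in the global scope of the module,
        # not in the class Point
        return hypotenuse(self.x, self.y)

d = Point(3, 4).distance()
```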
2485
2486
2487 \section{Inheritance}
2488
2489 Of course, a language feature would not be worthy of the name ``class''
2490 without supporting inheritance.  The syntax for a derived class
2491 definition looks as follows:
2492
2493 \begin{verbatim}
2494         class DerivedClassName(BaseClassName):
2495                 <statement-1>
2496                 .
2497                 .
2498                 .
2499                 <statement-N>
2500 \end{verbatim}
2501
2502 The name \verb\BaseClassName\ must be defined in a scope containing
2503 the derived class definition.  Instead of a base class name, an
2504 expression is also allowed.  This is useful when the base class is
2505 defined in another module, e.g.,
2506
2507 \begin{verbatim}
2508         class DerivedClassName(modname.BaseClassName):
2509 \end{verbatim}
2510
Execution of a derived class definition proceeds the same as for a
base class.  When the class object is constructed, the base class is
remembered.  This is used for resolving attribute references: if a
requested attribute is not found in the class, the search continues in
the base class.  This rule is applied recursively if the base class
itself is derived from some other class.
2517
There's nothing special about instantiation of derived classes:
\verb\DerivedClassName()\ creates a new instance of the class.  Method
references are resolved as follows: the corresponding class attribute
is searched for, following the chain of base classes if necessary,
and the method reference is valid if this yields a function object.
2523
Derived classes may override methods of their base classes.  Because
methods have no special privileges when calling other methods of the
same object, a method of a base class that calls another method
defined in the same base class may in fact end up calling a method of
a derived class that overrides it.  (For \Cpp{} programmers: all methods
in Python are ``virtual functions''.)
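
The effect can be seen in a sketch like the following, where a base
class method ends up calling the overriding method of a derived class.
The classes \code{Greeter} and \code{PythonGreeter} are made up for
the purpose.

```python
class Greeter:                      # hypothetical base class
    def name(self):
        return 'world'
    def greet(self):
        # self.name() is looked up on the instance, so a derived
        # class can supply its own version
        return 'hello ' + self.name()

class PythonGreeter(Greeter):       # hypothetical derived class
    def name(self):                 # overrides Greeter.name
        return 'Python'

base_greeting = Greeter().greet()
derived_greeting = PythonGreeter().greet()
```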
2530
2531 An overriding method in a derived class may in fact want to extend
2532 rather than simply replace the base class method of the same name.
2533 There is a simple way to call the base class method directly: just
2534 call \verb\BaseClassName.methodname(self, arguments)\.  This is
2535 occasionally useful to clients as well.  (Note that this only works if
2536 the base class is defined or imported directly in the global scope.)
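
A minimal sketch of extending rather than replacing a base class
method follows.  The classes \code{Account} and \code{LoggingAccount}
and their attributes are assumptions made for the example; both are
defined in the global scope, as the note above requires.

```python
class Account:                          # hypothetical base class
    def __init__(self, balance):
        self.balance = balance
    def deposit(self, amount):
        self.balance = self.balance + amount

class LoggingAccount(Account):          # hypothetical derived class
    def __init__(self, balance):
        Account.__init__(self, balance) # reuse the base class initializer
        self.log = []
    def deposit(self, amount):
        self.log.append(amount)         # the extension
        Account.deposit(self, amount)   # then the base class behaviour

acct = LoggingAccount(100)
acct.deposit(25)
```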
2537
2538
2539 \subsection{Multiple inheritance}
2540
2541 Python supports a limited form of multiple inheritance as well.  A
2542 class definition with multiple base classes looks as follows:
2543
2544 \begin{verbatim}
2545         class DerivedClassName(Base1, Base2, Base3):
2546                 <statement-1>
2547                 .
2548                 .
2549                 .
2550                 <statement-N>
2551 \end{verbatim}
2552
The only rule necessary to explain the semantics is the resolution
rule used for class attribute references.  This is depth-first,
left-to-right.  Thus, if an attribute is not found in
\verb\DerivedClassName\, it is searched for in \verb\Base1\, then
(recursively) in the base classes of \verb\Base1\, and only if it is
not found there is it searched for in \verb\Base2\, and so on.
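
The rule can be tried out with a sketch such as this one.  The class
\code{Root} and the method \code{who} are invented names, and the
sketch assumes that the base classes have no base class in common.

```python
class Root:                     # hypothetical base class of Base1
    def who(self):
        return 'Root'

class Base1(Root):              # does not define who() itself
    pass

class Base2:
    def who(self):
        return 'Base2'

class Derived(Base1, Base2):
    pass

# Derived, then Base1, then Root are searched before Base2
found = Derived().who()
```

Here \code{who} is found in \code{Root}, a base class of
\code{Base1}, even though \code{Base2} defines a method of the same
name.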
2559
(To some people breadth first, that is, searching \verb\Base2\ and
\verb\Base3\ before the base classes of \verb\Base1\, looks more
natural.  However, this would require you to know whether a particular
attribute of \verb\Base1\ is actually defined in \verb\Base1\ or in
one of its base classes before you can figure out the consequences of
a name conflict with an attribute of \verb\Base2\.  The depth-first
rule makes no distinction between direct and inherited attributes of
\verb\Base1\.)
2568
2569 It is clear that indiscriminate use of multiple inheritance is a
2570 maintenance nightmare, given the reliance in Python on conventions to
2571 avoid accidental name conflicts.  A well-known problem with multiple
2572 inheritance is a class derived from two classes that happen to have a
2573 common base class.  While it is easy enough to figure out what happens
2574 in this case (the instance will have a single copy of ``instance
2575 variables'' or data attributes used by the common base class), it is
2576 not clear that these semantics are in any way useful.
2577
2578
2579 \section{Odds and ends}
2580
2581 Sometimes it is useful to have a data type similar to the Pascal
2582 ``record'' or C ``struct'', bundling together a couple of named data
2583 items.  An empty class definition will do nicely, e.g.:
2584
2585 \begin{verbatim}
2586         class Employee:
2587                 pass
2588
2589         john = Employee() # Create an empty employee record
2590
2591         # Fill the fields of the record
2592         john.name = 'John Doe'
2593         john.dept = 'computer lab'
2594         john.salary = 1000
2595 \end{verbatim}
2596
2597
A piece of Python code that expects a particular abstract data type
can often be passed an instance of a class that emulates the methods
of that data type instead.  For instance, if you have a function that
formats some data from a file object, you can define a class with
methods \verb\read()\ and \verb\readline()\ that get the data from a
string buffer instead, and pass an instance of it as an argument.
(Unfortunately, this technique has its limitations: a class can't
define operations that are accessed by special syntax such as sequence
subscripting or arithmetic operators, and assigning such a
``pseudo-file'' to \verb\sys.stdin\ will not cause the interpreter to
read further input from it.)
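
A rough sketch of such a pseudo-file is given below.  The class
\code{StringSource} and the function \code{count_lines} are
hypothetical, and the sketch assumes an interpreter in which string
objects have a \code{find()} method.

```python
class StringSource:                 # hypothetical pseudo-file class
    def __init__(self, text):
        self.text = text
        self.pos = 0
    def read(self):
        # return everything that has not been read yet
        data = self.text[self.pos:]
        self.pos = len(self.text)
        return data
    def readline(self):
        # return the next line including its newline, or '' at the end
        end = self.text.find('\n', self.pos)
        if end < 0:
            end = len(self.text)
        else:
            end = end + 1
        line = self.text[self.pos:end]
        self.pos = end
        return line

def count_lines(f):                 # works with any object that has readline()
    n = 0
    while f.readline() != '':
        n = n + 1
    return n

n = count_lines(StringSource('spam\neggs\nham'))
```

The function \code{count_lines} never checks the type of its
argument; it only relies on the presence of a \code{readline()}
method, so a real file object would serve equally well.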
2609
2610
Instance method objects have attributes, too: \verb\m.im_self\ is the
instance object to which the method is bound, and \verb\m.im_func\ is
the function object corresponding to the method.
2614
2615
2616 \chapter{Recent Additions}
2617
2618 Python is an evolving language.  Since this tutorial was last
2619 thoroughly revised, several new features have been added to the
2620 language.  While ideally I should revise the tutorial to incorporate
2621 them in the mainline of the text, lack of time currently requires me
2622 to take a more modest approach.  In this chapter I will briefly list the
2623 most important improvements to the language and how you can use them
2624 to your benefit.
2625
2626 \section{The Last Printed Expression}
2627
2628 In interactive mode, the last printed expression is assigned to the
2629 variable \code{_}.  This means that when you are using Python as a
2630 desk calculator, it is somewhat easier to continue calculations, for
2631 example:
2632
2633 \begin{verbatim}
2634         >>> tax = 17.5 / 100
2635         >>> price = 3.50
2636         >>> price * tax
2637         0.6125
2638         >>> price + _
2639         4.1125
2640         >>> round(_, 2)
2641         4.11
2642         >>>
2643 \end{verbatim}
2644
2645 For reasons too embarrassing to explain, this variable is implemented
2646 as a built-in (living in the module \code{__builtin__}), so it should
2647 be treated as read-only by the user.  I.e. don't explicitly assign a
2648 value to it --- you would create an independent local variable with
2649 the same name masking the built-in variable with its magic behavior.
2650
2651 \section{String Literals}
2652
2653 \subsection{Double Quotes}
2654
2655 Python can now also use double quotes to surround string literals,
2656 e.g. \verb\"this doesn't hurt a bit"\.  There is no semantic
2657 difference between strings surrounded by single or double quotes.
2658
2659 \subsection{Continuation Of String Literals}
2660
2661 String literals can span multiple lines by escaping newlines with
2662 backslashes, e.g.
2663
2664 \begin{verbatim}
2665         hello = "This is a rather long string containing\n\
2666         several lines of text just as you would do in C.\n\
2667             Note that whitespace at the beginning of the line is\
2668          significant.\n"
2669         print hello
2670 \end{verbatim}
2671
2672 which would print the following:
2673 \begin{verbatim}
2674         This is a rather long string containing
2675         several lines of text just as you would do in C.
2676             Note that whitespace at the beginning of the line is significant.
2677 \end{verbatim}
2678
2679 \subsection{Triple-quoted strings}
2680
2681 In some cases, when you need to include really long strings (e.g.
2682 containing several paragraphs of informational text), it is annoying
2683 that you have to terminate each line with \verb@\n\@, especially if
2684 you would like to reformat the text occasionally with a powerful text
2685 editor like Emacs.  For such situations, ``triple-quoted'' strings can
2686 be used, e.g.
2687
2688 \begin{verbatim}
2689         hello = """
2690
2691             This string is bounded by triple double quotes (3 times ").
2692         Unescaped newlines in the string are retained, though \
2693         it is still possible\nto use all normal escape sequences.
2694
2695             Whitespace at the beginning of a line is
2696         significant.  If you need to include three opening quotes
2697         you have to escape at least one of them, e.g. \""".
2698
2699             This string ends in a newline.
2700         """
2701 \end{verbatim}
2702
2703 Triple-quoted strings can be surrounded by three single quotes as
2704 well, again without semantic difference.
2705
2706 \subsection{String Literal Juxtaposition}
2707
2708 One final twist: you can juxtapose multiple string literals.  Two or
2709 more adjacent string literals (but not arbitrary expressions!)
2710 separated only by whitespace will be concatenated (without intervening
2711 whitespace) into a single string object at compile time.  This makes
2712 it possible to continue a long string on the next line without
2713 sacrificing indentation or performance, unlike the use of the string
2714 concatenation operator \verb\+\ or the continuation of the literal
2715 itself on the next line (since leading whitespace is significant
2716 inside all types of string literals).  Note that this feature, like
2717 all string features except triple-quoted strings, is borrowed from
2718 Standard C.
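
For instance, a long message can be split over two indented lines as
in the sketch below.  The function \code{usage} and its text are
invented, and the enclosing parentheses are what allow the expression
to continue on the next line.

```python
def usage():                            # hypothetical function
    return ("usage: frob [-v] file\n"   # adjacent literals are joined,
            "  -v  be verbose\n")       # so no + or backslash is needed

text = usage()
```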
2719
2720 \section{The Formatting Operator}
2721
2722 \subsection{Basic Usage}
2723
2724 The chapter on output formatting is really out of date: there is now
2725 an almost complete interface to C-style printf formats.  This is done
2726 by overloading the modulo operator (\verb\%\) for a left operand
2727 which is a string, e.g.
2728
2729 \begin{verbatim}
2730         >>> import math
2731         >>> print 'The value of PI is approximately %5.3f.' % math.pi
2732         The value of PI is approximately 3.142.
2733         >>>
2734 \end{verbatim}
2735
If there is more than one format in the string, you pass a tuple as
the right operand, e.g.
2738
2739 \begin{verbatim}
2740         >>> table = {'Sjoerd': 4127, 'Jack': 4098, 'Dcab': 8637678}
2741         >>> for name, phone in table.items():
2742         ...     print '%-10s ==> %10d' % (name, phone)
2743         ...
2744         Jack       ==>       4098
2745         Dcab       ==>    8637678
2746         Sjoerd     ==>       4127
2747         >>>
2748 \end{verbatim}
2749
2750 Most formats work exactly as in C and require that you pass the proper
2751 type (however, if you don't you get an exception, not a core dump).
2752 The \verb\%s\ format is more relaxed: if the corresponding argument is
2753 not a string object, it is converted to string using the \verb\str()\
2754 built-in function.  Using \verb\*\ to pass the width or precision in
2755 as a separate (integer) argument is supported.  The C formats
2756 \verb\%n\ and \verb\%p\ are not supported.
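
The following lines sketch the relaxed \verb\%s\ format and the use of
\verb\*\ for width and precision.  The variable names and values are
chosen arbitrarily.

```python
as_text = '%s and %s' % (42, [1, 2])    # non-strings are converted with str()
padded = '%*d' % (6, 42)                # width 6 taken from the argument list
rounded = '%.*f' % (2, 3.14159)         # precision 2 taken from the argument list
```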
2757
2758 \subsection{Referencing Variables By Name}
2759
2760 If you have a really long format string that you don't want to split
2761 up, it would be nice if you could reference the variables to be
2762 formatted by name instead of by position.  This can be done by using
2763 an extension of C formats using the form \verb\%(name)format\, e.g.
2764
2765 \begin{verbatim}
2766         >>> table = {'Sjoerd': 4127, 'Jack': 4098, 'Dcab': 8637678}
2767         >>> print 'Jack: %(Jack)d; Sjoerd: %(Sjoerd)d; Dcab: %(Dcab)d' % table
2768         Jack: 4098; Sjoerd: 4127; Dcab: 8637678
2769         >>>
2770 \end{verbatim}
2771
2772 This is particularly useful in combination with the new built-in
2773 \verb\vars()\ function, which returns a dictionary containing all
2774 local variables.
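
A minimal sketch of this combination, using a hypothetical function
\code{describe} whose arguments are its only local variables:

```python
def describe(name, phone):              # hypothetical function
    # vars() is the dictionary of local variables, here name and phone
    return '%(name)-10s ==> %(phone)10d' % vars()

line = describe('Jack', 4098)
```

The width and alignment specifications follow the closing parenthesis
of the name, exactly as they would follow the \verb\%\ in a positional
format.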
2775
2776 \section{Optional Function Arguments}
2777
2778 It is now possible to define functions with a variable number of
2779 arguments.  There are two forms, which can be combined.
2780
2781 \subsection{Default Argument Values}
2782
The most useful form is to specify a default value for one or more
arguments.  This creates a function that can be called with fewer
arguments than it is defined with, e.g.
2786
2787 \begin{verbatim}
2788         def ask_ok(prompt, retries = 4, complaint = 'Yes or no, please!'):
2789                 while 1:
2790                         ok = raw_input(prompt)
2791                         if ok in ('y', 'ye', 'yes'): return 1
2792                         if ok in ('n', 'no', 'nop', 'nope'): return 0
2793                         retries = retries - 1
2794                         if retries < 0: raise IOError, 'refusenik user'
2795                         print complaint
2796 \end{verbatim}
2797
2798 This function can be called either like this:
2799 \verb\ask_ok('Do you really want to quit?')\ or like this:
2800 \verb\ask_ok('OK to overwrite the file?', 2)\.
2801
2802 The default values are evaluated at the point of function definition
2803 in the {\em defining} scope, so that e.g.
2804
2805 \begin{verbatim}
2806         i = 5
2807         def f(arg = i): print arg
2808         i = 6
2809         f()
2810 \end{verbatim}
2811
2812 will print \verb\5\.
2813
2814 \subsection{Arbitrary Argument Lists}
2815
2816 It is also possible to specify that a function can be called with an
2817 arbitrary number of arguments.  These arguments will be wrapped up in
2818 a tuple.  Before the variable number of arguments, zero or more normal
2819 arguments may occur, e.g.
2820
2821 \begin{verbatim}
2822         def fprintf(file, format, *args):
2823                 file.write(format % args)
2824 \end{verbatim}
2825
2826 This feature may be combined with the previous, e.g.
2827
2828 \begin{verbatim}
2829         def but_is_it_useful(required, optional = None, *remains):
2830                 print "I don't know"
2831 \end{verbatim}
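
How the arguments are distributed in such a combined definition can be
seen from a sketch like this one, in which the function \code{report}
is a made-up name that simply returns what it received:

```python
def report(required, optional=None, *remains):  # hypothetical function
    # remains is always a tuple, possibly empty
    return (required, optional, remains)

one = report(1)                 # optional takes its default value
two = report(1, 2)
many = report(1, 2, 3, 4)       # the extra arguments end up in remains
```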
2832
2833 \section{Lambda And Functional Programming Tools}
2834
2835 \subsection{Lambda Forms}
2836
2837 By popular demand, a few features commonly found in functional
2838 programming languages and Lisp have been added to Python.  With the
2839 \verb\lambda\ keyword, small anonymous functions can be created.
2840 Here's a function that returns the sum of its two arguments:
2841 \verb\lambda a, b: a+b\.  Lambda forms can be used wherever function
2842 objects are required.  They are syntactically restricted to a single
2843 expression.  Semantically, they are just syntactic sugar for a normal
2844 function definition.  Like nested function definitions, lambda forms
2845 cannot reference variables from the containing scope, but this can be
2846 overcome through the judicious use of default argument values, e.g.
2847
2848 \begin{verbatim}
2849         def make_incrementor(n):
2850                 return lambda x, incr=n: x+incr
2851 \end{verbatim}
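
A brief usage sketch, with \code{add_five} and \code{add_ten} as
hypothetical names for two incrementors created this way:

```python
def make_incrementor(n):
    return lambda x, incr=n: x+incr     # n is captured as a default value

add_five = make_incrementor(5)
add_ten = make_incrementor(10)
results = (add_five(1), add_ten(1))
```

Each lambda form keeps its own default value for \code{incr}, so the
two incrementors do not interfere with each other.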
2852
2853 \subsection{Map, Reduce and Filter}
2854
Three new built-in functions on sequences are good candidates for
being passed lambda forms.
2857
2858 \subsubsection{Map.}
2859
2860 \verb\map(function, sequence)\ calls \verb\function(item)\ for each of
2861 the sequence's items and returns a list of the return values.  For
2862 example, to compute some cubes:
2863
2864 \begin{verbatim}
2865         >>> map(lambda x: x*x*x, range(1, 11))
2866         [1, 8, 27, 64, 125, 216, 343, 512, 729, 1000]
2867         >>>
2868 \end{verbatim}
2869
2870 More than one sequence may be passed; the function must then have as
2871 many arguments as there are sequences and is called with the
2872 corresponding item from each sequence (or \verb\None\ if some sequence
2873 is shorter than another).  If \verb\None\ is passed for the function,
2874 a function returning its argument(s) is substituted.
2875
2876 Combining these two special cases, we see that
2877 \verb\map(None, list1, list2)\  is a convenient way of turning a pair
2878 of lists into a list of pairs.  For example:
2879
2880 \begin{verbatim}
2881         >>> seq = range(8)
2882         >>> map(None, seq, map(lambda x: x*x, seq))
2883         [(0, 0), (1, 1), (2, 4), (3, 9), (4, 16), (5, 25), (6, 36), (7, 49)]
2884         >>>
2885 \end{verbatim}
2886
2887 \subsubsection{Filter.}
2888
2889 \verb\filter(function, sequence)\ returns a sequence (of the same
2890 type, if possible) consisting of those items from the sequence for
2891 which \verb\function(item)\ is true.  For example, to compute some
2892 primes:
2893
2894 \begin{verbatim}
2895         >>> filter(lambda x: x%2 != 0 and x%3 != 0, range(2, 25))
2896         [5, 7, 11, 13, 17, 19, 23]
2897         >>>
2898 \end{verbatim}
2899
2900 \subsubsection{Reduce.}
2901
2902 \verb\reduce(function, sequence)\ returns a single value constructed
2903 by calling the (binary) function on the first two items of the
2904 sequence, then on the result and the next item, and so on.  For
2905 example, to compute the sum of the numbers 1 through 10:
2906
2907 \begin{verbatim}
2908         >>> reduce(lambda x, y: x+y, range(1, 11))
2909         55
2910         >>>
2911 \end{verbatim}
2912
2913 If there's only one item in the sequence, its value is returned; if
2914 the sequence is empty, an exception is raised.
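
These rules can be checked with a sketch along the following lines.
Two assumptions are made that go beyond the text: interpreters that no
longer provide \code{reduce} as a built-in offer it in the module
\code{functools}, and the exception raised for an empty sequence is
\code{TypeError}.

```python
try:
    reduce                              # built in, as described here
except NameError:
    from functools import reduce        # assumption: newer interpreters

def add(x, y):
    return x + y

total = reduce(add, [1, 2, 3, 4])       # ((1+2)+3)+4
single = reduce(add, [7])               # one item: returned unchanged
try:
    reduce(add, [])                     # empty sequence, no starting value
    empty_failed = 0
except TypeError:                       # assumption: TypeError is raised
    empty_failed = 1
```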
2915
2916 A third argument can be passed to indicate the starting value.  In this
2917 case the starting value is returned for an empty sequence, and the
2918 function is first applied to the starting value and the first sequence
2919 item, then to the result and the next item, and so on.  For example,
2920
2921 \begin{verbatim}
2922         >>> def sum(seq):
2923         ...     return reduce(lambda x, y: x+y, seq, 0)
2924         ...
2925         >>> sum(range(1, 11))
2926         55
2927         >>> sum([])
2928         0
2929         >>>
2930 \end{verbatim}
2931
2932 \section{Continuation Lines Without Backslashes}
2933
2934 While the general mechanism for continuation of a source line on the
2935 next physical line remains to place a backslash on the end of the
2936 line, expressions inside matched parentheses (or square brackets, or
2937 curly braces) can now also be continued without using a backslash.
2938 This is particularly useful for calls to functions with many
2939 arguments, and for initializations of large tables.
2940
2941 For example:
2942
2943 \begin{verbatim}
2944         month_names = ['Januari', 'Februari', 'Maart',
2945                        'April',   'Mei',      'Juni',
2946                        'Juli',    'Augustus', 'September',
2947                        'Oktober', 'November', 'December']
2948 \end{verbatim}
2949
2950 and
2951
2952 \begin{verbatim}
2953         CopyInternalHyperLinks(self.context.hyperlinks,
2954                                copy.context.hyperlinks,
2955                                uidremap)
2956 \end{verbatim}
2957
2958 \section{Regular Expressions}
2959
2960 While C's printf-style output formats, transformed into Python, are
2961 adequate for most output formatting jobs, C's scanf-style input
2962 formats are not very powerful.  Instead of scanf-style input, Python
2963 offers Emacs-style regular expressions as a powerful input and
2964 scanning mechanism.  Read the corresponding section in the Library
2965 Reference for a full description.
2966
2967 \section{Generalized Dictionaries}
2968
2969 The keys of dictionaries are no longer restricted to strings --- they
2970 can be any immutable basic type including strings, numbers, tuples, or
2971 (certain) class instances.  (Lists and dictionaries are not acceptable
2972 as dictionary keys, in order to avoid problems when the object used as
2973 a key is modified.)
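
A small sketch of the difference, using an invented table of
distances (the figure is made up) and assuming that an unacceptable
key is reported as a \code{TypeError}:

```python
distances = {}                                  # hypothetical table
distances[('Amsterdam', 'Utrecht')] = 35        # a tuple as key
distances[1995] = 'a number as key'

try:
    distances[['Amsterdam', 'Utrecht']] = 35    # a list is not acceptable
    list_key_ok = 1
except TypeError:                               # assumption: TypeError
    list_key_ok = 0
```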
2974
2975 Dictionaries have two new methods: \verb\d.values()\ returns a list of
2976 the dictionary's values, and \verb\d.items()\ returns a list of the
2977 dictionary's (key, value) pairs.  Like \verb\d.keys()\, these
2978 operations are slow for large dictionaries.  Examples:
2979
2980 \begin{verbatim}
2981         >>> d = {100: 'honderd', 1000: 'duizend', 10: 'tien'}
2982         >>> d.keys()
2983         [100, 10, 1000]
2984         >>> d.values()
2985         ['honderd', 'tien', 'duizend']
2986         >>> d.items()
2987         [(100, 'honderd'), (10, 'tien'), (1000, 'duizend')]
2988         >>>
2989 \end{verbatim}
2990
2991 \section{Miscellaneous New Built-in Functions}
2992
2993 The function \verb\vars()\ returns a dictionary containing the current
2994 local variables.  With a module argument, it returns that module's
2995 global variables.  The old function \verb\dir(x)\ returns
2996 \verb\vars(x).keys()\.
2997
2998 The function \verb\round(x)\ returns a floating point number rounded
2999 to the nearest integer (but still expressed as a floating point
3000 number).  E.g. \verb\round(3.4) == 3.0\ and \verb\round(3.5) == 4.0\.
3001 With a second argument it rounds to the specified number of digits,
3002 e.g. \verb\round(math.pi, 4) == 3.1416\ or even
3003 \verb\round(123.4, -2) == 100.0\.
3004
3005 The function \verb\hash(x)\ returns a hash value for an object.
3006 All object types acceptable as dictionary keys have a hash value (and
3007 it is this hash value that the dictionary implementation uses).
3008
The function \verb\id(x)\ returns a unique identifier for an object.
For two objects x and y, \verb\id(x) == id(y)\ if and only if
\verb\x is y\.  (In fact the object's address is used.)
3012
3013 The function \verb\hasattr(x, name)\ returns whether an object has an
3014 attribute with the given name (a string value).  The function
3015 \verb\getattr(x, name)\ returns the object's attribute with the given
3016 name.  The function \verb\setattr(x, name, value)\ assigns a value to
3017 an object's attribute with the given name.  These three functions are
3018 useful if the attribute names are not known beforehand.  Note that
3019 \verb\getattr(x, 'spam')\ is equivalent to \verb\x.spam\, and
3020 \verb\setattr(x, 'spam', y)\ is equivalent to \verb\x.spam = y\.  By
3021 definition, \verb\hasattr(x, name)\ returns true if and only if
3022 \verb\getattr(x, name)\ returns without raising an exception.
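
The following sketch shows the three functions at work on attribute
names that are only known at run time.  The class \code{Record} and
the attribute names are hypothetical.

```python
class Record:                           # hypothetical empty class
    pass

r = Record()
had_before = hasattr(r, 'spam')         # false: no such attribute yet
for name in ['spam', 'eggs']:           # names only known at run time
    setattr(r, name, len(name))         # like r.spam = 4 and r.eggs = 4
values = (getattr(r, 'spam'), r.eggs)
has_after = hasattr(r, 'spam')          # true now
```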
3023
3024 \section{Else Clause For Try Statement}
3025
The \verb\try...except\ statement now has an optional \verb\else\
clause, which must follow all \verb\except\ clauses.  It is useful for
code that must be executed if the \verb\try\ clause does not raise an
exception.  For example:
3030
3031 \begin{verbatim}
        import sys

        for arg in sys.argv[1:]:    # skip the script name in sys.argv[0]
                try:
                        f = open(arg, 'r')
                except IOError:
                        print 'cannot open', arg
                else:
                        print arg, 'has', len(f.readlines()), 'lines'
                        f.close()
3040 \end{verbatim}
3041
3042
3043 \section{New Class Features in Release 1.1}
3044
3045 Some changes have been made to classes: the operator overloading
3046 mechanism is more flexible, providing more support for non-numeric use
3047 of operators (including calling an object as if it were a function),
3048 and it is possible to trap attribute accesses.
3049
3050 \subsection{New Operator Overloading}
3051
3052 It is no longer necessary to coerce both sides of an operator to the
3053 same class or type.  A class may still provide a \code{__coerce__}
3054 method, but this method may return objects of different types or
3055 classes if it feels like it.  If no \code{__coerce__} is defined, any
3056 argument type or class is acceptable.
3057
3058 In order to make it possible to implement binary operators where the
3059 right-hand side is a class instance but the left-hand side is not,
3060 without using coercions, right-hand versions of all binary operators
3061 may be defined.  These have an `r' prepended to their name,
3062 e.g. \code{__radd__}.
3063
For example, here's a very simple class for representing times.  Times
are initialized from a number of seconds (like the value returned by
\code{time.time()}).  Times are printed like this:
\code{Wed Mar 15 12:28:48 1995}.  Subtracting two Times gives their
difference in seconds.  Adding or subtracting a Time and a number
gives a new Time.  You can't add two Times, nor can you subtract a
Time from a number.
3070
3071 \begin{verbatim}
import time

class Time:
    def __init__(self, seconds):
        self.seconds = seconds
    def __repr__(self):
        return time.ctime(self.seconds)
    def __add__(self, x):
        return Time(self.seconds + x)
    __radd__ = __add__            # support for x+t
    def __sub__(self, x):
        if hasattr(x, 'seconds'): # test if x could be a Time
            return self.seconds - x.seconds   # Time - Time: seconds
        else:
            return Time(self.seconds - x)     # Time - number: a new Time

now = Time(time.time())
tomorrow = 24*3600 + now
yesterday = now - 24*3600
print tomorrow - yesterday        # two days in seconds: 172800
3092 \end{verbatim}
3093
3094 \subsection{Trapping Attribute Access}
3095
3096 You can define three new ``magic'' methods in a class now:
3097 \code{__getattr__(self, name)}, \code{__setattr__(self, name, value)}
3098 and \code{__delattr__(self, name)}.
3099
3100 The \code{__getattr__} method is called when an attribute access fails,
3101 i.e. when an attribute access would otherwise raise AttributeError ---
3102 this is {\em after} the instance's dictionary and its class hierarchy
3103 have been searched for the named attribute.  Note that if this method
3104 attempts to access any undefined instance attribute it will be called
3105 recursively!
3106
The \code{__setattr__} and \code{__delattr__} methods are called when
assignment to or deletion of an attribute is attempted, respectively.
They are called {\em instead} of the normal action (which is to insert
or delete the attribute in the instance dictionary).  If either of
these methods must set or delete any attribute, it can only do so by
using the instance dictionary (\code{self.__dict__}) directly;
otherwise it would be called recursively.
3114
3115 For example, here's a near-universal ``Wrapper'' class that passes all
3116 its attribute accesses to another object.  Note how the
3117 \code{__init__} method inserts the wrapped object in
3118 \code{self.__dict__} in order to avoid endless recursion
3119 (\code{__setattr__} would call \code{__getattr__} which would call
3120 itself recursively).
3121
3122 \begin{verbatim}
3123 class Wrapper:
3124     def __init__(self, wrapped):
3125         self.__dict__['wrapped'] = wrapped
3126     def __getattr__(self, name):
3127         return getattr(self.wrapped, name)
3128     def __setattr__(self, name, value):
3129         setattr(self.wrapped, name, value)
3130     def __delattr__(self, name):
3131         delattr(self.wrapped, name)
3132
3133 import sys
3134 f = Wrapper(sys.stdout)
3135 f.write('hello world\n')          # prints 'hello world'
3136 \end{verbatim}
3137
A simpler example of \code{__getattr__} is an attribute that is
computed each time (or the first time) it is accessed.  For instance:
3140
3141 \begin{verbatim}
from math import pi

class Circle:
    def __init__(self, radius):
        self.radius = radius
    def __getattr__(self, name):
        if name == 'circumference':
            return 2 * pi * self.radius
        if name == 'diameter':
            return 2 * self.radius
        if name == 'area':
            return pi * pow(self.radius, 2)
        raise AttributeError, name
3155 \end{verbatim}
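
To see the computed attributes in use, here is a shortened variant of
the class together with a few accesses.  It is only a sketch: it
assumes an interpreter in which \code{AttributeError} is a class that
can be raised as \code{AttributeError(name)}, and the attribute
\code{colour} is an invented name that the class does not provide.

```python
from math import pi

class Circle:
    def __init__(self, radius):
        self.radius = radius
    def __getattr__(self, name):
        # only called when normal lookup fails, so self.radius
        # does not come through here
        if name == 'diameter':
            return 2 * self.radius
        if name == 'area':
            return pi * pow(self.radius, 2)
        raise AttributeError(name)      # assumption: exception classes

c = Circle(2)
diameter = c.diameter                   # computed on each access
c.radius = 5
new_diameter = c.diameter               # reflects the new radius
has_colour = hasattr(c, 'colour')       # false: __getattr__ raised AttributeError
```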
3156
3157 \subsection{Calling a Class Instance}
3158
3159 If a class defines a method \code{__call__} it is possible to call its
3160 instances as if they were functions.  For example:
3161
3162 \begin{verbatim}
3163 class PresetSomeArguments:
3164     def __init__(self, func, *args):
3165         self.func, self.args = func, args
3166     def __call__(self, *args):
3167         return apply(self.func, self.args + args)
3168
3169 f = PresetSomeArguments(pow, 2)    # f(i) computes powers of 2
3170 for i in range(10): print f(i),    # prints 1 2 4 8 16 32 64 128 256 512
3171 print                              # append newline
3172 \end{verbatim}
3173
3174
3175 \chapter{New in Release 1.2}
3176
3177
3178 This chapter describes even more recent additions to the Python
3179 language and library.
3180
3181
3182 \section{New Class Features}
3183
3184 The semantics of \code{__coerce__} have been changed to be more
3185 reasonable.  As an example, the new standard module \code{Complex}
3186 implements fairly complete complex numbers using this.  Additional
3187 examples of classes with and without \code{__coerce__} methods can be
3188 found in the \code{Demo/classes} subdirectory, modules \code{Rat} and
3189 \code{Dates}.
3190
3191 If a class defines no \code{__coerce__} method, this is equivalent to
3192 the following definition:
3193
3194 \begin{verbatim}
3195 def __coerce__(self, other): return self, other
3196 \end{verbatim}
3197
3198 If \code{__coerce__} coerces itself to an object of a different type,
3199 the operation is carried out using that type --- in release 1.1, this
3200 would cause an error.
3201
3202 Comparisons involving class instances now invoke \code{__coerce__}
3203 exactly as if \code{cmp(x, y)} were a binary operator like \code{+}
3204 (except if \code{x} and \code{y} are the same object).
3205
3206 \section{Unix Signal Handling}
3207
3208 On Unix, Python now supports signal handling.  The module
3209 \code{signal} exports functions \code{signal}, \code{pause} and
3210 \code{alarm}, which act similar to their Unix counterparts.  The
3211 module also exports the conventional names for the various signal
3212 classes (also usable with \code{os.kill()}) and \code{SIG_IGN} and
3213 \code{SIG_DFL}.  See the section on \code{signal} in the Library
3214 Reference Manual for more information.
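
As a minimal sketch of how these pieces fit together, the fragment
below installs a handler, sends a signal to its own process with
\code{os.kill()}, and restores the default action afterwards.  It
assumes a Unix system and a current Python interpreter; the names
\code{on_signal} and \code{received} are made up for the illustration.

\begin{verbatim}
import os
import signal

received = []                       # signal numbers seen by the handler

def on_signal(signum, frame):
    # A handler is called with the signal number and the current frame.
    received.append(signum)

signal.signal(signal.SIGUSR1, on_signal)        # install the handler
os.kill(os.getpid(), signal.SIGUSR1)            # signal our own process
print(received)                                 # what the handler recorded
signal.signal(signal.SIGUSR1, signal.SIG_DFL)   # back to the default action
\end{verbatim}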
3215
3216 \section{Exceptions Can Be Classes}
3217
3218 User-defined exceptions are no longer limited to being string objects
3219 --- they can be identified by classes as well.  Using this mechanism it
3220 is possible to create extensible hierarchies of exceptions.
3221
3222 There are two new valid (semantic) forms for the raise statement:
3223
3224 \begin{verbatim}
3225 raise Class, instance
3226
3227 raise instance
3228 \end{verbatim}
3229
3230 In the first form, \code{instance} must be an instance of \code{Class}
3231 or of a class derived from it.  The second form is a shorthand for
3232
3233 \begin{verbatim}
3234 raise instance.__class__, instance
3235 \end{verbatim}
3236
3237 An except clause may list classes as well as string objects.  A class
3238 in an except clause is compatible with an exception if it is the same
3239 class or a base class thereof (but not the other way around --- an
3240 except clause listing a derived class is not compatible with a base
3241 class).  For example, the following code will print B, C, D in that
3242 order:
3243
3244 \begin{verbatim}
3245 class B:
3246     pass
3247 class C(B):
3248     pass
3249 class D(C):
3250     pass
3251
3252 for c in [B, C, D]:
3253     try:
3254         raise c()
3255     except D:
3256         print "D"
3257     except C:
3258         print "C"
3259     except B:
3260         print "B"
3261 \end{verbatim}
3262
3263 Note that if the except clauses were reversed (with ``\code{except B}''
3264 first), it would have printed B, B, B --- the first matching except
3265 clause is triggered.
3266
3267 When an error message is printed for an unhandled exception which is a
3268 class, the class name is printed, then a colon and a space, and
3269 finally the instance converted to a string using the built-in function
3270 \code{str()}.
3271
3272 In this release, the built-in exceptions are still strings.
3273
3274
3275 \section{Object Persistency and Object Copying}
3276
3277 Two new modules, \code{pickle} and \code{shelve}, support storage and
3278 retrieval of (almost) arbitrary Python objects on disk, using the
3279 \code{dbm} package.  A third module, \code{copy}, provides flexible
3280 object copying operations.  More information on these modules is
3281 provided in the Library Reference Manual.
3282
3283 \subsection{Persistent Objects}
3284
3285 The module \code{pickle} provides a general framework for objects to
3286 disassemble themselves into a stream of bytes and to reassemble such a
3287 stream back into an object.  It copes with reference sharing,
3288 recursive objects and instances of user-defined classes, but not
3289 (directly) with objects that have ``magical'' links into the operating
3290 system such as open files, sockets or windows.
3291
3292 The \code{pickle} module defines a simple protocol whereby
3293 user-defined classes can control how they are disassembled and
3294 assembled.  The method \code{__getinitargs__()}, if defined, returns
3295 the argument list for the constructor to be used at assembly time (by
3296 default the constructor is called without arguments).  The methods
3297 \code{__getstate__()} and \code{__setstate__()} are used to pass
3298 additional state from disassembly to assembly; by default the
3299 instance's \code{__dict__} is passed and restored.
3300
3301 Note that \code{pickle} does not open or close any files --- it can be
used equally well for moving objects around on a network or storing them
3303 in a database.  For ease of debugging, and the inevitable occasional
3304 manual patch-up, the constructed byte streams consist of printable
3305 \ASCII{} characters only (though it's not designed to be pretty).
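
The following sketch illustrates two of the points above: shared
references survive a round trip through a byte stream, and a class can
decide which state is saved.  It is written for a current Python
interpreter, and the class \code{Account} with its attributes is
invented for the example.

\begin{verbatim}
import pickle

class Account:
    def __init__(self):
        self.balance = 0
        self.cache = {}             # derived data, not worth saving

    def __getstate__(self):
        # Called at disassembly time: save only the balance.
        return {'balance': self.balance}

    def __setstate__(self, state):
        # Called at assembly time with whatever __getstate__ returned.
        self.balance = state['balance']
        self.cache = {}

acct = Account()
acct.balance = 42
acct.cache['report'] = 'expensive to compute'

stream = pickle.dumps([acct, acct])     # the same object appears twice
first, second = pickle.loads(stream)
\end{verbatim}

Here \code{first} and \code{second} are one and the same object, just
as the two list items were before pickling, and the restored
\code{cache} is empty.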
3306
3307 The module \code{shelve} provides a simple model for storing objects
3308 on files.  The operation \code{shelve.open(filename)} returns a
3309 ``shelf'', which is a simple persistent database with a
dictionary-like interface.  Database keys are strings; objects stored
3311 in the database can be anything that \code{pickle} will handle.
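
A short session might look like the sketch below.  The file name
\code{spamshelf} and the scratch directory are arbitrary choices made
for the illustration, and the code assumes a current Python
interpreter.

\begin{verbatim}
import os
import shelve
import shutil
import tempfile

directory = tempfile.mkdtemp()                  # scratch directory
filename = os.path.join(directory, 'spamshelf')

shelf = shelve.open(filename)       # creates the database if necessary
shelf['sizes'] = [8, 12, 20]        # string key, picklable value
shelf.close()

shelf = shelve.open(filename)       # reopen: the data is still there
sizes = shelf['sizes']
shelf.close()

shutil.rmtree(directory)            # remove the scratch directory again
\end{verbatim}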
3312
3313 \subsection{Copying Objects}
3314
3315 The module \code{copy} exports two functions: \code{copy()} and
3316 \code{deepcopy()}.  The \code{copy()} function returns a ``shallow''
3317 copy of an object; \code{deepcopy()} returns a ``deep'' copy.  The
3318 difference between shallow and deep copying is only relevant for
3319 compound objects (objects that contain other objects, like lists or
3320 class instances):
3321
3322 \begin{itemize}
3323
3324 \item
A shallow copy constructs a new compound object and then (to the
extent possible) inserts into it {\em the same objects} that the
original contains.
3328
3329 \item
3330 A deep copy constructs a new compound object and then, recursively,
3331 inserts {\em copies} into it of the objects found in the original.
3332
3333 \end{itemize}
3334
3335 Both functions have the same restrictions and use the same protocols
3336 as \code{pickle} --- user-defined classes can control how they are
3337 copied by providing methods named \code{__getinitargs__()},
3338 \code{__getstate__()} and \code{__setstate__()}.
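
The difference is easiest to see with a list that contains another
list.  The sketch below uses made-up variable names and is written for
a current Python interpreter.

\begin{verbatim}
import copy

inner = ['spam']
original = [inner, 'eggs']

shallow = copy.copy(original)       # new outer list, same inner list
deep = copy.deepcopy(original)      # new outer list, new inner list

inner.append('more spam')           # change the inner list afterwards
\end{verbatim}

After the last line, \code{shallow[0]} shows the appended item because
it is the very same list as \code{inner}, while \code{deep[0]} still
holds only \code{'spam'}.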
3339
3340
3341 \section{Documentation Strings}
3342
3343 A variety of objects now have a new attribute, \code{__doc__}, which
3344 is supposed to contain a documentation string (if no documentation is
3345 present, the attribute is \code{None}).  New syntax, compatible with
3346 the old interpreter, allows for convenient initialization of the
3347 \code{__doc__} attribute of modules, classes and functions by placing
3348 a string literal by itself as the first statement in the suite.  It
3349 must be a literal --- an expression yielding a string object is not
3350 accepted as a documentation string, since future tools may need to
3351 derive documentation from source by parsing.
3352
3353 Here is a hypothetical, amply documented module called \code{Spam}:
3354
3355 \begin{verbatim}
3356 """Spam operations.
3357
3358 This module exports two classes, a function and an exception:
3359
3360 class Spam: full Spam functionality --- three can sizes
3361 class SpamLight: limited Spam functionality --- only one can size
3362
3363 def open(filename): open a file and return a corresponding Spam or
3364 SpamLight object
3365
3366 GoneOff: exception raised for errors; should never happen
3367
3368 Note that it is always possible to convert a SpamLight object to a
3369 Spam object by a simple method call, but that the reverse operation is
3370 generally costly and may fail for a number of reasons.
3371 """
3372
3373 class SpamLight:
3374     """Limited spam functionality.
3375
3376     Supports a single can size, no flavor, and only hard disks.
3377     """
3378
3379     def __init__(self, size=12):
3380         """Construct a new SpamLight instance.
3381
3382         Argument is the can size.
3383         """
3384         # etc.
3385
3386     # etc.
3387
3388 class Spam(SpamLight):
3389     """Full spam functionality.
3390
3391     Supports three can sizes, two flavor varieties, and all floppy
3392     disk formats still supported by current hardware.
3393     """
3394
3395     def __init__(self, size1=8, size2=12, size3=20):
3396         """Construct a new Spam instance.
3397
3398         Arguments are up to three can sizes.
3399         """
3400         # etc.
3401
3402     # etc.
3403
3404 def open(filename = "/dev/null"):
3405     """Open a can of Spam.
3406
3407     Argument must be an existing file.
3408     """
3409     # etc.
3410
3411 class GoneOff:
3412     """Class used for Spam exceptions.
3413
3414     There shouldn't be any.
3415     """
3416     pass
3417 \end{verbatim}
3418
3419 After executing ``\code{import Spam}'', the following expressions
3420 return the various documentation strings from the module:
3421
3422 \begin{verbatim}
3423 Spam.__doc__
3424 Spam.SpamLight.__doc__
3425 Spam.SpamLight.__init__.__doc__
3426 Spam.Spam.__doc__
3427 Spam.Spam.__init__.__doc__
3428 Spam.open.__doc__
3429 Spam.GoneOff.__doc__
3430 \end{verbatim}
3431
3432 There are emerging conventions about the content and formatting of
3433 documentation strings.
3434
3435 The first line should always be a short, concise summary of the
3436 object's purpose.  For brevity, it should not explicitly state the
3437 object's name or type, since these are available by other means
3438 (except if the name happens to be a verb describing a function's
3439 operation).  This line should begin with a capital letter and end with
3440 a period.
3441
3442 If there are more lines in the documentation string, the second line
3443 should be blank, visually separating the summary from the rest of the
description.  The following lines should be one or more paragraphs
describing the object's calling conventions, its side effects, etc.
3446
3447 Some people like to copy the Emacs convention of using UPPER CASE for
3448 function parameters --- this often saves a few words or lines.
3449
3450 The Python parser does not strip indentation from multi-line string
literals, so tools that process documentation have to strip
3452 indentation.  This is done using the following convention.  The first
3453 non-blank line {\em after} the first line of the string determines the
3454 amount of indentation for the entire documentation string.  (We can't
3455 use the first line since it is generally adjacent to the string's
3456 opening quotes so its indentation is not apparent in the string
3457 literal.)  Whitespace ``equivalent'' to this indentation is then
3458 stripped from the start of all lines of the string.  Lines that are
3459 indented less should not occur, but if they occur all their leading
3460 whitespace should be stripped.  Equivalence of whitespace should be
3461 tested after expansion of tabs (to 8 spaces, normally).
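
The convention just described can be sketched as a small function.
The name \code{strip_doc_indent} is made up here, no such function is
part of the library described in this chapter, and the code is written
for a current Python interpreter.

\begin{verbatim}
def strip_doc_indent(doc):
    # Expand tabs first so that equivalent whitespace compares equal.
    lines = doc.expandtabs(8).split('\n')

    # The first non-blank line after the first line sets the indentation.
    indent = 0
    for line in lines[1:]:
        text = line.lstrip()
        if text:
            indent = len(line) - len(text)
            break

    result = [lines[0].lstrip()]
    for line in lines[1:]:
        if len(line) - len(line.lstrip()) >= indent:
            result.append(line[indent:])    # remove the common indentation
        else:
            result.append(line.lstrip())    # indented less: remove it all
    return '\n'.join(result)

doc = """Construct a new SpamLight instance.

        Argument is the can size.
        """
stripped = strip_doc_indent(doc)
\end{verbatim}

For the string shown, no line of \code{stripped} starts with
whitespace.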
3462
3463 In this release, few of the built-in or standard functions and modules
3464 have documentation strings.
3465
3466
3467 \section{Customizing Import and Built-Ins}
3468
3469 In preparation for a ``restricted execution mode'' which will be
3470 usable to run code received from an untrusted source (such as a WWW
3471 server or client), the mechanism by which modules are imported has
3472 been redesigned.  It is now possible to provide your own function
3473 \code{__import__} which is called whenever an \code{import} statement
3474 is executed.  There's a built-in function \code{__import__} which
provides the default implementation, but more interestingly, the various
3476 steps it takes are available separately from the new built-in module
3477 \code{imp}.  (See the section on \code{imp} in the Library Reference
3478 Manual for more information on this module -- it also contains a
3479 complete example of how to write your own \code{__import__} function.)
3480
3481 When you do \code{dir()} in a fresh interactive interpreter you will
3482 see another ``secret'' object that's present in every module:
3483 \code{__builtins__}.  This is either a dictionary or a module
3484 containing the set of built-in objects used by functions defined in
the current module.  Although normally all modules are initialized with a
3486 reference to the same dictionary, it is now possible to use a
3487 different set of built-ins on a per-module basis.  Together with the
3488 fact that the \code{import} statement uses the \code{__import__}
function it finds in the importing module's dictionary of built-ins,
3490 this forms the basis for a future restricted execution mode.
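
The lookup rule can be tried out with a hand-made namespace, as in the
sketch below.  It assumes a current Python interpreter, where
\code{exec} is a function, and the names \code{refuse_import} and
\code{namespace} are invented for the example.  It only illustrates
how the built-ins and \code{__import__} are found; it is not by itself
a safe way to run untrusted code.

\begin{verbatim}
def refuse_import(name, globals=None, locals=None, fromlist=(), level=0):
    # Stands in for a policy decision; a real hook would load some modules.
    raise ImportError('import of %s is not allowed here' % name)

# A private set of built-ins: only len() and our own __import__.
namespace = {'__builtins__': {'len': len, '__import__': refuse_import}}

exec("n = len('spam')", namespace)      # finds len in the private built-ins

try:
    exec("import os", namespace)        # ends up calling refuse_import
except ImportError as err:
    message = str(err)
\end{verbatim}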
3491
3492
3493 \section{Python and the World-Wide Web}
3494
3495 There is a growing number of modules available for writing WWW tools.
3496 The previous release already sported modules \code{gopherlib},
3497 \code{ftplib}, \code{httplib} and \code{urllib} (which unifies the
3498 other three) for accessing data through the commonest WWW protocols.
3499 This release also provides \code{cgi}, to ease the writing of
3500 server-side scripts that use the Common Gateway Interface protocol,
3501 supported by most WWW servers.  The module \code{urlparse} provides
3502 precise parsing of a URL string into its components (address scheme,
3503 network location, path, parameters, query, and fragment identifier).
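
As a small illustration of the six components, here is a sketch
written for a current Python interpreter.  Two things in it are
assumptions rather than part of this release: the function is imported
from \code{urllib.parse}, where current versions keep it, and the URL
is made up.

\begin{verbatim}
from urllib.parse import urlparse

url = 'http://www.example.com/doc/tut.html;version=1?chapter=13#keywords'

# The result unpacks into the six components named above.
scheme, location, path, parameters, query, fragment = urlparse(url)
\end{verbatim}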
3504
A rudimentary parser for HTML files is available in the module
3506 \code{htmllib}.  It currently supports a subset of HTML 1.0 (if you
3507 bring it up to date, I'd love to receive your fixes!).  Unfortunately
3508 Python seems to be too slow for real-time parsing and formatting of
3509 HTML such as required by interactive WWW browsers --- but it's good
3510 enough to write a ``robot'' (an automated WWW browser that searches
3511 the web for information).
3512
3513
3514 \section{Miscellaneous}
3515
3516 \begin{itemize}
3517
3518 \item
3519 The \code{socket} module now exports all the needed constants used for
3520 socket operations, such as \code{SO_BROADCAST}.
3521
3522 \item
3523 The functions \code{popen()} and \code{fdopen()} in the \code{os}
3524 module now follow the pattern of the built-in function \code{open()}:
3525 the default mode argument is \code{'r'} and the optional third
3526 argument specifies the buffer size, where \code{0} means unbuffered,
3527 \code{1} means line-buffered, and any larger number means the size of
3528 the buffer in bytes.
3529
3530 \end{itemize}
3531
3532
3533 \chapter{New in Release 1.3}
3534
3535
3536 This chapter describes yet more recent additions to the Python
3537 language and library.
3538
3539
3540 \section{Keyword Arguments}
3541
3542 Functions and methods written in Python can now be called using
3543 keyword arguments of the form \code{\var{keyword} = \var{value}}.  For
3544 instance, the following function:
3545
3546 \begin{verbatim}
3547 def parrot(voltage, state='a stiff', action='voom', type='Norwegian Blue'):
3548     print "-- This parrot wouldn't", action,
3549     print "if you put", voltage, "Volts through it."
3550     print "-- Lovely plumage, the", type
3551     print "-- It's", state, "!"
3552 \end{verbatim}
3553
3554 could be called in any of the following ways:
3555
3556 \begin{verbatim}
3557 parrot(1000)
3558 parrot(action = 'VOOOOOM', voltage = 1000000)
3559 parrot('a thousand', state = 'pushing up the daisies')
3560 parrot('a million', 'bereft of life', 'jump')
3561 \end{verbatim}
3562
3563 but the following calls would all be invalid:
3564
3565 \begin{verbatim}
3566 parrot()                     # required argument missing
3567 parrot(voltage=5.0, 'dead')  # non-keyword argument following keyword
3568 parrot(110, voltage=220)     # duplicate value for argument
3569 parrot(actor='John Cleese')  # unknown keyword
3570 \end{verbatim}
3571
3572 In general, an argument list must have the form: zero or more
3573 positional arguments followed by zero or more keyword arguments, where
3574 the keywords must be chosen from the formal parameter names.  It's not
3575 important whether a formal parameter has a default value or not.  No
argument may receive a value more than once -- formal parameter names
corresponding to positional arguments cannot be used as keywords in
the same call.
3579
3580 Note that no special syntax is required to allow a function to be
3581 called with keyword arguments.  The additional costs incurred by
3582 keyword arguments are only present when a call uses them.
3583
3584 (As far as I know, these rules are exactly the same as used by
3585 Modula-3, even if they are enforced by totally different means.  This
3586 is intentional.)
3587
3588 When a final formal parameter of the form \code{**\var{name}} is
3589 present, it receives a dictionary containing all keyword arguments
3590 whose keyword doesn't correspond to a formal parameter.  This may be
3591 combined with a formal parameter of the form \code{*\var{name}} which
3592 receives a tuple containing the positional arguments beyond the formal
3593 parameter list.  (\code{*\var{name}} must occur before
3594 \code{**\var{name}}.)  For example, if we define a function like this:
3595
3596 \begin{verbatim}
3597 def cheeseshop(kind, *arguments, **keywords):
3598     print "-- Do you have any", kind, '?'
3599     print "-- I'm sorry, we're all out of", kind
3600     for arg in arguments: print arg
3601     print '-'*40
3602     for kw in keywords.keys(): print kw, ':', keywords[kw]
3603 \end{verbatim}
3604
3605 It could be called like this:
3606
3607 \begin{verbatim}
3608 cheeseshop('Limburger', "It's very runny, sir.",
3609            "It's really very, VERY runny, sir.",
3610            client='John Cleese',
3611            shopkeeper='Michael Palin',
3612            sketch='Cheese Shop Sketch')
3613 \end{verbatim}
3614
3615 and of course it would print:
3616
3617 \begin{verbatim}
3618 -- Do you have any Limburger ?
3619 -- I'm sorry, we're all out of Limburger
3620 It's very runny, sir.
3621 It's really very, VERY runny, sir.
3622 ----------------------------------------
3623 client : John Cleese
3624 shopkeeper : Michael Palin
3625 sketch : Cheese Shop Sketch
3626 \end{verbatim}
3627
3628 Consequences of this change include:
3629
3630 \begin{itemize}
3631
3632 \item
3633 The built-in function \code{apply()} now has an optional third
3634 argument, which is a dictionary specifying any keyword arguments to be
3635 passed.  For example,
3636 \begin{verbatim}
3637 apply(parrot, (), {'voltage': 20, 'action': 'voomm'})
3638 \end{verbatim}
3639 is equivalent to
3640 \begin{verbatim}
3641 parrot(voltage=20, action='voomm')
3642 \end{verbatim}
3643
3644 \item
3645 There is also a mechanism for functions and methods defined in an
3646 extension module (i.e., implemented in C or C++) to receive a
3647 dictionary of their keyword arguments.  By default, such functions do
3648 not accept keyword arguments, since the argument names are not
3649 available to the interpreter.
3650
3651 \item
3652 In the effort of implementing keyword arguments, function and
3653 especially method calls have been sped up significantly -- for a
3654 method with ten formal parameters, the call overhead has been cut in
half; for a function with one formal parameter, the overhead has been
3656 reduced by a third.
3657
3658 \item
3659 The format of \code{.pyc} files has changed (again).
3660
3661 \item
3662 The \code{access} statement has been disabled.  The syntax is still
3663 recognized but no code is generated for it.  (There were some
3664 unpleasant interactions with changes for keyword arguments, and my
3665 plan is to get rid of \code{access} altogether in favor of a different
3666 approach.)
3667
3668 \end{itemize}
3669
3670 \section{Changes to the WWW and Internet tools}
3671
3672 \begin{itemize}
3673
3674 \item
3675 The \code{htmllib} module has been rewritten in an incompatible
3676 fashion.  The new version is considerably more complete (HTML 2.0
3677 except forms, but including all ISO-8859-1 entity definitions), and
3678 easy to use.  Small changes to \code{sgmllib} have also been made, to
3679 better match the tokenization of HTML as recognized by other web
3680 tools.
3681
3682 \item
3683 A new module \code{formatter} has been added, for use with the new
3684 \code{htmllib} module.
3685
3686 \item
The \code{urllib} and \code{httplib} modules have been changed somewhat
3688 to allow overriding unknown URL types and to support authentication.
3689 They now use \code{mimetools.Message} instead of \code{rfc822.Message}
3690 to parse headers.  The \code{endrequest()} method has been removed
3691 from the HTTP class since it breaks the interaction with some servers.
3692
3693 \item
3694 The \code{rfc822.Message} class has been changed to allow a flag to be
3695 passed in that says that the file is unseekable.
3696
3697 \item
3698 The \code{ftplib} module has been fixed to be (hopefully) more robust
3699 on Linux.
3700
3701 \item
3702 Several new operations that are optionally supported by servers have
3703 been added to \code{nntplib}: \code{xover}, \code{xgtitle},
3704 \code{xpath} and \code{date}. % thanks to Kevan Heydon
3705
3706 \end{itemize}
3707
3708 \section{Other Language Changes}
3709
3710 \begin{itemize}
3711
3712 \item
3713 The \code{raise} statement now takes an optional argument which
3714 specifies the traceback to be used when printing the exception's stack
3715 trace.  This must be a traceback object, such as found in
3716 \code{sys.exc_traceback}.  When omitted or given as \code{None}, the
3717 old behavior (to generate a stack trace entry for the current stack
3718 frame) is used.
3719
3720 \item
3721 The tokenizer is now more tolerant of alien whitespace.  Control-L in
3722 the leading whitespace of a line resets the column number to zero,
3723 while Control-R just before the end of the line is ignored.
3724
3725 \end{itemize}
3726
3727 \section{Changes to Built-in Operations}
3728
3729 \begin{itemize}
3730
3731 \item
3732 For file objects, \code{\var{f}.read(0)} and
3733 \code{\var{f}.readline(0)} now return an empty string rather than
reading an unlimited number of bytes.  To read without a limit, omit the
3735 argument altogether or pass a negative value.
3736
3737 \item
3738 A new system variable, \code{sys.platform}, has been added.  It
3739 specifies the current platform, e.g. \code{sunos5} or \code{linux1}.
3740
3741 \item
3742 The built-in functions \code{input()} and \code{raw_input()} now use
3743 the GNU readline library when it has been configured (formerly, only
3744 interactive input to the interpreter itself was read using GNU
3745 readline).  The GNU readline library provides elaborate line editing
3746 and history.  The Python debugger (\code{pdb}) is the first
3747 beneficiary of this change.
3748
3749 \item
3750 Two new built-in functions, \code{globals()} and \code{locals()},
provide access to dictionaries containing the current global and local
3752 variables, respectively.  (These augment rather than replace
3753 \code{vars()}, which returns the current local variables when called
3754 without an argument, and a module's global variables when called with
3755 an argument of type module.)
3756
3757 \item
3758 The built-in function \code{compile()} now takes a third possible
3759 value for the kind of code to be compiled: specifying \code{'single'}
3760 generates code for a single interactive statement, which prints the
output of expression statements that evaluate to something other than
3762 \code{None}.
3763
3764 \end{itemize}
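
Two of the items above, the new dictionary functions and the
\code{'single'} mode of \code{compile()}, can be tried out as in the
following sketch.  It assumes a current Python interpreter, where
\code{exec} is a function, and the names \code{locals_demo},
\code{spam} and \code{eggs} are invented for the example.

\begin{verbatim}
def locals_demo(spam):
    eggs = spam * 2
    return locals()             # this call's local variables

local_names = sorted(locals_demo(21).keys())    # ['eggs', 'spam']
module_names = globals()        # the enclosing module's global variables

# 'single' compiles one interactive statement.  Running the result echoes
# the value of an expression statement unless that value is None.
code = compile('6 * 7', '<example>', 'single')
exec(code)                                      # echoes 42
exec(compile('None', '<example>', 'single'))    # echoes nothing
\end{verbatim}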
3765
3766 \section{Library Changes}
3767
3768 \begin{itemize}
3769
3770 \item
There are new modules \code{ni} and \code{ihooks} that support
3772 importing modules with hierarchical names such as \code{A.B.C}.  This
3773 is enabled by writing \code{import ni; ni.ni()} at the very top of the
3774 main program.  These modules are amply documented in the Python
3775 source.
3776
3777 \item
3778 The module \code{rexec} has been rewritten (incompatibly) to define a
3779 class and to use \code{ihooks}.
3780
3781 \item
3782 The \code{string.split()} and \code{string.splitfields()} functions
3783 are now the same function (the presence or absence of the second
argument determines which operation is invoked); similarly for
3785 \code{string.join()} and \code{string.joinfields()}.
3786
3787 \item
3788 The \code{Tkinter} module and its helper \code{Dialog} have been
3789 revamped to use keyword arguments.  Tk 4.0 is now the standard.  A new
3790 module \code{FileDialog} has been added which implements standard file
3791 selection dialogs.
3792
3793 \item
3794 The optional built-in modules \code{dbm} and \code{gdbm} are more
3795 coordinated --- their \code{open()} functions now take the same values
3796 for their \var{flag} argument, and the \var{flag} and \var{mode}
arguments have default values (to open the database for reading only,
and to create the database with mode \code{0666} minus the umask,
3799 respectively).  The memory leaks have finally been fixed.
3800
3801 \item
3802 A new dbm-like module, \code{bsddb}, has been added, which uses the
3803 BSD DB package's hash method. % thanks to David Ely
3804
3805 \item
3806 A portable (though slow) dbm-clone, implemented in Python, has been
3807 added for systems where none of the above is provided.  It is aptly
3808 dubbed \code{dumbdbm}.
3809
3810 \item
3811 The module \code{anydbm} provides a unified interface to \code{bsddb},
3812 \code{gdbm}, \code{dbm}, and \code{dumbdbm}, choosing the first one
3813 available.
3814
3815 \item
3816 A new extension module, \code{binascii}, provides a variety of
3817 operations for conversion of text-encoded binary data.
3818
3819 \item
3820 There are three new or rewritten companion modules implemented in
3821 Python that can encode and decode the most common such formats:
3822 \code{uu} (uuencode), \code{base64} and \code{binhex}.
3823
3824 \item
3825 A module to handle the MIME encoding quoted-printable has also been
3826 added: \code{quopri}.
3827
3828 \item
3829 The parser module (which provides an interface to the Python parser's
3830 abstract syntax trees) has been rewritten (incompatibly) by Fred
3831 Drake.  It now lets you change the parse tree and compile the result!
3832
3833 \item
3834 The \code{syslog} module has been upgraded and documented.
3835 % thanks to Steve Clift
3836
3837 \end{itemize}
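
To give an impression of the \code{binascii} operations mentioned
above, the sketch below converts some binary data to a line of base64
text and back.  It is written for a current Python interpreter, which
is why the data is given as a bytes literal, and the sample data is
arbitrary.

\begin{verbatim}
import binascii

data = b'spam, spam, spam and eggs'

encoded = binascii.b2a_base64(data)     # one line of text, newline included
decoded = binascii.a2b_base64(encoded)  # back to the original bytes
\end{verbatim}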
3838
3839 \section{Other Changes}
3840
3841 \begin{itemize}
3842
3843 \item
3844 The dynamic module loader recognizes the fact that different filenames
3845 point to the same shared library and loads the library only once, so
3846 you can have a single shared library that defines multiple modules.
3847 (SunOS / SVR4 style shared libraries only.)
3848
3849 \item
3850 Jim Fulton's ``abstract object interface'' has been incorporated into
the run-time API.  For more details, read the files
3852 \code{Include/abstract.h} and \code{Objects/abstract.c}.
3853
3854 \item
3855 The Macintosh version is much more robust now.
3856
3857 \item
3858 Numerous things I have forgotten or that are so obscure no-one will
3859 notice them anyway :-)
3860
3861 \end{itemize}
3862
3863 \end{document}