Doc/tut/tut.tex

   1 \documentclass{manual}
   2
   3 % Things to do:
   4 % Add a section on file I/O
   5 % Write a chapter entitled ``Some Useful Modules''
   6 %  --regex, math+cmath
   7 % Should really move the Python startup file info to an appendix
   8
   9 \title{Python Tutorial}
  10
  11 \input{boilerplate}
  12
  13 \begin{document}
  14
  15 \maketitle
  16
  17 \ifhtml
  18 \chapter*{Front Matter\label{front}}
  19 \fi
  20
  21 \input{copyright}
  22
  23 \begin{abstract}
  24
  25 \noindent
  26 Python is an easy to learn, powerful programming language.  It has
  27 efficient high-level data structures and a simple but effective
  28 approach to object-oriented programming.  Python's elegant syntax and
  29 dynamic typing, together with its interpreted nature, make it an ideal
  30 language for scripting and rapid application development in many areas
  31 on most platforms.
  32
  33 The Python interpreter and the extensive standard library are freely
  34 available in source or binary form for all major platforms from the
  35 Python web site, \url{http://www.python.org}, and can be freely
  36 distributed.  The same site also contains distributions of and
  37 pointers to many free third-party Python modules, programs and tools,
  38 and additional documentation.
  39
  40 The Python interpreter is easily extended with new functions and data
  41 types implemented in C or \Cpp{} (or other languages callable from C).
  42 Python is also suitable as an extension language for customizable
  43 applications.
  44
  45 This tutorial introduces the reader informally to the basic concepts
  46 and features of the Python language and system.  It helps to have a
  47 Python interpreter handy for hands-on experience, but all examples are
  48 self-contained, so the tutorial can be read off-line as well.
  49
  50 For a description of standard objects and modules, see the
  51 \emph{Python Library Reference} document.  The \emph{Python Reference
  52 Manual} gives a more formal definition of the language.  To write
  53 extensions in C or \Cpp{}, read the \emph{Extending and Embedding} and
  54 \emph{Python/C API} manuals.  There are also several books covering
  55 Python in depth.
  56
  57 This tutorial does not attempt to be comprehensive and cover every
  58 single feature, or even every commonly used feature.  Instead, it
  59 introduces many of Python's most noteworthy features, and will give
  60 you a good idea of the language's flavor and style.  After reading it,
  61 you will be able to read and write Python modules and programs, and
  62 you will be ready to learn more about the various Python library
  63 modules described in the \emph{Python Library Reference}.
  64
  65 \end{abstract}
  66
  67 \tableofcontents
  68
  69
  70 \chapter{Whetting Your Appetite \label{intro}}
  71
  72 If you have ever written a large shell script, you probably know this
  73 feeling: you'd love to add yet another feature, but it's already so
  74 slow, and so big, and so complicated; or the feature involves a system
  75 call or other function that is only accessible from C \ldots\ Usually
  76 the problem at hand isn't serious enough to warrant rewriting the
  77 script in C; perhaps the problem requires variable-length strings or
  78 other data types (like sorted lists of file names) that are easy in
  79 the shell but lots of work to implement in C, or perhaps you're not
  80 sufficiently familiar with C.
  81
  82 Another situation: perhaps you have to work with several C libraries,
  83 and the usual C write/compile/test/re-compile cycle is too slow.  You
  84 need to develop software more quickly.  Or perhaps you've
  85 written a program that could use an extension language, and you don't
  86 want to design a language, write and debug an interpreter for it, then
  87 tie it into your application.
  88
  89 In such cases, Python may be just the language for you.  Python is
  90 simple to use, but it is a real programming language, offering much
  91 more structure and support for large programs than the shell has.  On
  92 the other hand, it also offers much more error checking than C, and,
  93 being a \emph{very-high-level language}, it has high-level data types
  94 built in, such as flexible arrays and dictionaries that would cost you
  95 days to implement efficiently in C.  Because of its more general data
  96 types, Python is applicable to a much larger problem domain than
  97 \emph{Awk} or even \emph{Perl}, yet many things are at least as easy
  98 in Python as in those languages.
  99
 100 Python allows you to split up your program into modules that can be
 101 reused in other Python programs.  It comes with a large collection of
 102 standard modules that you can use as the basis of your programs --- or
 103 as examples to start learning to program in Python.  There are also
 104 built-in modules that provide things like file I/O, system calls,
 105 sockets, and even interfaces to GUI toolkits like Tk.
 106
 107 Python is an interpreted language, which can save you considerable time
 108 during program development because no compilation or linking is
 109 necessary.  The interpreter can be used interactively, which makes it
 110 easy to experiment with features of the language, to write throw-away
 111 programs, or to test functions during bottom-up program development.
 112 It is also a handy desk calculator.
 113
 114 Python allows writing very compact and readable programs.  Programs
 115 written in Python are typically much shorter than equivalent C
 116 programs, for several reasons:
 117 \begin{itemize}
 118 \item
 119 the high-level data types allow you to express complex operations in a
 120 single statement;
 121 \item
 122 statement grouping is done by indentation instead of begin/end
 123 brackets;
 124 \item
 125 no variable or argument declarations are necessary.
 126 \end{itemize}
 127
 128 Python is \emph{extensible}: if you know how to program in C, it is easy
 129 to add a new built-in function or module to the interpreter, either to
 130 perform critical operations at maximum speed, or to link Python
 131 programs to libraries that may only be available in binary form (such
 132 as a vendor-specific graphics library).  Once you are really hooked,
 133 you can link the Python interpreter into an application written in C
 134 and use it as an extension or command language for that application.
 135
 136 By the way, the language is named after the BBC show ``Monty Python's
 137 Flying Circus'' and has nothing to do with nasty reptiles.  Making
 138 references to Monty Python skits in documentation is not only allowed,
 139 it is encouraged!
 140
 141 \section{Where From Here \label{where}}
 142
 143 Now that you are all excited about Python, you'll want to examine it
 144 in some more detail.  Since the best way to learn a language is to
 145 use it, you are invited to do so here.
 146
 147 In the next chapter, the mechanics of using the interpreter are
 148 explained.  This is rather mundane information, but essential for
 149 trying out the examples shown later.
 150
 151 The rest of the tutorial introduces various features of the Python
 152 language and system through examples, beginning with simple
 153 expressions, statements and data types, through functions and modules,
 154 and finally touching upon advanced concepts like exceptions
 155 and user-defined classes.
 156
 157 \chapter{Using the Python Interpreter \label{using}}
 158
 159 \section{Invoking the Interpreter \label{invoking}}
 160
 161 The Python interpreter is usually installed as \file{/usr/local/bin/python}
 162 on those machines where it is available; putting \file{/usr/local/bin} in
 163 your \UNIX{} shell's search path makes it possible to start it by
 164 typing the command
 165
 166 \begin{verbatim}
 167 python
 168 \end{verbatim}
 169
 170 to the shell.  Since the choice of the directory where the interpreter
 171 lives is an installation option, other places are possible; check with
 172 your local Python guru or system administrator.  (E.g.,
 173 \file{/usr/local/python} is a popular alternative location.)
 174
 175 Typing an EOF character (Control-D on \UNIX{}, Control-Z on DOS
 176 or Windows) at the primary prompt causes the interpreter to exit with
 177 a zero exit status.  If that doesn't work, you can exit the
 178 interpreter by typing the following commands: \samp{import sys;
 179 sys.exit()}.
 180
 181 The interpreter's line-editing features usually aren't very
 182 sophisticated.  On \UNIX{}, whoever installed the interpreter may have
 183 enabled support for the GNU readline library, which adds more
 184 elaborate interactive editing and history features. Perhaps the
 185 quickest check to see whether command line editing is supported is
 186 typing Control-P to the first Python prompt you get.  If it beeps, you
 187 have command line editing; see Appendix A for an introduction to the
 188 keys.  If nothing appears to happen, or if \code{\^P} is echoed,
 189 command line editing isn't available; you'll only be able to use
 190 backspace to remove characters from the current line.
 191
 192 The interpreter operates somewhat like the \UNIX{} shell: when called
 193 with standard input connected to a tty device, it reads and executes
 194 commands interactively; when called with a file name argument or with
 195 a file as standard input, it reads and executes a \emph{script} from
 196 that file.
 197
 198 A third way of starting the interpreter is
 199 \samp{python -c command [arg] ...}, which
 200 executes the statement(s) in \code{command}, analogous to the shell's
 201 \code{-c} option.  Since Python statements often contain spaces or other
 202 characters that are special to the shell, it is best to quote
 203 \code{command} in its entirety with double quotes.
 204
 205 Note that there is a difference between \samp{python file} and
 206 \samp{python <file}.  In the latter case, input requests from the
 207 program, such as calls to \code{input()} and \code{raw_input()}, are
 208 satisfied from \emph{file}.  Since this file has already been read
 209 until the end by the parser before the program starts executing, the
 210 program will encounter EOF immediately.  In the former case (which is
 211 usually what you want) they are satisfied from whatever file or device
 212 is connected to standard input of the Python interpreter.
 213
 214 When a script file is used, it is sometimes useful to be able to run
 215 the script and enter interactive mode afterwards.  This can be done by
 216 passing \code{-i} before the script.  (This does not work if the script
 217 is read from standard input, for the same reason as explained in the
 218 previous paragraph.)
 219
 220 \subsection{Argument Passing \label{argPassing}}
 221
 222 When known to the interpreter, the script name and additional
 223 arguments thereafter are passed to the script in the variable
 224 \code{sys.argv}, which is a list of strings.  Its length is at least
 225 one; when no script and no arguments are given, \code{sys.argv[0]} is
 226 an empty string.  When the script name is given as \code{'-'} (meaning
 227 standard input), \code{sys.argv[0]} is set to \code{'-'}.  When \code{-c
 228 command} is used, \code{sys.argv[0]} is set to \code{'-c'}.  Options
 229 found after \code{-c command} are not consumed by the Python
 230 interpreter's option processing but left in \code{sys.argv} for the
 231 command to handle.
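
As a minimal sketch of the rules above, the following script reports what
the interpreter placed in \code{sys.argv}.  The variable names and the file
name \file{showargs.py} mentioned below are illustrative assumptions, not
part of Python.

```python
import sys

# sys.argv[0] is the script name ('-c' or '-' in the special cases
# described above, or an empty string when no script is given).
script_name = sys.argv[0]

# Everything after the first element is an argument for the script.
arguments = sys.argv[1:]

print("script name: " + repr(script_name))
print("arguments:   " + repr(arguments))
```

Saved as \file{showargs.py} and started with \samp{python showargs.py spam
eggs}, the script should report \code{'showargs.py'} as the script name and
\code{['spam', 'eggs']} as the arguments.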
 232
 233 \subsection{Interactive Mode \label{interactive}}
 234
 235 When commands are read from a tty, the interpreter is said to be in
 236 \emph{interactive mode}.  In this mode it prompts for the next command
 237 with the \emph{primary prompt}, usually three greater-than signs
 238 (\samp{>>> }); for continuation lines it prompts with the
 239 \emph{secondary prompt},
 240 by default three dots (\samp{... }).
 241
 242 The interpreter prints a welcome message stating its version number
 243 and a copyright notice before printing the first prompt, e.g.:
 244
 245 \begin{verbatim}
 246 python
 247 Python 1.5.2b2 (#1, Feb 28 1999, 00:02:06)  [GCC 2.8.1] on sunos5
 248 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
 249 >>>
 250 \end{verbatim}
 251
 252 \section{The Interpreter and Its Environment \label{interp}}
 253
 254 \subsection{Error Handling \label{error}}
 255
 256 When an error occurs, the interpreter prints an error
 257 message and a stack trace.  In interactive mode, it then returns to
 258 the primary prompt; when input came from a file, it exits with a
 259 nonzero exit status after printing
 260 the stack trace.  (Exceptions handled by an \code{except} clause in a
 261 \code{try} statement are not errors in this context.)  Some errors are
 262 unconditionally fatal and cause an exit with a nonzero exit status; this
 263 applies to internal inconsistencies and some cases of running out of
 264 memory.  All error messages are written to the standard error stream;
 265 normal output from the executed commands is written to standard
 266 output.
 267
 268 Typing the interrupt character (usually Control-C or DEL) to the
 269 primary or secondary prompt cancels the input and returns to the
 270 primary prompt.\footnote{
 271         A problem with the GNU Readline package may prevent this.
 272 }
 273 Typing an interrupt while a command is executing raises the
 274 \code{KeyboardInterrupt} exception, which may be handled by a
 275 \code{try} statement.
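
The following sketch shows one way a program might handle an interrupt.  To
keep it self-contained, the exception is raised explicitly rather than by
pressing the interrupt key; the variable name \code{interrupted} is chosen
for this illustration only.

```python
interrupted = 0
try:
    # Stand-in for a long-running command; typing the interrupt
    # character while it runs would raise the same exception.
    raise KeyboardInterrupt
except KeyboardInterrupt:
    # Control arrives here, so no stack trace is printed.
    interrupted = 1
    print("interrupted; cleaning up")
```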
 276
 277 \subsection{Executable Python Scripts \label{scripts}}
 278
 279 On BSD'ish \UNIX{} systems, Python scripts can be made directly
 280 executable, like shell scripts, by putting the line
 281
 282 \begin{verbatim}
 283 #! /usr/bin/env python
 284 \end{verbatim}
 285
 286 (assuming that the interpreter is on the user's \envvar{PATH}) at the
 287 beginning of the script and giving the file an executable mode.  The
 288 \samp{\#!} must be the first two characters of the file.
 289
 290 \subsection{The Interactive Startup File \label{startup}}
 291
 292 % XXX This should probably be dumped in an appendix, since most people
 293 % don't use Python interactively in non-trivial ways.
 294
 295 When you use Python interactively, it is frequently handy to have some
 296 standard commands executed every time the interpreter is started.  You
 297 can do this by setting an environment variable named
 298 \envvar{PYTHONSTARTUP} to the name of a file containing your start-up
 299 commands.  This is similar to the \file{.profile} feature of the \UNIX{}
 300 shells.
 301
 302 This file is only read in interactive sessions, not when Python reads
 303 commands from a script, and not when \file{/dev/tty} is given as the
 304 explicit source of commands (which otherwise behaves like an
 305 interactive session).  It is executed in the same name space where
 306 interactive commands are executed, so that objects that it defines or
 307 imports can be used without qualification in the interactive session.
 308 You can also change the prompts \code{sys.ps1} and \code{sys.ps2} in
 309 this file.
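
A start-up file might look like the following sketch.  The prompt string
\code{'py> '} is an arbitrary choice made for this illustration, and the
file only takes effect if \envvar{PYTHONSTARTUP} names it.

```python
# Example start-up file; set PYTHONSTARTUP to this file's name.
import sys
import os          # imported here, so os is available at the prompt

sys.ps1 = 'py> '   # primary prompt
sys.ps2 = '... '   # secondary prompt (same as the default)
```

Because the file runs in the same name space as interactive commands,
\code{os} can then be used at the prompt without a further
\keyword{import}.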
 310
 311 If you want to read an additional start-up file from the current
 312 directory, you can program this in the global start-up file,
 313 e.g.\ \samp{execfile('.pythonrc.py')}\indexii{.pythonrc.py}{file}.  If
 314 you want to use the startup file in a script, you must do this
 315 explicitly in the script:
 316
 317 \begin{verbatim}
 318 import os
 319 if os.environ.get('PYTHONSTARTUP') \
 320    and os.path.isfile(os.environ['PYTHONSTARTUP']):
 321     execfile(os.environ['PYTHONSTARTUP'])
 322 \end{verbatim}
 323
 324
 325 \chapter{An Informal Introduction to Python \label{informal}}
 326
 327 In the following examples, input and output are distinguished by the
 328 presence or absence of prompts (\samp{>>> } and \samp{... }): to repeat
 329 the example, you must type everything after the prompt, when the
 330 prompt appears; lines that do not begin with a prompt are output from
 331 the interpreter.%
 332 %\footnote{
 333 %        I'd prefer to use different fonts to distinguish input
 334 %        from output, but the amount of LaTeX hacking that would require
 335 %        is currently beyond my ability.
 336 %}
 337 Note that a secondary prompt on a line by itself in an example means
 338 you must type a blank line; this is used to end a multi-line command.
 339
 340 \section{Using Python as a Calculator \label{calculator}}
 341
 342 Let's try some simple Python commands.  Start the interpreter and wait
 343 for the primary prompt, \samp{>>> }.  (It shouldn't take long.)
 344
 345 \subsection{Numbers \label{numbers}}
 346
 347 The interpreter acts as a simple calculator: you can type an
 348 expression at it and it will write the value.  Expression syntax is
 349 straightforward: the operators \code{+}, \code{-}, \code{*} and \code{/}
 350 work just like in most other languages (e.g., Pascal or C); parentheses
 351 can be used for grouping.  For example:
 352
 353 \begin{verbatim}
 354 >>> 2+2
 355 4
 356 >>> # This is a comment
 357 ... 2+2
 358 4
 359 >>> 2+2  # and a comment on the same line as code
 360 4
 361 >>> (50-5*6)/4
 362 5
 363 >>> # Integer division returns the floor:
 364 ... 7/3
 365 2
 366 >>> 7/-3
 367 -3
 368 \end{verbatim}
 369
 370 Like in C, the equal sign (\character{=}) is used to assign a value to a
 371 variable.  The value of an assignment is not written:
 372
 373 \begin{verbatim}
 374 >>> width = 20
 375 >>> height = 5*9
 376 >>> width * height
 377 900
 378 \end{verbatim}
 379 %
 380 A value can be assigned to several variables simultaneously:
 381
 382 \begin{verbatim}
 383 >>> x = y = z = 0  # Zero x, y and z
 384 >>> x
 385 0
 386 >>> y
 387 0
 388 >>> z
 389 0
 390 \end{verbatim}
 391 %
 392 There is full support for floating point; operators with mixed type
 393 operands convert the integer operand to floating point:
 394
 395 \begin{verbatim}
 396 >>> 4 * 2.5 / 3.3
 397 3.0303030303
 398 >>> 7.0 / 2
 399 3.5
 400 \end{verbatim}
 401 %
 402 Complex numbers are also supported; imaginary numbers are written with
 403 a suffix of \samp{j} or \samp{J}.  Complex numbers with a nonzero
 404 real component are written as \samp{(\var{real}+\var{imag}j)}, or can
 405 be created with the \samp{complex(\var{real}, \var{imag})} function.
 406
 407 \begin{verbatim}
 408 >>> 1j * 1J
 409 (-1+0j)
 410 >>> 1j * complex(0,1)
 411 (-1+0j)
 412 >>> 3+1j*3
 413 (3+3j)
 414 >>> (3+1j)*3
 415 (9+3j)
 416 >>> (1+2j)/(1+1j)
 417 (1.5+0.5j)
 418 \end{verbatim}
 419 %
 420 Complex numbers are always represented as two floating point numbers,
 421 the real and imaginary part.  To extract these parts from a complex
 422 number \var{z}, use \code{\var{z}.real} and \code{\var{z}.imag}.
 423
 424 \begin{verbatim}
 425 >>> a=1.5+0.5j
 426 >>> a.real
 427 1.5
 428 >>> a.imag
 429 0.5
 430 \end{verbatim}
 431 %
 432 The conversion functions to floating point and integer
 433 (\function{float()}, \function{int()} and \function{long()}) don't
 434 work for complex numbers --- there is no one correct way to convert a
 435 complex number to a real number.  Use \code{abs(\var{z})} to get its
 436 magnitude (as a float) or \code{\var{z}.real} to get its real part.
 437
 438 \begin{verbatim}
 439 >>> a=1.5+0.5j
 440 >>> float(a)
 441 Traceback (innermost last):
 442   File "<stdin>", line 1, in ?
 443 TypeError: can't convert complex to float; use e.g. abs(z)
 444 >>> a.real
 445 1.5
 446 >>> abs(a)
 447 1.58113883008
 448 \end{verbatim}
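
The magnitude returned by \code{abs(\var{z})} is the length of the vector
formed by the real and imaginary parts.  The sketch below checks this for
the value used above; it assumes the standard \module{math} module, and the
names \code{magnitude} and \code{same} are ours.

```python
import math

a = 1.5 + 0.5j

# abs() of a complex number is sqrt(real**2 + imag**2).
magnitude = math.sqrt(a.real**2 + a.imag**2)

# Compare with a tolerance, since both sides are floating point.
same = abs(abs(a) - magnitude) < 1e-12
```
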
 449 %
 450 In interactive mode, the last printed expression is assigned to the
 451 variable \code{_}.  This means that when you are using Python as a
 452 desk calculator, it is somewhat easier to continue calculations, for
 453 example:
 454
 455 \begin{verbatim}
 456 >>> tax = 17.5 / 100
 457 >>> price = 3.50
 458 >>> price * tax
 459 0.6125
 460 >>> price + _
 461 4.1125
 462 >>> round(_, 2)
 463 4.11
 464 \end{verbatim}
 465
 466 This variable should be treated as read-only by the user.  Don't
 467 explicitly assign a value to it --- you would create an independent
 468 local variable with the same name masking the built-in variable with
 469 its magic behavior.
 470
 471 \subsection{Strings \label{strings}}
 472
 473 Besides numbers, Python can also manipulate strings, which can be
 474 expressed in several ways.  They can be enclosed in single quotes or
 475 double quotes:
 476
 477 \begin{verbatim}
 478 >>> 'spam eggs'
 479 'spam eggs'
 480 >>> 'doesn\'t'
 481 "doesn't"
 482 >>> "doesn't"
 483 "doesn't"
 484 >>> '"Yes," he said.'
 485 '"Yes," he said.'
 486 >>> "\"Yes,\" he said."
 487 '"Yes," he said.'
 488 >>> '"Isn\'t," she said.'
 489 '"Isn\'t," she said.'
 490 \end{verbatim}
 491
 492 String literals can span multiple lines in several ways.  Newlines can
 493 be escaped with backslashes, e.g.:
 494
 495 \begin{verbatim}
 496 hello = "This is a rather long string containing\n\
 497 several lines of text just as you would do in C.\n\
 498     Note that whitespace at the beginning of the line is\
 499  significant.\n"
 500 print hello
 501 \end{verbatim}
 502
 503 which would print the following:
 504
 505 \begin{verbatim}
 506 This is a rather long string containing
 507 several lines of text just as you would do in C.
 508     Note that whitespace at the beginning of the line is significant.
 509 \end{verbatim}
 510
 511 Alternatively, strings can be enclosed in a pair of matching triple-quotes:
 512 \code{"""} or \code{'''}.  Line ends do not need to be escaped
 513 when using triple-quotes, but they will be included in the string.
 514
 515 \begin{verbatim}
 516 print """
 517 Usage: thingy [OPTIONS]
 518      -h                        Display this usage message
 519      -H hostname               Hostname to connect to
 520 """
 521 \end{verbatim}
 522
 523 produces the following output, preceded and followed by a blank line because the string itself begins and ends with a newline:
 524
 525 \begin{verbatim}
 526 Usage: thingy [OPTIONS]
 527      -h                        Display this usage message
 528      -H hostname               Hostname to connect to
 529 \end{verbatim}
 530
 531 The interpreter prints the result of string operations in the same way
 532 as they are typed for input: inside quotes, and with quotes and other
 533 funny characters escaped by backslashes, to show the precise
 534 value.  The string is enclosed in double quotes if the string contains
 535 a single quote and no double quotes, else it's enclosed in single
 536 quotes.  (The \keyword{print} statement, described later, can be used
 537 to write strings without quotes or escapes.)
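
The quoting rule can be observed with the built-in function
\function{repr()}, which returns the representation the interpreter echoes.
\function{repr()} is not introduced in the text above, so treat this as a
sketch; the names \code{r1}, \code{r2} and \code{r3} are ours.

```python
# A single quote and no double quotes: shown in double quotes.
r1 = repr('doesn\'t')               # "doesn't"

# Double quotes only: shown in single quotes.
r2 = repr("\"Yes,\" he said.")      # '"Yes," he said.'

# Both kinds: single quotes, with the inner single quote escaped.
r3 = repr('"Isn\'t," she said.')    # '"Isn\'t," she said.'
```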
 538
 539 Strings can be concatenated (glued together) with the \code{+}
 540 operator, and repeated with \code{*}:
 541
 542 \begin{verbatim}
 543 >>> word = 'Help' + 'A'
 544 >>> word
 545 'HelpA'
 546 >>> '<' + word*5 + '>'
 547 '<HelpAHelpAHelpAHelpAHelpA>'
 548 \end{verbatim}
 549
 550 Two string literals next to each other are automatically concatenated;
 551 the first line above could also have been written \samp{word = 'Help'
 552 'A'}; this only works with two literals, not with arbitrary string
 553 expressions:
 554
 555 \begin{verbatim}
>>> import string                 #  needed for string.strip() below
 556 >>> 'str' 'ing'                   #  <-  This is ok
 557 'string'
 558 >>> string.strip('str') + 'ing'   #  <-  This is ok
 559 'string'
 560 >>> string.strip('str') 'ing'     #  <-  This is invalid
 561   File "<stdin>", line 1
 562     string.strip('str') 'ing'
 563                             ^
 564 SyntaxError: invalid syntax
 565 \end{verbatim}
 566
 567 Strings can be subscripted (indexed); like in C, the first character
 568 of a string has subscript (index) 0.  There is no separate character
 569 type; a character is simply a string of size one.  Like in Icon,
 570 substrings can be specified with the \emph{slice notation}: two indices
 571 separated by a colon.
 572
 573 \begin{verbatim}
 574 >>> word[4]
 575 'A'
 576 >>> word[0:2]
 577 'He'
 578 >>> word[2:4]
 579 'lp'
 580 \end{verbatim}
 581
 582 Slice indices have useful defaults; an omitted first index defaults to
 583 zero, an omitted second index defaults to the size of the string being
 584 sliced.
 585
 586 \begin{verbatim}
 587 >>> word[:2]    # The first two characters
 588 'He'
 589 >>> word[2:]    # All but the first two characters
 590 'lpA'
 591 \end{verbatim}
 592
 593 Here's a useful invariant of slice operations: \code{s[:i] + s[i:]}
 594 equals \code{s}.
 595
 596 \begin{verbatim}
 597 >>> word[:2] + word[2:]
 598 'HelpA'
 599 >>> word[:3] + word[3:]
 600 'HelpA'
 601 \end{verbatim}
 602
 603 Degenerate slice indices are handled gracefully: an index that is too
 604 large is replaced by the string size, an upper bound smaller than the
 605 lower bound returns an empty string.
 606
 607 \begin{verbatim}
 608 >>> word[1:100]
 609 'elpA'
 610 >>> word[10:]
 611 ''
 612 >>> word[2:1]
 613 ''
 614 \end{verbatim}
 615
 616 Indices may be negative numbers, to start counting from the right.
 617 For example:
 618
 619 \begin{verbatim}
 620 >>> word[-1]     # The last character
 621 'A'
 622 >>> word[-2]     # The last-but-one character
 623 'p'
 624 >>> word[-2:]    # The last two characters
 625 'pA'
 626 >>> word[:-2]    # All but the last two characters
 627 'Hel'
 628 \end{verbatim}
 629
 630 But note that -0 is really the same as 0, so it does not count from
 631 the right!
 632
 633 \begin{verbatim}
 634 >>> word[-0]     # (since -0 equals 0)
 635 'H'
 636 \end{verbatim}
 637
 638 Out-of-range negative slice indices are truncated, but don't try this
 639 for single-element (non-slice) indices:
 640
 641 \begin{verbatim}
 642 >>> word[-100:]
 643 'HelpA'
 644 >>> word[-10]    # error
 645 Traceback (innermost last):
 646   File "<stdin>", line 1, in ?
 647 IndexError: string index out of range
 648 \end{verbatim}
 649
 650 The best way to remember how slices work is to think of the indices as
 651 pointing \emph{between} characters, with the left edge of the first
 652 character numbered 0.  Then the right edge of the last character of a
 653 string of \var{n} characters has index \var{n}, for example:
 654
 655 \begin{verbatim}
 656  +---+---+---+---+---+
 657  | H | e | l | p | A |
 658  +---+---+---+---+---+
 659  0   1   2   3   4   5
 660 -5  -4  -3  -2  -1
 661 \end{verbatim}
 662
 663 The first row of numbers gives the position of the indices 0...5 in
 664 the string; the second row gives the corresponding negative indices.
 665 The slice from \var{i} to \var{j} consists of all characters between
 666 the edges labeled \var{i} and \var{j}, respectively.
 667
 668 For nonnegative indices, the length of a slice is the difference of
 669 the indices, if both are within bounds, e.g., the length of
 670 \code{word[1:3]} is 2.
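
The edge-numbering picture can be checked directly.  The following sketch
reuses the string \code{'HelpA'} from the examples above; the other
variable names are illustrative.

```python
word = 'HelpA'
n = len(word)

# A negative index -k names the same position as n - k.
same_char = word[-2] == word[n - 2]      # both are 'p'

# For in-range nonnegative indices, len(word[i:j]) is j - i.
length_ok = len(word[1:3]) == 3 - 1

# Out-of-range slice indices are clipped, so the rule no longer holds.
clipped = len(word[1:100])               # 4, not 99
```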
 671
 672 The built-in function \function{len()} returns the length of a string:
 673
 674 \begin{verbatim}
 675 >>> s = 'supercalifragilisticexpialidocious'
 676 >>> len(s)
 677 34
 678 \end{verbatim}
 679
 680 \subsection{Lists \label{lists}}
 681
 682 Python knows a number of \emph{compound} data types, used to group
 683 together other values.  The most versatile is the \emph{list}, which
 684 can be written as a list of comma-separated values (items) between
 685 square brackets.  List items need not all have the same type.
 686
 687 \begin{verbatim}
 688 >>> a = ['spam', 'eggs', 100, 1234]
 689 >>> a
 690 ['spam', 'eggs', 100, 1234]
 691 \end{verbatim}
 692
 693 Like string indices, list indices start at 0, and lists can be sliced,
 694 concatenated and so on:
 695
 696 \begin{verbatim}
 697 >>> a[0]
 698 'spam'
 699 >>> a[3]
 700 1234
 701 >>> a[-2]
 702 100
 703 >>> a[1:-1]
 704 ['eggs', 100]
 705 >>> a[:2] + ['bacon', 2*2]
 706 ['spam', 'eggs', 'bacon', 4]
 707 >>> 3*a[:3] + ['Boe!']
 708 ['spam', 'eggs', 100, 'spam', 'eggs', 100, 'spam', 'eggs', 100, 'Boe!']
 709 \end{verbatim}
 710
 711 Unlike strings, which are \emph{immutable}, it is possible to change
 712 individual elements of a list:
 713
 714 \begin{verbatim}
 715 >>> a
 716 ['spam', 'eggs', 100, 1234]
 717 >>> a[2] = a[2] + 23
 718 >>> a
 719 ['spam', 'eggs', 123, 1234]
 720 \end{verbatim}
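
By contrast, assigning to a position in a string fails.  The sketch below
catches the resulting \code{TypeError} and shows the usual alternative of
building a new string from slices; the names \code{changed} and
\code{new_word} are ours.

```python
word = 'HelpA'
try:
    word[0] = 'x'            # strings do not support item assignment
    changed = 1
except TypeError:
    changed = 0

# To get a modified string, build a new one from slices instead.
new_word = 'x' + word[1:]    # 'xelpA'; word itself is unchanged
```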
 721
 722 Assignment to slices is also possible, and this can even change the size
 723 of the list:
 724
 725 \begin{verbatim}
 726 >>> # Replace some items:
 727 ... a[0:2] = [1, 12]
 728 >>> a
 729 [1, 12, 123, 1234]
 730 >>> # Remove some:
 731 ... a[0:2] = []
 732 >>> a
 733 [123, 1234]
 734 >>> # Insert some:
 735 ... a[1:1] = ['bletch', 'xyzzy']
 736 >>> a
 737 [123, 'bletch', 'xyzzy', 1234]
 738 >>> a[:0] = a     # Insert (a copy of) itself at the beginning
 739 >>> a
 740 [123, 'bletch', 'xyzzy', 1234, 123, 'bletch', 'xyzzy', 1234]
 741 \end{verbatim}
 742
 743 The built-in function \function{len()} also applies to lists:
 744
 745 \begin{verbatim}
 746 >>> len(a)
 747 8
 748 \end{verbatim}
 749
 750 It is possible to nest lists (create lists containing other lists),
 751 for example:
 752
 753 \begin{verbatim}
 754 >>> q = [2, 3]
 755 >>> p = [1, q, 4]
 756 >>> len(p)
 757 3
 758 >>> p[1]
 759 [2, 3]
 760 >>> p[1][0]
 761 2
 762 >>> p[1].append('xtra')     # See section 5.1
 763 >>> p
 764 [1, [2, 3, 'xtra'], 4]
 765 >>> q
 766 [2, 3, 'xtra']
 767 \end{verbatim}
 768
 769 Note that in the last example, \code{p[1]} and \code{q} really refer to
 770 the same object!  We'll come back to \emph{object semantics} later.
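
The sharing can be made visible with the \keyword{is} operator, which tests
whether two expressions refer to the same object.  The sketch below also
shows a full slice used to make a separate copy; the names \code{shared},
\code{q_copy} and \code{p2} are ours.

```python
q = [2, 3]
p = [1, q, 4]

# p[1] is not a copy of q; both refer to one list object.
shared = p[1] is q

# A full slice builds a new list holding the same items.
q_copy = q[:]
p2 = [1, q_copy, 4]

q.append('xtra')
# p sees the change through the shared object; p2 does not.
```

After this runs, \code{p} is \code{[1, [2, 3, 'xtra'], 4]} while \code{p2}
is still \code{[1, [2, 3], 4]}.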
 771
 772 \section{First Steps Towards Programming \label{firstSteps}}
 773
 774 Of course, we can use Python for more complicated tasks than adding
 775 two and two together.  For instance, we can write an initial
 776 subsequence of the \emph{Fibonacci} series as follows:
 777
 778 \begin{verbatim}
 779 >>> # Fibonacci series:
 780 ... # the sum of two elements defines the next
 781 ... a, b = 0, 1
 782 >>> while b < 10:
 783 ...       print b
 784 ...       a, b = b, a+b
 785 ...
 786 1
 787 1
 788 2
 789 3
 790 5
 791 8
 792 \end{verbatim}
 793
 794 This example introduces several new features.
 795
 796 \begin{itemize}
 797
 798 \item
 799 The first line contains a \emph{multiple assignment}: the variables
 800 \code{a} and \code{b} simultaneously get the new values 0 and 1.  On the
 801 last line this is used again, demonstrating that the expressions on
 802 the right-hand side are all evaluated first before any of the
 803 assignments take place.
 804
 805 \item
 806 The \keyword{while} loop executes as long as the condition (here:
 807 \code{b < 10}) remains true.  In Python, like in C, any non-zero
 808 integer value is true; zero is false.  The condition may also be a
 809 string or list value, in fact any sequence; anything with a non-zero
 810 length is true, empty sequences are false.  The test used in the
 811 example is a simple comparison.  The standard comparison operators are
 812 written the same as in C: \code{<}, \code{>}, \code{==}, \code{<=},
 813 \code{>=} and \code{!=}.
 814
 815 \item
 816 The \emph{body} of the loop is \emph{indented}: indentation is Python's
 817 way of grouping statements.  Python does not (yet!) provide an
 818 intelligent input line editing facility, so you have to type a tab or
 819 space(s) for each indented line.  In practice you will prepare more
 820 complicated input for Python with a text editor; most text editors have
 821 an auto-indent facility.  When a compound statement is entered
 822 interactively, it must be followed by a blank line to indicate
 823 completion (since the parser cannot guess when you have typed the last
 824 line).
 825
 826 \item
 827 The \keyword{print} statement writes the value of the expression(s) it is
 828 given.  It differs from just writing the expression you want to write
 829 (as we did earlier in the calculator examples) in the way it handles
 830 multiple expressions and strings.  Strings are printed without quotes,
 831 and a space is inserted between items, so you can format things nicely,
 832 like this:
 833
 834 \begin{verbatim}
 835 >>> i = 256*256
 836 >>> print 'The value of i is', i
 837 The value of i is 65536
 838 \end{verbatim}
 839
 840 A trailing comma avoids the newline after the output:
 841
 842 \begin{verbatim}
 843 >>> a, b = 0, 1
 844 >>> while b < 1000:
 845 ...     print b,
 846 ...     a, b = b, a+b
 847 ...
 848 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987
 849 \end{verbatim}
 850
 851 Note that the interpreter inserts a newline before it prints the next
 852 prompt if the last line was not completed.
 853
 854 \end{itemize}
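
The points about multiple assignment and about sequences used as
conditions can be checked with a small sketch.  The variable names
\code{text} and \code{count} are made up for this illustration.

```python
# The right-hand side is evaluated completely before any assignment
# takes place, so a+b below still uses the old values of a and b.
a, b = 1, 2
a, b = b, a + b        # a is now 2, b is now 3

# A string can serve as a loop condition: it is true while non-empty.
text = 'spam'
count = 0
while text:
    text = text[1:]    # drop the first character
    count = count + 1  # ends up as the original length, 4
```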
 855
 856
 857 \chapter{More Control Flow Tools \label{moreControl}}
 858
 859 Besides the \keyword{while} statement just introduced, Python knows
 860 the usual control flow statements known from other languages, with
 861 some twists.
 862
 863 \section{\keyword{if} Statements \label{if}}
 864
 865 Perhaps the most well-known statement type is the \keyword{if}
 866 statement.  For example:
 867
 868 \begin{verbatim}
 869 >>> #  [Code which sets 'x' to a value...]
 870 >>> if x < 0:
 871 ...      x = 0
 872 ...      print 'Negative changed to zero'
 873 ... elif x == 0:
 874 ...      print 'Zero'
 875 ... elif x == 1:
 876 ...      print 'Single'
 877 ... else:
 878 ...      print 'More'
 879 ...
 880 \end{verbatim}
 881
 882 There can be zero or more \keyword{elif} parts, and the \keyword{else}
 883 part is optional.  The keyword `\keyword{elif}' is short for `else
 884 if', and is useful to avoid excessive indentation.  An
 885 \keyword{if} \ldots\ \keyword{elif} \ldots\ \keyword{elif}
 886 \ldots\ sequence is a substitute for the  \emph{switch} or
 887 %    ^^^^
 888 %    Weird spacings happen here if the wrapping of the source text
 889 %    gets changed in the wrong way.
 890 \emph{case} statements found in other languages.
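
As a minimal sketch of an \keyword{if} \ldots\ \keyword{elif} chain
standing in for a \emph{switch}, the following selects a value based on
a string.  The names \code{command} and \code{speed}, and the values
tested, are hypothetical.

```python
command = 'stop'       # sample input for the illustration

if command == 'go':
    speed = 10
elif command == 'slow':
    speed = 3
elif command == 'stop':
    speed = 0
else:
    speed = -1         # no branch matched: unknown command
```

Only the first branch whose condition is true is executed; the
\keyword{else} part plays the role of a \emph{default} case.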
 891
 892
 893 \section{\keyword{for} Statements \label{for}}
 894
 895 The \keyword{for}\stindex{for} statement in Python differs a bit from
 896 what you may be used to in C or Pascal.  Rather than always
iterating over an arithmetic progression of numbers (as in Pascal),
or giving the user the ability to define both the iteration step and
halting condition (as in C), Python's \keyword{for}\stindex{for}
 900 statement iterates over the items of any sequence (e.g., a list or a
 901 string), in the order that they appear in the sequence.  For example
 902 (no pun intended):
 903 % One suggestion was to give a real C example here, but that may only
 904 % serve to confuse non-C programmers.
 905
 906 \begin{verbatim}
 907 >>> # Measure some strings:
 908 ... a = ['cat', 'window', 'defenestrate']
 909 >>> for x in a:
 910 ...     print x, len(x)
 911 ...
 912 cat 3
 913 window 6
 914 defenestrate 12
 915 \end{verbatim}
 916
 917 It is not safe to modify the sequence being iterated over in the loop
 918 (this can only happen for mutable sequence types, i.e., lists).  If
 919 you need to modify the list you are iterating over, e.g., duplicate
 920 selected items, you must iterate over a copy.  The slice notation
 921 makes this particularly convenient:
 922
 923 \begin{verbatim}
 924 >>> for x in a[:]: # make a slice copy of the entire list
 925 ...    if len(x) > 6: a.insert(0, x)
 926 ...
 927 >>> a
 928 ['defenestrate', 'cat', 'window', 'defenestrate']
 929 \end{verbatim}
 930
 931
 932 \section{The \function{range()} Function \label{range}}
 933
 934 If you do need to iterate over a sequence of numbers, the built-in
 935 function \function{range()} comes in handy.  It generates lists
 936 containing arithmetic progressions, e.g.:
 937
 938 \begin{verbatim}
 939 >>> range(10)
 940 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
 941 \end{verbatim}
 942
 943 The given end point is never part of the generated list;
 944 \code{range(10)} generates a list of 10 values, exactly the legal
 945 indices for items of a sequence of length 10.  It is possible to let
 946 the range start at another number, or to specify a different increment
 947 (even negative):
 948
 949 \begin{verbatim}
 950 >>> range(5, 10)
 951 [5, 6, 7, 8, 9]
 952 >>> range(0, 10, 3)
 953 [0, 3, 6, 9]
 954 >>> range(-10, -100, -30)
 955 [-10, -40, -70]
 956 \end{verbatim}
 957
 958 To iterate over the indices of a sequence, combine \function{range()}
 959 and \function{len()} as follows:
 960
 961 \begin{verbatim}
 962 >>> a = ['Mary', 'had', 'a', 'little', 'lamb']
 963 >>> for i in range(len(a)):
 964 ...     print i, a[i]
 965 ...
 966 0 Mary
 967 1 had
 968 2 a
 969 3 little
 970 4 lamb
 971 \end{verbatim}
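
A negative increment combines with \function{len()} in the same way.
The following sketch walks the indices from last to first; the name
\code{backwards} is chosen for the illustration only.

```python
a = ['Mary', 'had', 'a', 'little', 'lamb']
backwards = []
# Start at the last index, stop before -1, step by -1.
for i in range(len(a) - 1, -1, -1):
    backwards = backwards + [a[i]]
# backwards is now ['lamb', 'little', 'a', 'had', 'Mary']
```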
 972
 973 \section{\keyword{break} and \keyword{continue} Statements, and
 974          \keyword{else} Clauses on Loops
 975          \label{break}}
 976
 977 The \keyword{break} statement, like in C, breaks out of the smallest
 978 enclosing \keyword{for} or \keyword{while} loop.
 979
 980 The \keyword{continue} statement, also borrowed from C, continues
 981 with the next iteration of the loop.
 982
 983 Loop statements may have an \code{else} clause; it is executed when
 984 the loop terminates through exhaustion of the list (with
 985 \keyword{for}) or when the condition becomes false (with
 986 \keyword{while}), but not when the loop is terminated by a
 987 \keyword{break} statement.  This is exemplified by the following loop,
 988 which searches for prime numbers:
 989
 990 \begin{verbatim}
 991 >>> for n in range(2, 10):
 992 ...     for x in range(2, n):
 993 ...         if n % x == 0:
 994 ...            print n, 'equals', x, '*', n/x
 995 ...            break
 996 ...     else:
 997 ...          print n, 'is a prime number'
 998 ...
 999 2 is a prime number
1000 3 is a prime number
1001 4 equals 2 * 2
1002 5 is a prime number
1003 6 equals 2 * 3
1004 7 is a prime number
1005 8 equals 2 * 4
1006 9 equals 3 * 3
1007 \end{verbatim}
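
The interplay of \keyword{continue}, \keyword{break} and the loop
\keyword{else} clause can also be seen in a smaller sketch.  The data
and the variable names below are invented for the illustration.

```python
numbers = [3, 1, -4, -1, 5]    # sample data

first_negative = None
skipped = 0
for n in numbers:
    if n >= 0:
        skipped = skipped + 1
        continue               # go straight to the next item
    first_negative = n
    break                      # leave the loop; its else clause is skipped
else:
    first_negative = 'none found'

all_non_negative = 0
for n in [2, 4, 6]:
    if n < 0:
        break
else:
    all_non_negative = 1       # runs because the loop was never broken
```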
1008
1009 \section{\keyword{pass} Statements \label{pass}}
1010
1011 The \keyword{pass} statement does nothing.
1012 It can be used when a statement is required syntactically but the
1013 program requires no action.
1014 For example:
1015
1016 \begin{verbatim}
1017 >>> while 1:
1018 ...       pass # Busy-wait for keyboard interrupt
1019 ...
1020 \end{verbatim}
1021
1022 \section{Defining Functions \label{functions}}
1023
1024 We can create a function that writes the Fibonacci series to an
1025 arbitrary boundary:
1026
1027 \begin{verbatim}
1028 >>> def fib(n):    # write Fibonacci series up to n
1029 ...     "Print a Fibonacci series up to n"
1030 ...     a, b = 0, 1
1031 ...     while b < n:
1032 ...         print b,
1033 ...         a, b = b, a+b
1034 ...
1035 >>> # Now call the function we just defined:
1036 ... fib(2000)
1037 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597
1038 \end{verbatim}
1039
1040 The keyword \keyword{def} introduces a function \emph{definition}.  It
1041 must be followed by the function name and the parenthesized list of
1042 formal parameters.  The statements that form the body of the function
start at the next line, and must be indented.  The first statement
1044 of the function body can optionally be a string literal; this string
1045 literal is the function's documentation string, or \dfn{docstring}.
1046 There are tools which use docstrings to automatically produce printed
1047 documentation, or to let the user interactively browse through code;
1048 it's good practice to include docstrings in code that you write, so
1049 try to make a habit of it.
1050
1051 The \emph{execution} of a function introduces a new symbol table used
1052 for the local variables of the function.  More precisely, all variable
1053 assignments in a function store the value in the local symbol table;
1054 whereas variable references first look in the local symbol table, then
1055 in the global symbol table, and then in the table of built-in names.
1056 Thus,  global variables cannot be directly assigned a value within a
1057 function (unless named in a \keyword{global} statement), although
1058 they may be referenced.
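
The difference between a plain assignment and one made under a
\keyword{global} statement can be sketched as follows.  The names
\code{counter}, \code{set_local} and \code{bump_global} are
hypothetical.

```python
counter = 0

def set_local():
    counter = 100          # stored in the local symbol table only
    return counter

def bump_global():
    global counter         # assignments now go to the global name
    counter = counter + 1
    return counter

local_result = set_local()      # 100
after_local = counter           # still 0: the global was not touched
global_result = bump_global()   # 1, and the global counter is now 1
```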
1059
1060 The actual parameters (arguments) to a function call are introduced in
1061 the local symbol table of the called function when it is called; thus,
arguments are passed using \emph{call by value} (where the
\emph{value} is always an object reference, not the value of the
object).\footnote{
1063          Actually, \emph{call by object reference} would be a better
1064          description, since if a mutable object is passed, the caller
1065          will see any changes the callee makes to it (e.g., items
1066          inserted into a list).
1067 }
1068 When a function calls another function, a new local symbol table is
1069 created for that call.
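
What this means in practice can be shown with a short sketch: changing
a mutable argument in place is visible to the caller, while assigning
to the parameter name is not.  The function names are made up for the
illustration.

```python
def append_item(seq):
    seq.append('added')     # changes the object the caller passed in

def rebind(seq):
    seq = ['replaced']      # only rebinds the local name seq
    return seq

items = ['original']
append_item(items)          # items is now ['original', 'added']
rebound = rebind(items)     # items is unchanged by this call
```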
1070
1071 A function definition introduces the function name in the current
1072 symbol table.  The value of the function name
1073 has a type that is recognized by the interpreter as a user-defined
1074 function.  This value can be assigned to another name which can then
1075 also be used as a function.  This serves as a general renaming
1076 mechanism:
1077
1078 \begin{verbatim}
1079 >>> fib
1080 <function object at 10042ed0>
1081 >>> f = fib
1082 >>> f(100)
1083 1 1 2 3 5 8 13 21 34 55 89
1084 \end{verbatim}
1085
1086 You might object that \code{fib} is not a function but a procedure.  In
1087 Python, like in C, procedures are just functions that don't return a
1088 value.  In fact, technically speaking, procedures do return a value,
1089 albeit a rather boring one.  This value is called \code{None} (it's a
1090 built-in name).  Writing the value \code{None} is normally suppressed by
1091 the interpreter if it would be the only value written.  You can see it
1092 if you really want to:
1093
1094 \begin{verbatim}
1095 >>> print fib(0)
1096 None
1097 \end{verbatim}
1098
1099 It is simple to write a function that returns a list of the numbers of
1100 the Fibonacci series, instead of printing it:
1101
1102 \begin{verbatim}
1103 >>> def fib2(n): # return Fibonacci series up to n
1104 ...     "Return a list containing the Fibonacci series up to n"
1105 ...     result = []
1106 ...     a, b = 0, 1
1107 ...     while b < n:
1108 ...         result.append(b)    # see below
1109 ...         a, b = b, a+b
1110 ...     return result
1111 ...
1112 >>> f100 = fib2(100)    # call it
1113 >>> f100                # write the result
1114 [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
1115 \end{verbatim}
1116 %
1117 This example, as usual, demonstrates some new Python features:
1118
1119 \begin{itemize}
1120
1121 \item
1122 The \keyword{return} statement returns with a value from a function.
1123 \keyword{return} without an expression argument is used to return from
1124 the middle of a procedure (falling off the end also returns from a
1125 procedure), in which case the \code{None} value is returned.
1126
1127 \item
1128 The statement \code{result.append(b)} calls a \emph{method} of the list
1129 object \code{result}.  A method is a function that `belongs' to an
1130 object and is named \code{obj.methodname}, where \code{obj} is some
1131 object (this may be an expression), and \code{methodname} is the name
1132 of a method that is defined by the object's type.  Different types
1133 define different methods.  Methods of different types may have the
1134 same name without causing ambiguity.  (It is possible to define your
1135 own object types and methods, using \emph{classes}, as discussed later
1136 in this tutorial.)
The method \method{append()} shown in the example is defined for
1138 list objects; it adds a new element at the end of the list.  In this
1139 example it is equivalent to \samp{result = result + [b]}, but more
1140 efficient.
1141
1142 \end{itemize}
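
Both forms of \keyword{return} can appear in one function.  In the
following sketch, \code{first_even} is a hypothetical helper that
returns a value when it finds one and \code{None} otherwise.

```python
def first_even(numbers):
    "Return the first even number in the sequence, or None."
    for n in numbers:
        if n % 2 == 0:
            return n       # return with a value
    return                 # bare return: the caller receives None

found = first_even([3, 7, 8, 5])    # 8
missing = first_even([1, 3, 5])     # None
```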
1143
1144 \section{More on Defining Functions \label{defining}}
1145
1146 It is also possible to define functions with a variable number of
1147 arguments.  There are three forms, which can be combined.
1148
1149 \subsection{Default Argument Values \label{defaultArgs}}
1150
1151 The most useful form is to specify a default value for one or more
1152 arguments.  This creates a function that can be called with fewer
arguments than it is defined to accept, e.g.
1154
1155 \begin{verbatim}
1156 def ask_ok(prompt, retries=4, complaint='Yes or no, please!'):
1157     while 1:
1158         ok = raw_input(prompt)
1159         if ok in ('y', 'ye', 'yes'): return 1
1160         if ok in ('n', 'no', 'nop', 'nope'): return 0
1161         retries = retries - 1
1162         if retries < 0: raise IOError, 'refusenik user'
1163         print complaint
1164 \end{verbatim}
1165
1166 This function can be called either like this:
1167 \code{ask_ok('Do you really want to quit?')} or like this:
1168 \code{ask_ok('OK to overwrite the file?', 2)}.
1169
1170 The default values are evaluated at the point of function definition
1171 in the \emph{defining} scope, so that e.g.
1172
1173 \begin{verbatim}
1174 i = 5
1175 def f(arg = i): print arg
1176 i = 6
1177 f()
1178 \end{verbatim}
1179
1180 will print \code{5}.
1181
1182 \strong{Important warning:}  The default value is evaluated only once.
1183 This makes a difference when the default is a mutable object such as a
1184 list or dictionary.  For example, the following function accumulates
1185 the arguments passed to it on subsequent calls:
1186
1187 \begin{verbatim}
1188 def f(a, l = []):
1189     l.append(a)
1190     return l
1191 print f(1)
1192 print f(2)
1193 print f(3)
1194 \end{verbatim}
1195
1196 This will print
1197
1198 \begin{verbatim}
1199 [1]
1200 [1, 2]
1201 [1, 2, 3]
1202 \end{verbatim}
1203
1204 If you don't want the default to be shared between subsequent calls,
1205 you can write the function like this instead:
1206
1207 \begin{verbatim}
1208 def f(a, l = None):
1209     if l is None:
1210         l = []
1211     l.append(a)
1212     return l
1213 \end{verbatim}
1214
1215 \subsection{Keyword Arguments \label{keywordArgs}}
1216
1217 Functions can also be called using
1218 keyword arguments of the form \samp{\var{keyword} = \var{value}}.  For
1219 instance, the following function:
1220
1221 \begin{verbatim}
1222 def parrot(voltage, state='a stiff', action='voom', type='Norwegian Blue'):
1223     print "-- This parrot wouldn't", action,
1224     print "if you put", voltage, "Volts through it."
1225     print "-- Lovely plumage, the", type
1226     print "-- It's", state, "!"
1227 \end{verbatim}
1228
1229 could be called in any of the following ways:
1230
1231 \begin{verbatim}
1232 parrot(1000)
1233 parrot(action = 'VOOOOOM', voltage = 1000000)
1234 parrot('a thousand', state = 'pushing up the daisies')
1235 parrot('a million', 'bereft of life', 'jump')
1236 \end{verbatim}
1237
1238 but the following calls would all be invalid:
1239
1240 \begin{verbatim}
1241 parrot()                     # required argument missing
1242 parrot(voltage=5.0, 'dead')  # non-keyword argument following keyword
1243 parrot(110, voltage=220)     # duplicate value for argument
1244 parrot(actor='John Cleese')  # unknown keyword
1245 \end{verbatim}
1246
1247 In general, an argument list must have any positional arguments
1248 followed by any keyword arguments, where the keywords must be chosen
1249 from the formal parameter names.  It's not important whether a formal
parameter has a default value or not.  No argument may receive a
1251 value more than once --- formal parameter names corresponding to
1252 positional arguments cannot be used as keywords in the same calls.
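
Two of these rules can be observed at run time, since a call that
breaks them raises \code{TypeError}.  The sketch below uses
\code{parrot_args}, a simplified stand-in for \code{parrot()} that is
not part of the example above.

```python
def parrot_args(voltage, state='a stiff', action='voom'):
    # Simplified stand-in: just report what was received.
    return (voltage, state, action)

accepted = parrot_args('a thousand', action='jump')

rejected = []
try:
    parrot_args(110, voltage=220)       # voltage given twice
except TypeError:
    rejected.append('duplicate value')
try:
    parrot_args(actor='John Cleese')    # not a formal parameter name
except TypeError:
    rejected.append('unknown keyword')
```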
1253
1254 When a final formal parameter of the form \code{**\var{name}} is
1255 present, it receives a dictionary containing all keyword arguments
1256 whose keyword doesn't correspond to a formal parameter.  This may be
1257 combined with a formal parameter of the form \code{*\var{name}}
1258 (described in the next subsection) which receives a tuple containing
1259 the positional arguments beyond the formal parameter list.
1260 (\code{*\var{name}} must occur before \code{**\var{name}}.)  For
1261 example, if we define a function like this:
1262
1263 \begin{verbatim}
1264 def cheeseshop(kind, *arguments, **keywords):
1265     print "-- Do you have any", kind, '?'
1266     print "-- I'm sorry, we're all out of", kind
1267     for arg in arguments: print arg
1268     print '-'*40
1269     for kw in keywords.keys(): print kw, ':', keywords[kw]
1270 \end{verbatim}
1271
1272 It could be called like this:
1273
1274 \begin{verbatim}
1275 cheeseshop('Limburger', "It's very runny, sir.",
1276            "It's really very, VERY runny, sir.",
1277            client='John Cleese',
1278            shopkeeper='Michael Palin',
1279            sketch='Cheese Shop Sketch')
1280 \end{verbatim}
1281
1282 and of course it would print:
1283
1284 \begin{verbatim}
1285 -- Do you have any Limburger ?
1286 -- I'm sorry, we're all out of Limburger
1287 It's very runny, sir.
1288 It's really very, VERY runny, sir.
1289 ----------------------------------------
1290 client : John Cleese
1291 shopkeeper : Michael Palin
1292 sketch : Cheese Shop Sketch
1293 \end{verbatim}
1294
1295 \subsection{Arbitrary Argument Lists \label{arbitraryArgs}}
1296
1297 Finally, the least frequently used option is to specify that a
1298 function can be called with an arbitrary number of arguments.  These
1299 arguments will be wrapped up in a tuple.  Before the variable number
1300 of arguments, zero or more normal arguments may occur.
1301
1302 \begin{verbatim}
1303 def fprintf(file, format, *args):
1304     file.write(format % args)
1305 \end{verbatim}
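
As a further sketch, here is a function that requires at least one
argument and accepts any number of additional ones.  The name
\code{average} is chosen for the illustration.

```python
def average(first, *rest):
    "Return the arithmetic mean of one or more numbers."
    values = (first,) + rest    # rest is a tuple, possibly empty
    total = 0.0
    for v in values:
        total = total + v
    return total / len(values)

mean_of_three = average(1, 2, 3)    # 2.0
mean_of_one = average(5)            # 5.0
```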
1306
1307
1308 \subsection{Lambda Forms \label{lambda}}
1309
1310 By popular demand, a few features commonly found in functional
1311 programming languages and Lisp have been added to Python.  With the
1312 \keyword{lambda} keyword, small anonymous functions can be created.
1313 Here's a function that returns the sum of its two arguments:
1314 \samp{lambda a, b: a+b}.  Lambda forms can be used wherever function
1315 objects are required.  They are syntactically restricted to a single
1316 expression.  Semantically, they are just syntactic sugar for a normal
1317 function definition.  Like nested function definitions, lambda forms
1318 cannot reference variables from the containing scope, but this can be
1319 overcome through the judicious use of default argument values, e.g.
1320
1321 \begin{verbatim}
1322 def make_incrementor(n):
1323     return lambda x, incr=n: x+incr
1324 \end{verbatim}
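
A short usage sketch may make the default-argument trick clearer.  The
name \code{add_five} is made up here; the definition is repeated so
that the fragment is self-contained.

```python
def make_incrementor(n):
    # n is captured as the default value of incr when the lambda
    # form is created.
    return lambda x, incr=n: x + incr

add_five = make_incrementor(5)
fifteen = add_five(10)      # 15
five = add_five(0)          # 5
```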
1325
1326 \subsection{Documentation Strings \label{docstrings}}
1327
1328 There are emerging conventions about the content and formatting of
1329 documentation strings.
1330
1331 The first line should always be a short, concise summary of the
1332 object's purpose.  For brevity, it should not explicitly state the
1333 object's name or type, since these are available by other means
1334 (except if the name happens to be a verb describing a function's
1335 operation).  This line should begin with a capital letter and end with
1336 a period.
1337
1338 If there are more lines in the documentation string, the second line
1339 should be blank, visually separating the summary from the rest of the
1340 description.  The following lines should be one or more paragraphs
1341 describing the object's calling conventions, its side effects, etc.
1342
The Python parser does not strip indentation from multi-line string
literals, so tools that process documentation have to strip the
indentation themselves.  This is done using the following convention.
The first
1346 non-blank line \emph{after} the first line of the string determines the
1347 amount of indentation for the entire documentation string.  (We can't
1348 use the first line since it is generally adjacent to the string's
1349 opening quotes so its indentation is not apparent in the string
1350 literal.)  Whitespace ``equivalent'' to this indentation is then
1351 stripped from the start of all lines of the string.  Lines that are
1352 indented less should not occur, but if they occur all their leading
1353 whitespace should be stripped.  Equivalence of whitespace should be
1354 tested after expansion of tabs (to 8 spaces, normally).
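
The convention can be sketched in code as follows.  This is only one
possible reading of the rules above, not the implementation used by any
particular tool; the function name \code{strip_doc_indent} is
hypothetical, and the sketch assumes that strings have methods such as
\code{expandtabs()} and \code{split()}.

```python
def strip_doc_indent(doc):
    "Remove the common indentation from a documentation string."
    # Expand tabs first so that whitespace can be compared by length.
    lines = doc.expandtabs(8).split('\n')

    # The first non-blank line after the first line sets the indentation.
    indent = 0
    for line in lines[1:]:
        stripped = line.lstrip()
        if stripped:
            indent = len(line) - len(stripped)
            break

    # The first line sits next to the opening quotes; it is simply
    # trimmed on the left (an assumption of this sketch).
    result = [lines[0].lstrip()]
    for line in lines[1:]:
        if len(line) - len(line.lstrip()) >= indent:
            result.append(line[indent:])
        else:
            # Indented less than expected: drop all leading whitespace.
            result.append(line.lstrip())
    return '\n'.join(result)
```

Lines indented more than the reference line keep their extra
indentation, so nested layout inside the description survives.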
1355
1356
1357
1358 \chapter{Data Structures \label{structures}}
1359
1360 This chapter describes some things you've learned about already in
1361 more detail, and adds some new things as well.
1362
1363 \section{More on Lists \label{moreLists}}
1364
1365 The list data type has some more methods.  Here are all of the methods
1366 of list objects:
1367
1368 \begin{description}
1369
1370 \item[\code{insert(i, x)}]
1371 Insert an item at a given position.  The first argument is the index of
1372 the element before which to insert, so \code{a.insert(0, x)} inserts at
1373 the front of the list, and \code{a.insert(len(a), x)} is equivalent to
1374 \code{a.append(x)}.
1375
1376 \item[\code{append(x)}]
1377 Equivalent to \code{a.insert(len(a), x)}.
1378
1379 \item[\code{index(x)}]
1380 Return the index in the list of the first item whose value is \code{x}.
1381 It is an error if there is no such item.
1382
1383 \item[\code{remove(x)}]
1384 Remove the first item from the list whose value is \code{x}.
1385 It is an error if there is no such item.
1386
1387 \item[\code{sort()}]
1388 Sort the items of the list, in place.
1389
1390 \item[\code{reverse()}]
1391 Reverse the elements of the list, in place.
1392
1393 \item[\code{count(x)}]
1394 Return the number of times \code{x} appears in the list.
1395
1396 \end{description}
1397
1398 An example that uses all list methods:
1399
1400 \begin{verbatim}
1401 >>> a = [66.6, 333, 333, 1, 1234.5]
1402 >>> print a.count(333), a.count(66.6), a.count('x')
1403 2 1 0
1404 >>> a.insert(2, -1)
1405 >>> a.append(333)
1406 >>> a
1407 [66.6, 333, -1, 333, 1, 1234.5, 333]
1408 >>> a.index(333)
1409 1
1410 >>> a.remove(333)
1411 >>> a
1412 [66.6, -1, 333, 1, 1234.5, 333]
1413 >>> a.reverse()
1414 >>> a
1415 [333, 1234.5, 1, 333, -1, 66.6]
1416 >>> a.sort()
1417 >>> a
1418 [-1, 1, 66.6, 333, 333, 1234.5]
1419 \end{verbatim}
1420
1421 \subsection{Functional Programming Tools \label{functional}}
1422
1423 There are three built-in functions that are very useful when used with
1424 lists: \function{filter()}, \function{map()}, and \function{reduce()}.
1425
1426 \samp{filter(\var{function}, \var{sequence})} returns a sequence (of
1427 the same type, if possible) consisting of those items from the
1428 sequence for which \code{\var{function}(\var{item})} is true.  For
1429 example, to compute some primes:
1430
1431 \begin{verbatim}
1432 >>> def f(x): return x % 2 != 0 and x % 3 != 0
1433 ...
1434 >>> filter(f, range(2, 25))
1435 [5, 7, 11, 13, 17, 19, 23]
1436 \end{verbatim}
1437
1438 \samp{map(\var{function}, \var{sequence})} calls
1439 \code{\var{function}(\var{item})} for each of the sequence's items and
1440 returns a list of the return values.  For example, to compute some
1441 cubes:
1442
1443 \begin{verbatim}
1444 >>> def cube(x): return x*x*x
1445 ...
1446 >>> map(cube, range(1, 11))
1447 [1, 8, 27, 64, 125, 216, 343, 512, 729, 1000]
1448 \end{verbatim}
1449
1450 More than one sequence may be passed; the function must then have as
1451 many arguments as there are sequences and is called with the
1452 corresponding item from each sequence (or \code{None} if some sequence
1453 is shorter than another).  If \code{None} is passed for the function,
1454 a function returning its argument(s) is substituted.
1455
1456 Combining these two special cases, we see that
1457 \samp{map(None, \var{list1}, \var{list2})} is a convenient way of
1458 turning a pair of lists into a list of pairs.  For example:
1459
1460 \begin{verbatim}
1461 >>> seq = range(8)
1462 >>> def square(x): return x*x
1463 ...
1464 >>> map(None, seq, map(square, seq))
1465 [(0, 0), (1, 1), (2, 4), (3, 9), (4, 16), (5, 25), (6, 36), (7, 49)]
1466 \end{verbatim}
1467
1468 \samp{reduce(\var{func}, \var{sequence})} returns a single value
1469 constructed by calling the binary function \var{func} on the first two
1470 items of the sequence, then on the result and the next item, and so
1471 on.  For example, to compute the sum of the numbers 1 through 10:
1472
1473 \begin{verbatim}
1474 >>> def add(x,y): return x+y
1475 ...
1476 >>> reduce(add, range(1, 11))
1477 55
1478 \end{verbatim}
1479
1480 If there's only one item in the sequence, its value is returned; if
1481 the sequence is empty, an exception is raised.
1482
1483 A third argument can be passed to indicate the starting value.  In this
1484 case the starting value is returned for an empty sequence, and the
1485 function is first applied to the starting value and the first sequence
1486 item, then to the result and the next item, and so on.  For example,
1487
1488 \begin{verbatim}
1489 >>> def sum(seq):
1490 ...     def add(x,y): return x+y
1491 ...     return reduce(add, seq, 0)
1492 ...
1493 >>> sum(range(1, 11))
1494 55
1495 >>> sum([])
1496 0
1497 \end{verbatim}
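
The three-argument form behaves like an explicit loop.  The sketch
below spells that loop out; \code{fold} is a hypothetical name used
only for this illustration and is not a built-in function.

```python
def add(x, y): return x + y

def fold(func, sequence, start):
    "Combine the items of sequence pairwise, beginning with start."
    result = start
    for item in sequence:
        result = func(result, item)   # result so far, then next item
    return result

total = fold(add, range(1, 11), 0)    # 55
empty_total = fold(add, [], 0)        # 0: the starting value
```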
1498
1499 \section{The \keyword{del} statement \label{del}}
1500
1501 There is a way to remove an item from a list given its index instead
1502 of its value: the \code{del} statement.  This can also be used to
1503 remove slices from a list (which we did earlier by assignment of an
1504 empty list to the slice).  For example:
1505
1506 \begin{verbatim}
1507 >>> a
1508 [-1, 1, 66.6, 333, 333, 1234.5]
1509 >>> del a[0]
1510 >>> a
1511 [1, 66.6, 333, 333, 1234.5]
1512 >>> del a[2:4]
1513 >>> a
1514 [1, 66.6, 1234.5]
1515 \end{verbatim}
1516
1517 \keyword{del} can also be used to delete entire variables:
1518
1519 \begin{verbatim}
1520 >>> del a
1521 \end{verbatim}
1522
1523 Referencing the name \code{a} hereafter is an error (at least until
1524 another value is assigned to it).  We'll find other uses for
1525 \keyword{del} later.
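
The following sketch checks both claims: deleting a slice has the same
effect as assigning an empty list to it, and a deleted name can no
longer be referenced.  The names \code{b}, \code{same} and
\code{status} are made up for the illustration.

```python
a = [-1, 1, 66.6, 333, 333, 1234.5]
b = a[:]               # an independent copy of the list

del a[2:4]             # remove the slice with del
b[2:4] = []            # remove the same slice by assignment
same = (a == b)        # both are [-1, 1, 333, 1234.5]

del b                  # delete the variable itself
try:
    b
    status = 'still defined'
except NameError:
    status = 'undefined'
```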
1526
1527 \section{Tuples and Sequences \label{tuples}}
1528
1529 We saw that lists and strings have many common properties, e.g.,
1530 indexing and slicing operations.  They are two examples of
1531 \emph{sequence} data types.  Since Python is an evolving language,
1532 other sequence data types may be added.  There is also another
1533 standard sequence data type: the \emph{tuple}.
1534
1535 A tuple consists of a number of values separated by commas, for
1536 instance:
1537
1538 \begin{verbatim}
1539 >>> t = 12345, 54321, 'hello!'
1540 >>> t[0]
1541 12345
1542 >>> t
1543 (12345, 54321, 'hello!')
1544 >>> # Tuples may be nested:
1545 ... u = t, (1, 2, 3, 4, 5)
1546 >>> u
1547 ((12345, 54321, 'hello!'), (1, 2, 3, 4, 5))
1548 \end{verbatim}
1549
As you see, on output tuples are always enclosed in parentheses, so
1551 that nested tuples are interpreted correctly; they may be input with
1552 or without surrounding parentheses, although often parentheses are
1553 necessary anyway (if the tuple is part of a larger expression).
1554
1555 Tuples have many uses, e.g., (x, y) coordinate pairs, employee records
1556 from a database, etc.  Tuples, like strings, are immutable: it is not
1557 possible to assign to the individual items of a tuple (you can
1558 simulate much of the same effect with slicing and concatenation,
1559 though).
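
A brief sketch of both halves of that statement: item assignment on a
tuple raises \code{TypeError}, and a modified copy can be built from
slices.  The names \code{status} and \code{changed} are invented here.

```python
t = (12345, 54321, 'hello!')

try:
    t[0] = 0
    status = 'assigned'
except TypeError:
    status = 'immutable'       # tuples reject item assignment

# Build a new tuple instead: a one-item tuple plus a slice of the old.
changed = (0,) + t[1:]         # (0, 54321, 'hello!')
```

The original tuple \code{t} is left untouched; \code{changed} is a new
object.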
1560
1561 A special problem is the construction of tuples containing 0 or 1
1562 items: the syntax has some extra quirks to accommodate these.  Empty
1563 tuples are constructed by an empty pair of parentheses; a tuple with
1564 one item is constructed by following a value with a comma
1565 (it is not sufficient to enclose a single value in parentheses).
1566 Ugly, but effective.  For example:
1567
1568 \begin{verbatim}
1569 >>> empty = ()
1570 >>> singleton = 'hello',    # <-- note trailing comma
1571 >>> len(empty)
1572 0
1573 >>> len(singleton)
1574 1
1575 >>> singleton
1576 ('hello',)
1577 \end{verbatim}
1578
1579 The statement \code{t = 12345, 54321, 'hello!'} is an example of
1580 \emph{tuple packing}: the values \code{12345}, \code{54321} and
1581 \code{'hello!'} are packed together in a tuple.  The reverse operation
1582 is also possible, e.g.:
1583
1584 \begin{verbatim}
1585 >>> x, y, z = t
1586 \end{verbatim}
1587
1588 This is called, appropriately enough, \emph{tuple unpacking}.  Tuple
1589 unpacking requires that the list of variables on the left has the same
1590 number of elements as the length of the tuple.  Note that multiple
1591 assignment is really just a combination of tuple packing and tuple
1592 unpacking!
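
The length requirement is enforced when the assignment is executed: a
mismatch raises \code{ValueError}.  A minimal sketch, with variable
names chosen for the illustration:

```python
t = 12345, 54321, 'hello!'     # tuple packing
x, y, z = t                    # tuple unpacking: three names, three items

try:
    p, q = t                   # two names for three items
    status = 'unpacked'
except ValueError:
    status = 'wrong length'

x, y = y, x                    # packing and unpacking in one statement
```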
1593
1594 % XXX This is no longer necessary!
1595 Occasionally, the corresponding operation on lists is useful: \emph{list
1596 unpacking}.  This is supported by enclosing the list of variables in
1597 square brackets:
1598
1599 \begin{verbatim}
1600 >>> a = ['spam', 'eggs', 100, 1234]
1601 >>> [a1, a2, a3, a4] = a
1602 \end{verbatim}
1603
1604 % XXX Add a bit on the difference between tuples and lists.
1605 % XXX Also explain that a tuple can *contain* a mutable object!
1606
1607 \section{Dictionaries \label{dictionaries}}
1608
1609 Another useful data type built into Python is the \emph{dictionary}.
1610 Dictionaries are sometimes found in other languages as ``associative
1611 memories'' or ``associative arrays''.  Unlike sequences, which are
1612 indexed by a range of numbers, dictionaries are indexed by \emph{keys},
which can be any immutable type; strings and numbers can always be
1614 keys.  Tuples can be used as keys if they contain only strings,
1615 numbers, or tuples.  You can't use lists as keys, since lists can be
1616 modified in place using their \code{append()} method.
1617
1618 It is best to think of a dictionary as an unordered set of
1619 \emph{key:value} pairs, with the requirement that the keys are unique
1620 (within one dictionary).
1621 A pair of braces creates an empty dictionary: \code{\{\}}.
1622 Placing a comma-separated list of key:value pairs within the
1623 braces adds initial key:value pairs to the dictionary; this is also the
1624 way dictionaries are written on output.
1625
1626 The main operations on a dictionary are storing a value with some key
1627 and extracting the value given the key.  It is also possible to delete
1628 a key:value pair
1629 with \code{del}.
1630 If you store using a key that is already in use, the old value
1631 associated with that key is forgotten.  It is an error to extract a
1632 value using a non-existent key.
1633
1634 The \code{keys()} method of a dictionary object returns a list of all the
keys used in the dictionary, in arbitrary order (if you want it sorted,
1636 just apply the \code{sort()} method to the list of keys).  To check
1637 whether a single key is in the dictionary, use the \code{has_key()}
1638 method of the dictionary.
1639
1640 Here is a small example using a dictionary:
1641
1642 \begin{verbatim}
1643 >>> tel = {'jack': 4098, 'sape': 4139}
1644 >>> tel['guido'] = 4127
1645 >>> tel
1646 {'sape': 4139, 'guido': 4127, 'jack': 4098}
1647 >>> tel['jack']
1648 4098
1649 >>> del tel['sape']
1650 >>> tel['irv'] = 4127
1651 >>> tel
1652 {'guido': 4127, 'irv': 4127, 'jack': 4098}
1653 >>> tel.keys()
1654 ['guido', 'irv', 'jack']
1655 >>> tel.has_key('guido')
1656 1
1657 \end{verbatim}
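
A few of the remaining points (sorting the keys, the error for a
missing key, and tuples as keys) are sketched below.  The names
\code{names}, \code{number} and \code{grid} are hypothetical.

```python
tel = {'jack': 4098, 'sape': 4139}
tel['guido'] = 4127

names = list(tel.keys())   # the keys come back in no particular order
names.sort()               # ['guido', 'jack', 'sape']

try:
    number = tel['nobody'] # extracting with a non-existent key
except KeyError:
    number = None

# Tuples of numbers are immutable, so they can serve as keys.
grid = {(0, 0): 'origin', (1, 0): 'east'}
place = grid[(1, 0)]       # 'east'
```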
1658
1659 \section{More on Conditions \label{conditions}}
1660
1661 The conditions used in \code{while} and \code{if} statements above can
1662 contain other operators besides comparisons.
1663
1664 The comparison operators \code{in} and \code{not in} check whether a value
1665 occurs (does not occur) in a sequence.  The operators \code{is} and
1666 \code{is not} compare whether two objects are really the same object; this
1667 only matters for mutable objects like lists.  All comparison operators
1668 have the same priority, which is lower than that of all numerical
1669 operators.
1670
1671 Comparisons can be chained: e.g., \code{a < b == c} tests whether \code{a}
1672 is less than \code{b} and moreover \code{b} equals \code{c}.
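
The equivalence can be checked with a small sketch that evaluates the
chained form and the spelled-out form on a few sample triples.  The
function names and the sample data are made up for the illustration.

```python
def chained(a, b, c):
    return a < b == c

def spelled_out(a, b, c):
    return (a < b) and (b == c)

triples = [(1, 2, 2), (1, 2, 3), (3, 2, 2)]
chained_results = []
spelled_results = []
for a, b, c in triples:
    chained_results.append(chained(a, b, c))
    spelled_results.append(spelled_out(a, b, c))
# Only the first triple satisfies a < b == c.
```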
1673
1674 Comparisons may be combined by the Boolean operators \code{and} and
1675 \code{or}, and the outcome of a comparison (or of any other Boolean
expression) may be negated with \code{not}.  These all have lower
priorities than the comparison operators; among them, \code{not} has
1678 the highest priority, and \code{or} the lowest, so that
1679 \code{A and not B or C} is equivalent to \code{(A and (not B)) or C}.  Of
1680 course, parentheses can be used to express the desired composition.
1681
The Boolean operators \code{and} and \code{or} are so-called
\emph{shortcut} operators: their arguments are evaluated from left to
right, and evaluation stops as soon as the outcome is determined.
For example, if \code{A} and \code{C} are true but \code{B} is false,
\code{A and B and C} does not evaluate the expression \code{C}.  In
general, the return value of a shortcut operator, when used as a
general value and not as a Boolean, is the last evaluated argument.
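
To watch the shortcut behaviour, you can wrap each operand in a
function that records when it is evaluated.  The helper
\function{check()} below is made up for this illustration:

\begin{verbatim}
calls = []

def check(name, value):
    # Record that this operand was evaluated, then return its value.
    calls.append(name)
    return value

result = check('A', 1) and check('B', 0) and check('C', 1)

# B is false, so C is never evaluated ...
assert calls == ['A', 'B']
# ... and the result is the last argument that was evaluated.
assert result == 0

# With "or", evaluation stops at the first true argument.
default = '' or 'fallback'
\end{verbatim}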
1689
1690 It is possible to assign the result of a comparison or other Boolean
1691 expression to a variable.  For example,
1692
1693 \begin{verbatim}
1694 >>> string1, string2, string3 = '', 'Trondheim', 'Hammer Dance'
1695 >>> non_null = string1 or string2 or string3
1696 >>> non_null
1697 'Trondheim'
1698 \end{verbatim}
1699
1700 Note that in Python, unlike C, assignment cannot occur inside expressions.
1701
1702 \section{Comparing Sequences and Other Types \label{comparing}}
1703
1704 Sequence objects may be compared to other objects with the same
1705 sequence type.  The comparison uses \emph{lexicographical} ordering:
1706 first the first two items are compared, and if they differ this
1707 determines the outcome of the comparison; if they are equal, the next
1708 two items are compared, and so on, until either sequence is exhausted.
1709 If two items to be compared are themselves sequences of the same type,
1710 the lexicographical comparison is carried out recursively.  If all
1711 items of two sequences compare equal, the sequences are considered
equal.  If one sequence is an initial subsequence of the other, the
shorter sequence is the smaller one.  Lexicographical ordering for
1714 strings uses the \ASCII{} ordering for individual characters.  Some
1715 examples of comparisons between sequences with the same types:
1716
\begin{verbatim}
(1, 2, 3)              < (1, 2, 4)
[1, 2, 3]              < [1, 2, 4]
'ABC' < 'C' < 'Pascal' < 'Python'
(1, 2, 3, 4)           < (1, 2, 4)
(1, 2)                 < (1, 2, -1)
(1, 2, 3)              == (1.0, 2.0, 3.0)
(1, 2, ('aa', 'ab'))   < (1, 2, ('abc', 'a'), 4)
\end{verbatim}
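
The lexicographical rule can also be written out by hand.  The
function \function{lex_less()} below is only a sketch of the rule as
described above, not the interpreter's actual implementation; it is
compared against the built-in \code{<} operator for the examples just
shown:

\begin{verbatim}
def lex_less(s, t):
    # Return true if sequence s sorts before sequence t.
    for x, y in zip(s, t):
        if x != y:
            # The first differing pair decides the outcome; nested
            # sequences of the same type are compared recursively.
            if type(x) == type(y) and isinstance(x, (tuple, list)):
                return lex_less(x, y)
            return x < y
    # No differing pair: the shorter sequence is the smaller one.
    return len(s) < len(t)

pairs = [
    ((1, 2, 3), (1, 2, 4)),
    ([1, 2, 3], [1, 2, 4]),
    ((1, 2, 3, 4), (1, 2, 4)),
    ((1, 2), (1, 2, -1)),
    ((1, 2, 3), (1.0, 2.0, 3.0)),
    ((1, 2, ('aa', 'ab')), (1, 2, ('abc', 'a'), 4)),
]
for s, t in pairs:
    assert lex_less(s, t) == (s < t)
\end{verbatim}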
1726
1727 Note that comparing objects of different types is legal.  The outcome
1728 is deterministic but arbitrary: the types are ordered by their name.
1729 Thus, a list is always smaller than a string, a string is always
1730 smaller than a tuple, etc.  Mixed numeric types are compared according
1731 to their numeric value, so 0 equals 0.0, etc.\footnote{
1732         The rules for comparing objects of different types should
1733         not be relied upon; they may change in a future version of
1734         the language.
1735 }
1736
1737
1738 \chapter{Modules \label{modules}}
1739
1740 If you quit from the Python interpreter and enter it again, the
1741 definitions you have made (functions and variables) are lost.
1742 Therefore, if you want to write a somewhat longer program, you are
1743 better off using a text editor to prepare the input for the interpreter
1744 and running it with that file as input instead.  This is known as creating a
1745 \emph{script}.  As your program gets longer, you may want to split it
1746 into several files for easier maintenance.  You may also want to use a
1747 handy function that you've written in several programs without copying
1748 its definition into each program.
1749
1750 To support this, Python has a way to put definitions in a file and use
1751 them in a script or in an interactive instance of the interpreter.
1752 Such a file is called a \emph{module}; definitions from a module can be
1753 \emph{imported} into other modules or into the \emph{main} module (the
1754 collection of variables that you have access to in a script
1755 executed at the top level
1756 and in calculator mode).
1757
1758 A module is a file containing Python definitions and statements.  The
1759 file name is the module name with the suffix \file{.py} appended.  Within
1760 a module, the module's name (as a string) is available as the value of
1761 the global variable \code{__name__}.  For instance, use your favorite text
1762 editor to create a file called \file{fibo.py} in the current directory
1763 with the following contents:
1764
1765 \begin{verbatim}
1766 # Fibonacci numbers module
1767
1768 def fib(n):    # write Fibonacci series up to n
1769     a, b = 0, 1
1770     while b < n:
1771         print b,
1772         a, b = b, a+b
1773
1774 def fib2(n): # return Fibonacci series up to n
1775     result = []
1776     a, b = 0, 1
1777     while b < n:
1778         result.append(b)
1779         a, b = b, a+b
1780     return result
1781 \end{verbatim}
1782
1783 Now enter the Python interpreter and import this module with the
1784 following command:
1785
1786 \begin{verbatim}
1787 >>> import fibo
1788 \end{verbatim}
1789
1790 This does not enter the names of the functions defined in
1791 \code{fibo}
1792 directly in the current symbol table; it only enters the module name
1793 \code{fibo}
1794 there.
1795 Using the module name you can access the functions:
1796
1797 \begin{verbatim}
1798 >>> fibo.fib(1000)
1799 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987
1800 >>> fibo.fib2(100)
1801 [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
1802 >>> fibo.__name__
1803 'fibo'
1804 \end{verbatim}
1805 %
1806 If you intend to use a function often you can assign it to a local name:
1807
1808 \begin{verbatim}
1809 >>> fib = fibo.fib
1810 >>> fib(500)
1811 1 1 2 3 5 8 13 21 34 55 89 144 233 377
1812 \end{verbatim}
1813
1814
1815 \section{More on Modules \label{moreModules}}
1816
1817 A module can contain executable statements as well as function
1818 definitions.
1819 These statements are intended to initialize the module.
1820 They are executed only the
1821 \emph{first}
1822 time the module is imported somewhere.\footnote{
1823         In fact function definitions are also `statements' that are
1824         `executed'; the execution enters the function name in the
1825         module's global symbol table.
1826 }
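
One way to see that a module's statements run only on the first import
is to have the module count its own executions.  The sketch below
writes a throwaway module to a temporary directory; the module name
\module{initdemo} and the environment variable
\envvar{INITDEMO_RUNS} are invented for this illustration:

\begin{verbatim}
import os
import sys
import tempfile

# The module's only job is to bump a counter each time its
# top-level statements are executed.
source = (
    "import os\n"
    "runs = int(os.environ.get('INITDEMO_RUNS', '0')) + 1\n"
    "os.environ['INITDEMO_RUNS'] = str(runs)\n"
)
tmpdir = tempfile.mkdtemp()
f = open(os.path.join(tmpdir, 'initdemo.py'), 'w')
f.write(source)
f.close()

sys.path.insert(0, tmpdir)
import initdemo      # executes the module's statements
import initdemo      # already loaded: nothing is executed again

run_count = os.environ['INITDEMO_RUNS']
\end{verbatim}

In a fresh interpreter, \code{run_count} ends up as \code{'1'} even
though the \code{import} statement appears twice.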
1827
1828 Each module has its own private symbol table, which is used as the
1829 global symbol table by all functions defined in the module.
1830 Thus, the author of a module can use global variables in the module
1831 without worrying about accidental clashes with a user's global
1832 variables.
1833 On the other hand, if you know what you are doing you can touch a
1834 module's global variables with the same notation used to refer to its
1835 functions,
1836 \code{modname.itemname}.
1837
1838 Modules can import other modules.
1839 It is customary but not required to place all
1840 \code{import}
1841 statements at the beginning of a module (or script, for that matter).
1842 The imported module names are placed in the importing module's global
1843 symbol table.
1844
1845 There is a variant of the
1846 \code{import}
1847 statement that imports names from a module directly into the importing
1848 module's symbol table.
1849 For example:
1850
1851 \begin{verbatim}
1852 >>> from fibo import fib, fib2
1853 >>> fib(500)
1854 1 1 2 3 5 8 13 21 34 55 89 144 233 377
1855 \end{verbatim}
1856
1857 This does not introduce the module name from which the imports are taken
1858 in the local symbol table (so in the example, \code{fibo} is not
1859 defined).
1860
1861 There is even a variant to import all names that a module defines:
1862
1863 \begin{verbatim}
1864 >>> from fibo import *
1865 >>> fib(500)
1866 1 1 2 3 5 8 13 21 34 55 89 144 233 377
1867 \end{verbatim}
1868
1869 This imports all names except those beginning with an underscore
1870 (\code{_}).
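
The underscore rule is easy to verify.  In this sketch the module
\module{stardemo} and its two names are hypothetical; the
\code{import} is run inside a separate dictionary so that the imported
names can be inspected afterwards:

\begin{verbatim}
import os
import sys
import tempfile

tmpdir = tempfile.mkdtemp()
f = open(os.path.join(tmpdir, 'stardemo.py'), 'w')
f.write("visible = 1\n_hidden = 2\n")
f.close()
sys.path.insert(0, tmpdir)

# Run the import in its own namespace so we can look at the result.
namespace = {}
exec("from stardemo import *", namespace)

assert 'visible' in namespace
assert '_hidden' not in namespace
\end{verbatim}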
1871
1872 \subsection{The Module Search Path \label{searchPath}}
1873
1874 % XXX Need to document that a lone .pyc/.pyo is acceptable too!
1875
1876 \indexiii{module}{search}{path}
1877 When a module named \module{spam} is imported, the interpreter searches
1878 for a file named \file{spam.py} in the current directory,
1879 and then in the list of directories specified by
1880 the environment variable \envvar{PYTHONPATH}.  This has the same syntax as
1881 the shell variable \envvar{PATH}, i.e., a list of
1882 directory names.  When \envvar{PYTHONPATH} is not set, or when the file
1883 is not found there, the search continues in an installation-dependent
1884 default path; on \UNIX{}, this is usually \file{.:/usr/local/lib/python}.
1885
1886 Actually, modules are searched in the list of directories given by the
1887 variable \code{sys.path} which is initialized from the directory
1888 containing the input script (or the current directory),
1889 \envvar{PYTHONPATH} and the installation-dependent default.  This allows
1890 Python programs that know what they're doing to modify or replace the
1891 module search path.  See the section on Standard Modules later.
1892
1893 \subsection{``Compiled'' Python files}
1894
1895 As an important speed-up of the start-up time for short programs that
1896 use a lot of standard modules, if a file called \file{spam.pyc} exists
1897 in the directory where \file{spam.py} is found, this is assumed to
1898 contain an already-``byte-compiled'' version of the module \module{spam}.
1899 The modification time of the version of \file{spam.py} used to create
1900 \file{spam.pyc} is recorded in \file{spam.pyc}, and the file is
1901 ignored if these don't match.
1902
1903 Normally, you don't need to do anything to create the \file{spam.pyc} file.
1904 Whenever \file{spam.py} is successfully compiled, an attempt is made to
1905 write the compiled version to \file{spam.pyc}.  It is not an error if
1906 this attempt fails; if for any reason the file is not written
1907 completely, the resulting \file{spam.pyc} file will be recognized as
invalid and thus ignored later.  The contents of the \file{spam.pyc}
file are platform independent, so a Python module directory can be
shared by machines of different architectures.
1911
1912 Some tips for experts:
1913
1914 \begin{itemize}
1915
1916 \item
1917 When the Python interpreter is invoked with the \code{-O} flag,
1918 optimized code is generated and stored in \file{.pyo} files.
1919 The optimizer currently doesn't help much; it only removes
1920 \keyword{assert} statements and \code{SET_LINENO} instructions.
1921 When \code{-O} is used, \emph{all} bytecode is optimized; \code{.pyc}
1922 files are ignored and \code{.py} files are compiled to optimized
1923 bytecode.
1924
1925 \item
1926 Passing two \code{-O} flags to the Python interpreter (\code{-OO})
1927 will cause the bytecode compiler to perform optimizations that could
1928 in some rare cases result in malfunctioning programs.  Currently only
1929 \code{__doc__} strings are removed from the bytecode, resulting in more
1930 compact \file{.pyo} files.  Since some programs may rely on having
1931 these available, you should only use this option if you know what
1932 you're doing.
1933
1934 \item
1935 A program doesn't run any faster when it is read from a
1936 \file{.pyc} or \file{.pyo} file than when it is read from a \file{.py}
1937 file; the only thing that's faster about \file{.pyc} or \file{.pyo}
1938 files is the speed with which they are loaded.
1939
1940 \item
1941 When a script is run by giving its name on the command line, the
1942 bytecode for the script is never written to a \file{.pyc} or
1943 \file{.pyo} file.  Thus, the startup time of a script may be reduced
1944 by moving most of its code to a module and having a small bootstrap
1945 script that imports that module.
1946
1947 \item
It is possible to have a file called \file{spam.pyc} (or
\file{spam.pyo} when \code{-O} is used) without a file
\file{spam.py} for the same module.  This can be used to distribute
a library of Python code in a form that is moderately hard to reverse
engineer.
1953
1954 \item
1955 The module \module{compileall}\refstmodindex{compileall} can create
1956 \file{.pyc} files (or \file{.pyo} files when \code{-O} is used) for
1957 all modules in a directory.
1958
1959 \end{itemize}
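
As a rough illustration of the last tip, the sketch below runs
\module{compileall} over a temporary directory holding a single module
(the file contents are made up).  Where the compiled files end up
depends on the interpreter version (some versions place them in a
\file{__pycache__} subdirectory), so the sketch searches the whole
tree rather than assuming a location:

\begin{verbatim}
import compileall
import os
import tempfile

tmpdir = tempfile.mkdtemp()
f = open(os.path.join(tmpdir, 'spam.py'), 'w')
f.write("answer = 42\n")
f.close()

# quiet=1 suppresses the per-file progress messages.
compileall.compile_dir(tmpdir, quiet=1)

# Collect every byte-compiled file found below tmpdir.
compiled = []
for dirpath, dirnames, filenames in os.walk(tmpdir):
    for name in filenames:
        if name.endswith('.pyc') or name.endswith('.pyo'):
            compiled.append(name)
\end{verbatim}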
1960
1961
1962 \section{Standard Modules \label{standardModules}}
1963
1964 Python comes with a library of standard modules, described in a separate
1965 document, the \emph{Python Library Reference} (``Library Reference''
1966 hereafter).  Some modules are built into the interpreter; these
1967 provide access to operations that are not part of the core of the
1968 language but are nevertheless built in, either for efficiency or to
1969 provide access to operating system primitives such as system calls.
1970 The set of such modules is a configuration option; e.g., the
1971 \module{amoeba} module is  only provided on systems that somehow
1972 support Amoeba primitives.  One particular module deserves some
1973 attention: \module{sys}\refstmodindex{sys}, which is built into every
1974 Python interpreter.  The variables \code{sys.ps1} and
1975 \code{sys.ps2} define the strings used as primary and secondary
1976 prompts:
1977
1978 \begin{verbatim}
1979 >>> import sys
1980 >>> sys.ps1
1981 '>>> '
1982 >>> sys.ps2
1983 '... '
1984 >>> sys.ps1 = 'C> '
1985 C> print 'Yuck!'
1986 Yuck!
1987 C>
1988 \end{verbatim}
1989
1990 These two variables are only defined if the interpreter is in
1991 interactive mode.
1992
The variable \code{sys.path} is a list of strings that determines the
interpreter's search path for modules.  It is initialized to a default
1995 path taken from the environment variable \envvar{PYTHONPATH}, or from
1996 a built-in default if \envvar{PYTHONPATH} is not set.  You can modify
1997 it using standard list operations, e.g.:
1998
1999 \begin{verbatim}
2000 >>> import sys
2001 >>> sys.path.append('/ufs/guido/lib/python')
2002 \end{verbatim}
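
To see the effect of such a change, the following sketch creates a
module in a temporary directory that is not on the search path, tries
to import it, and tries again after appending the directory to
\code{sys.path}.  The module name \module{pathdemo} is hypothetical:

\begin{verbatim}
import os
import sys
import tempfile

tmpdir = tempfile.mkdtemp()
f = open(os.path.join(tmpdir, 'pathdemo.py'), 'w')
f.write("greeting = 'found me'\n")
f.close()

# The directory is not on the search path yet, so this should fail.
try:
    import pathdemo
    found_before = 1
except ImportError:
    found_before = 0

sys.path.append(tmpdir)
import pathdemo          # now the module can be found
\end{verbatim}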
2003
2004 \section{The \function{dir()} Function \label{dir}}
2005
2006 The built-in function \function{dir()} is used to find out which names
2007 a module defines.  It returns a sorted list of strings:
2008
2009 \begin{verbatim}
2010 >>> import fibo, sys
2011 >>> dir(fibo)
2012 ['__name__', 'fib', 'fib2']
2013 >>> dir(sys)
2014 ['__name__', 'argv', 'builtin_module_names', 'copyright', 'exit',
2015 'maxint', 'modules', 'path', 'ps1', 'ps2', 'setprofile', 'settrace',
2016 'stderr', 'stdin', 'stdout', 'version']
2017 \end{verbatim}
2018
2019 Without arguments, \function{dir()} lists the names you have defined
2020 currently:
2021
2022 \begin{verbatim}
2023 >>> a = [1, 2, 3, 4, 5]
2024 >>> import fibo, sys
2025 >>> fib = fibo.fib
2026 >>> dir()
2027 ['__name__', 'a', 'fib', 'fibo', 'sys']
2028 \end{verbatim}
2029
2030 Note that it lists all types of names: variables, modules, functions, etc.
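
Because \function{dir()} returns an ordinary list of strings, the
result can be processed like any other list.  The sketch below builds
a small module object by hand (the name \module{demo} and its
attributes are invented) and keeps only the names that do not start
with an underscore:

\begin{verbatim}
import types

# A hand-made module object, for illustration only.
demo = types.ModuleType('demo')
demo.limit = 100
demo.fib = lambda n: n

names = dir(demo)
public = [n for n in names if not n.startswith('_')]
\end{verbatim}

Here \code{public} contains just \code{'fib'} and \code{'limit'}, while
\code{names} also includes \code{'__name__'}.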
2031
2032 \function{dir()} does not list the names of built-in functions and
2033 variables.  If you want a list of those, they are defined in the
2034 standard module \module{__builtin__}\refbimodindex{__builtin__}:
2035
2036 \begin{verbatim}
2037 >>> import __builtin__
2038 >>> dir(__builtin__)
2039 ['AccessError', 'AttributeError', 'ConflictError', 'EOFError', 'IOError',
2040 'ImportError', 'IndexError', 'KeyError', 'KeyboardInterrupt',
2041 'MemoryError', 'NameError', 'None', 'OverflowError', 'RuntimeError',
2042 'SyntaxError', 'SystemError', 'SystemExit', 'TypeError', 'ValueError',
2043 'ZeroDivisionError', '__name__', 'abs', 'apply', 'chr', 'cmp', 'coerce',
2044 'compile', 'dir', 'divmod', 'eval', 'execfile', 'filter', 'float',
2045 'getattr', 'hasattr', 'hash', 'hex', 'id', 'input', 'int', 'len', 'long',
2046 'map', 'max', 'min', 'oct', 'open', 'ord', 'pow', 'range', 'raw_input',
2047 'reduce', 'reload', 'repr', 'round', 'setattr', 'str', 'type', 'xrange']
2048 \end{verbatim}
2049
2050 \section{Packages \label{packages}}
2051
2052 Packages are a way of structuring Python's module namespace
2053 by using ``dotted module names''.  For example, the module name
2054 \module{A.B} designates a submodule named \samp{B} in a package named
2055 \samp{A}.  Just like the use of modules saves the authors of different
2056 modules from having to worry about each other's global variable names,
2057 the use of dotted module names saves the authors of multi-module
2058 packages like NumPy or PIL from having to worry about each other's
2059 module names.
2060
2061 Suppose you want to design a collection of modules (a ``package'') for
2062 the uniform handling of sound files and sound data.  There are many
2063 different sound file formats (usually recognized by their extension,
2064 e.g. \file{.wav}, \file{.aiff}, \file{.au}), so you may need to create
2065 and maintain a growing collection of modules for the conversion
2066 between the various file formats.  There are also many different
2067 operations you might want to perform on sound data (e.g. mixing,
2068 adding echo, applying an equalizer function, creating an artificial
2069 stereo effect), so in addition you will be writing a never-ending
2070 stream of modules to perform these operations.  Here's a possible
2071 structure for your package (expressed in terms of a hierarchical
2072 filesystem):
2073
2074 \begin{verbatim}
2075 Sound/                          Top-level package
2076       __init__.py               Initialize the sound package
2077       Formats/                  Subpackage for file format conversions
2078               __init__.py
2079               wavread.py
2080               wavwrite.py
2081               aiffread.py
2082               aiffwrite.py
2083               auread.py
2084               auwrite.py
2085               ...
2086       Effects/                  Subpackage for sound effects
2087               __init__.py
2088               echo.py
2089               surround.py
2090               reverse.py
2091               ...
2092       Filters/                  Subpackage for filters
2093               __init__.py
2094               equalizer.py
2095               vocoder.py
2096               karaoke.py
2097               ...
2098 \end{verbatim}
2099 The \file{__init__.py} files are required to make Python treat the
2100 directories as containing packages; this is done to prevent
2101 directories with a common name, such as \samp{string}, from
2102 unintentionally hiding valid modules that occur later on the module
2103 search path. In the simplest case, \file{__init__.py} can just be an
2104 empty file, but it can also execute initialization code for the
2105 package or set the \code{__all__} variable, described later.
2106
2107 Users of the package can import individual modules from the
2108 package, for example:
2109
2110 \begin{verbatim}
2111 import Sound.Effects.echo
2112 \end{verbatim}
2113 This loads the submodule \module{Sound.Effects.echo}.  It must be referenced
2114 with its full name, e.g.
2115
2116 \begin{verbatim}
2117 Sound.Effects.echo.echofilter(input, output, delay=0.7, atten=4)
2118 \end{verbatim}
2119 An alternative way of importing the submodule is:
2120
2121 \begin{verbatim}
2122 from Sound.Effects import echo
2123 \end{verbatim}
2124 This also loads the submodule \module{echo}, and makes it available without
2125 its package prefix, so it can be used as follows:
2126
2127 \begin{verbatim}
2128 echo.echofilter(input, output, delay=0.7, atten=4)
2129 \end{verbatim}
2130
2131 Yet another variation is to import the desired function or variable directly:
2132
2133 \begin{verbatim}
2134 from Sound.Effects.echo import echofilter
2135 \end{verbatim}
2136
Again, this loads the submodule \module{echo}, but this makes its function
\function{echofilter()} directly available:
2139
2140 \begin{verbatim}
2141 echofilter(input, output, delay=0.7, atten=4)
2142 \end{verbatim}
2143
2144 Note that when using \code{from \var{package} import \var{item}}, the
2145 item can be either a submodule (or subpackage) of the package, or some
2146 other name defined in the package, like a function, class or
2147 variable.  The \code{import} statement first tests whether the item is
2148 defined in the package; if not, it assumes it is a module and attempts
2149 to load it.  If it fails to find it, \exception{ImportError} is raised.
2150
2151 Contrarily, when using syntax like \code{import
2152 \var{item.subitem.subsubitem}}, each item except for the last must be
2153 a package; the last item can be a module or a package but can't be a
2154 class or function or variable defined in the previous item.
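
The three import forms can be tried out on a miniature version of the
package.  This sketch creates just enough of the \module{Sound}
hierarchy in a temporary directory; the body of
\function{echofilter()} is a stand-in, since its real behaviour is not
specified here, and the helper \function{write()} is also made up:

\begin{verbatim}
import os
import sys
import tempfile

def write(path, text=''):
    # Create a file with the given contents.
    f = open(path, 'w')
    f.write(text)
    f.close()

root = tempfile.mkdtemp()
effects = os.path.join(root, 'Sound', 'Effects')
os.makedirs(effects)
write(os.path.join(root, 'Sound', '__init__.py'))
write(os.path.join(effects, '__init__.py'))
write(os.path.join(effects, 'echo.py'),
      "def echofilter(data, delay=0.7, atten=4):\n"
      "    return list(data)   # placeholder body\n")
sys.path.insert(0, root)

import Sound.Effects.echo                  # full name required
from Sound.Effects import echo             # submodule without prefix
from Sound.Effects.echo import echofilter  # function directly

# All three forms refer to the same objects.
assert Sound.Effects.echo is echo
assert echo.echofilter is echofilter
\end{verbatim}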
2155
2156 \subsection{Importing * From a Package \label{pkg-import-star}}
2157 %The \code{__all__} Attribute
2158
2159 Now what happens when the user writes \code{from Sound.Effects import
2160 *}?  Ideally, one would hope that this somehow goes out to the
2161 filesystem, finds which submodules are present in the package, and
2162 imports them all.  Unfortunately, this operation does not work very
2163 well on Mac and Windows platforms, where the filesystem does not
2164 always have accurate information about the case of a filename!  On
2165 these platforms, there is no guaranteed way to know whether a file
2166 \file{ECHO.PY} should be imported as a module \module{echo},
2167 \module{Echo} or \module{ECHO}.  (For example, Windows 95 has the
2168 annoying practice of showing all file names with a capitalized first
2169 letter.)  The DOS 8+3 filename restriction adds another interesting
2170 problem for long module names.
2171
2172 The only solution is for the package author to provide an explicit
2173 index of the package.  The import statement uses the following
2174 convention: if a package's \file{__init__.py} code defines a list named
2175 \code{__all__}, it is taken to be the list of module names that should be imported
2176 when \code{from \var{package} import *} is
2177 encountered.  It is up to the package author to keep this list
2178 up-to-date when a new version of the package is released.  Package
2179 authors may also decide not to support it, if they don't see a use for
importing * from their package.  For example, the file
\file{Sound/Effects/__init__.py} could contain the following code:
2182
2183 \begin{verbatim}
2184 __all__ = ["echo", "surround", "reverse"]
2185 \end{verbatim}
2186
This would mean that \code{from Sound.Effects import *} would
import the three named submodules of the \module{Sound.Effects} package.
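
A small experiment shows the effect of \code{__all__}.  In this
sketch the package name \module{alldemo} and the helper
\function{write()} are hypothetical; the package contains three
submodules but lists only two of them:

\begin{verbatim}
import os
import sys
import tempfile

def write(path, text=''):
    # Create a file with the given contents.
    f = open(path, 'w')
    f.write(text)
    f.close()

root = tempfile.mkdtemp()
pkg = os.path.join(root, 'alldemo')
os.makedirs(pkg)
write(os.path.join(pkg, '__init__.py'),
      "__all__ = ['echo', 'surround']\n")
for name in ('echo', 'surround', 'reverse'):
    write(os.path.join(pkg, name + '.py'), "name = %r\n" % name)
sys.path.insert(0, root)

# Run the import in its own namespace so we can look at the result.
namespace = {}
exec("from alldemo import *", namespace)

assert 'echo' in namespace
assert 'surround' in namespace
assert 'reverse' not in namespace     # not listed in __all__
\end{verbatim}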
2189
2190 If \code{__all__} is not defined, the statement \code{from Sound.Effects
2191 import *} does \emph{not} import all submodules from the package
2192 \module{Sound.Effects} into the current namespace; it only ensures that the
2193 package \module{Sound.Effects} has been imported (possibly running its
2194 initialization code, \file{__init__.py}) and then imports whatever names are
2195 defined in the package.  This includes any names defined (and
2196 submodules explicitly loaded) by \file{__init__.py}.  It also includes any
2197 submodules of the package that were explicitly loaded by previous
2198 import statements, e.g.
2199
2200 \begin{verbatim}
2201 import Sound.Effects.echo
2202 import Sound.Effects.surround
2203 from Sound.Effects import *
2204 \end{verbatim}
2205
2206
In this example, the \module{echo} and \module{surround} modules are
imported into the current namespace because they are defined in the
\module{Sound.Effects} package when the \code{from...import} statement
is executed.  (This also works when \code{__all__} is defined.)
2211
Note that in general the practice of importing \code{*} from a module or
package is frowned upon, since it often results in poorly readable code.
However, it is okay to use it to save typing in interactive sessions,
and certain modules are designed to export only names that follow
certain patterns.
2217
2218 Remember, there is nothing wrong with using \code{from Package
2219 import specific_submodule}!  In fact, this is the
2220 recommended notation unless the importing module needs to use
2221 submodules with the same name from different packages.
2222
2223
2224 \subsection{Intra-package References}
2225
2226 The submodules often need to refer to each other.  For example, the
2227 \module{surround} module might use the \module{echo} module.  In fact, such references
2228 are so common that the \code{import} statement first looks in the
2229 containing package before looking in the standard module search path.
2230 Thus, the surround module can simply use \code{import echo} or
2231 \code{from echo import echofilter}.  If the imported module is not
2232 found in the current package (the package of which the current module
2233 is a submodule), the \code{import} statement looks for a top-level module
2234 with the given name.
2235
When packages are structured into subpackages (as with the \module{Sound}
package in the example), there's no shortcut to refer to submodules of
sibling packages; the full name of the subpackage must be used.  For
2239 example, if the module \module{Sound.Filters.vocoder} needs to use the \module{echo}
2240 module in the \module{Sound.Effects} package, it can use \code{from
2241 Sound.Effects import echo}.
2242
2243 %(One could design a notation to refer to parent packages, similar to
2244 %the use of ".." to refer to the parent directory in Unix and Windows
2245 %filesystems.  In fact, the \module{ni} module, which was the
2246 %ancestor of this package system, supported this using \code{__} for
2247 %the package containing the current module,
2248 %\code{__.__} for the parent package, and so on.  This feature was dropped
2249 %because of its awkwardness; since most packages will have a relative
2250 %shallow substructure, this is no big loss.)
2251
2252
2253
2254 \chapter{Input and Output \label{io}}
2255
2256 There are several ways to present the output of a program; data can be
2257 printed in a human-readable form, or written to a file for future use.
2258 This chapter will discuss some of the possibilities.
2259
2260
2261 \section{Fancier Output Formatting \label{formatting}}
2262
2263 So far we've encountered two ways of writing values: \emph{expression
2264 statements} and the \keyword{print} statement.  (A third way is using
2265 the \method{write()} method of file objects; the standard output file
2266 can be referenced as \code{sys.stdout}.  See the Library Reference for
2267 more information on this.)
2268
2269 Often you'll want more control over the formatting of your output than
simply printing space-separated values.  There are two ways to format
your output.  The first way is to do all the string handling yourself:
using string slicing and concatenation operations you can create any
lay-out you can imagine.  The standard module
2274 \module{string}\refstmodindex{string} contains some useful operations
2275 for padding strings to a given column width;
2276 these will be discussed shortly.  The second way is to use the
2277 \code{\%} operator with a string as the left argument.  \code{\%}
2278 interprets the left argument as a C \cfunction{sprintf()}-style
2279 format string to be applied to the right argument, and returns the
2280 string resulting from this formatting operation.
2281
2282 One question remains, of course: how do you convert values to strings?
2283 Luckily, Python has a way to convert any value to a string: pass it to
2284 the \function{repr()} function, or just write the value between
2285 reverse quotes (\code{``}).  Some examples:
2286
2287 \begin{verbatim}
2288 >>> x = 10 * 3.14
2289 >>> y = 200*200
2290 >>> s = 'The value of x is ' + `x` + ', and y is ' + `y` + '...'
2291 >>> print s
2292 The value of x is 31.4, and y is 40000...
2293 >>> # Reverse quotes work on other types besides numbers:
2294 ... p = [x, y]
2295 >>> ps = repr(p)
2296 >>> ps
2297 '[31.4, 40000]'
2298 >>> # Converting a string adds string quotes and backslashes:
2299 ... hello = 'hello, world\n'
2300 >>> hellos = `hello`
2301 >>> print hellos
2302 'hello, world\012'
2303 >>> # The argument of reverse quotes may be a tuple:
2304 ... `x, y, ('spam', 'eggs')`
2305 "(31.4, 40000, ('spam', 'eggs'))"
2306 \end{verbatim}
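
The same conversions can be written with \function{repr()} alone.
In this sketch the sample values are arbitrary; note that for simple
values such as these, the string produced by \function{repr()} can be
turned back into an equal value with \function{eval()}:

\begin{verbatim}
values = [31.5, 40000]

text = repr(values)
assert text == '[31.5, 40000]'

# Converting a string adds string quotes.
quoted = repr('spam')
assert quoted == "'spam'"

# For simple values, eval() reverses repr().
assert eval(text) == values

s = 'The first value is ' + repr(values[0]) + '...'
\end{verbatim}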
2307
2308 Here are two ways to write a table of squares and cubes:
2309
2310 \begin{verbatim}
2311 >>> import string
2312 >>> for x in range(1, 11):
2313 ...     print string.rjust(`x`, 2), string.rjust(`x*x`, 3),
2314 ...     # Note trailing comma on previous line
2315 ...     print string.rjust(`x*x*x`, 4)
2316 ...
2317  1   1    1
2318  2   4    8
2319  3   9   27
2320  4  16   64
2321  5  25  125
2322  6  36  216
2323  7  49  343
2324  8  64  512
2325  9  81  729
2326 10 100 1000
2327 >>> for x in range(1,11):
2328 ...     print '%2d %3d %4d' % (x, x*x, x*x*x)
2329 ...
2330  1   1    1
2331  2   4    8
2332  3   9   27
2333  4  16   64
2334  5  25  125
2335  6  36  216
2336  7  49  343
2337  8  64  512
2338  9  81  729
2339 10 100 1000
2340 \end{verbatim}
2341
2342 (Note that one space between each column was added by the way
2343 \keyword{print} works: it always adds spaces between its arguments.)
2344
2345 This example demonstrates the function \function{string.rjust()},
2346 which right-justifies a string in a field of a given width by padding
2347 it with spaces on the left.  There are similar functions
2348 \function{string.ljust()} and \function{string.center()}.  These
2349 functions do not write anything, they just return a new string.  If
2350 the input string is too long, they don't truncate it, but return it
2351 unchanged; this will mess up your column lay-out but that's usually
2352 better than the alternative, which would be lying about a value.  (If
2353 you really want truncation you can always add a slice operation, as in
2354 \samp{string.ljust(x,~n)[0:n]}.)
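
The padding behaviour can be checked directly.  The sketch below
assumes an interpreter in which the same operations are also available
as methods of string objects (\method{rjust()}, \method{ljust()} and
\method{center()}); with the \module{string} module functions the
results should be the same:

\begin{verbatim}
# Build a few rows of the table of squares and cubes.
rows = [(1, 1, 1), (5, 25, 125), (10, 100, 1000)]
lines = []
for x, square, cube in rows:
    lines.append(str(x).rjust(2) + ' ' +
                 str(square).rjust(3) + ' ' +
                 str(cube).rjust(4))

assert lines[0] == ' 1   1    1'
assert lines[2] == '10 100 1000'

# A string that is too long is returned unchanged ...
assert 'abcdef'.rjust(3) == 'abcdef'
# ... unless you add a slice to truncate it yourself.
truncated = 'abcdef'.ljust(3)[0:3]
centered = 'ab'.center(6)
\end{verbatim}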
2355
2356 There is another function, \function{string.zfill()}, which pads a
2357 numeric string on the left with zeros.  It understands about plus and
2358 minus signs:
2359
2360 \begin{verbatim}
2361 >>> string.zfill('12', 5)
2362 '00012'
2363 >>> string.zfill('-3.14', 7)
2364 '-003.14'
2365 >>> string.zfill('3.14159265359', 5)
2366 '3.14159265359'
2367 \end{verbatim}
2368 %
2369 Using the \code{\%} operator looks like this:
2370
2371 \begin{verbatim}
2372 >>> import math
2373 >>> print 'The value of PI is approximately %5.3f.' % math.pi
2374 The value of PI is approximately 3.142.
2375 \end{verbatim}
2376
If there is more than one format in the string, you pass a tuple as
the right operand, e.g.
2379
2380 \begin{verbatim}
2381 >>> table = {'Sjoerd': 4127, 'Jack': 4098, 'Dcab': 8637678}
2382 >>> for name, phone in table.items():
2383 ...     print '%-10s ==> %10d' % (name, phone)
2384 ...
2385 Jack       ==>       4098
2386 Dcab       ==>    8637678
2387 Sjoerd     ==>       4127
2388 \end{verbatim}
2389
2390 Most formats work exactly as in C and require that you pass the proper
2391 type; however, if you don't you get an exception, not a core dump.
2392 The \code{\%s} format is more relaxed: if the corresponding argument is
2393 not a string object, it is converted to string using the
2394 \function{str()} built-in function.  Using \code{*} to pass the width
2395 or precision in as a separate (integer) argument is supported.  The
2396 C formats \code{\%n} and \code{\%p} are not supported.
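
A few quick checks illustrate these rules; the sample values are
arbitrary:

\begin{verbatim}
# %s converts its argument with str() if it is not a string.
as_text = '%s' % 42
assert as_text == '42'

# * takes the width or the precision from the argument tuple.
padded = '%*d' % (5, 42)
rounded = '%.*f' % (2, 3.14159)
assert padded == '   42'
assert rounded == '3.14'

# Passing the wrong type raises an exception instead of crashing.
try:
    '%d' % 'spam'
    raised = 0
except TypeError:
    raised = 1
\end{verbatim}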
2397
2398 If you have a really long format string that you don't want to split
2399 up, it would be nice if you could reference the variables to be
2400 formatted by name instead of by position.  This can be done by using
2401 an extension of C formats using the form \code{\%(name)format}, e.g.
2402
2403 \begin{verbatim}
2404 >>> table = {'Sjoerd': 4127, 'Jack': 4098, 'Dcab': 8637678}
2405 >>> print 'Jack: %(Jack)d; Sjoerd: %(Sjoerd)d; Dcab: %(Dcab)d' % table
2406 Jack: 4098; Sjoerd: 4127; Dcab: 8637678
2407 \end{verbatim}
2408
2409 This is particularly useful in combination with the new built-in
2410 \function{vars()} function, which returns a dictionary containing all
2411 local variables.
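
For instance, inside a function the local variables can be
substituted into a format string by name.  The function
\function{describe()} is made up for this sketch:

\begin{verbatim}
def describe(name, count):
    # vars() without arguments returns the local variables
    # as a dictionary, here {'name': ..., 'count': ...}.
    return '%(name)s has %(count)d entries' % vars()

message = describe('tel', 3)
assert message == 'tel has 3 entries'
\end{verbatim}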
2412
2413 \section{Reading and Writing Files \label{files}}
2414
2415 % Opening files
2416 \function{open()}\bifuncindex{open} returns a file
2417 object\obindex{file}, and is most commonly used with two arguments:
2418 \samp{open(\var{filename}, \var{mode})}.
2419
2420 \begin{verbatim}
2421 >>> f=open('/tmp/workfile', 'w')
2422 >>> print f
2423 <open file '/tmp/workfile', mode 'w' at 80a0960>
2424 \end{verbatim}
2425
2426 The first argument is a string containing the filename.  The second
2427 argument is another string containing a few characters describing the
2428 way in which the file will be used.  \var{mode} can be \code{'r'} when
2429 the file will only be read, \code{'w'} for only writing (an existing
2430 file with the same name will be erased), and \code{'a'} opens the file
2431 for appending; any data written to the file is automatically added to
2432 the end.  \code{'r+'} opens the file for both reading and writing.
2433 The \var{mode} argument is optional; \code{'r'} will be assumed if
2434 it's omitted.
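
The effect of the different modes can be tried out with a scratch
file.  This is only a sketch in Python~3 syntax; the temporary
directory and the variable names are assumptions of the example:

```python
import os
import tempfile

# A scratch file in a fresh temporary directory (the path is arbitrary).
path = os.path.join(tempfile.mkdtemp(), 'workfile')

f = open(path, 'w')        # create the file, or erase an existing one
f.write('first\n')
f.close()

f = open(path, 'a')        # writes are added to the end
f.write('second\n')
f.close()

f = open(path)             # mode omitted, so 'r' is assumed
contents = f.read()
f.close()

f = open(path, 'w')        # reopening with 'w' erases the old contents
f.close()
size_after_rewrite = os.path.getsize(path)

print(repr(contents), size_after_rewrite)
```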
2435
2436 On Windows and the Macintosh, \code{'b'} appended to the
2437 mode opens the file in binary mode, so there are also modes like
2438 \code{'rb'}, \code{'wb'}, and \code{'r+b'}.  Windows makes a
2439 distinction between text and binary files; the end-of-line characters
2440 in text files are automatically altered slightly when data is read or
2441 written.  This behind-the-scenes modification to file data is fine for
2442 \ASCII{} text files, but it'll corrupt binary data like that in JPEGs or
2443 \file{.EXE} files.  Be very careful to use binary mode when reading and
2444 writing such files.  (Note that the precise semantics of text mode on
2445 the Macintosh depends on the underlying C library being used.)
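
A small sketch of a binary round trip follows.  It is written for
Python~3, where binary files hold \code{bytes} objects rather than
strings; the file name \file{blob.bin} and the sample values are
arbitrary:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'blob.bin')

# Bytes that text mode might alter on some platforms, such as
# carriage return (13) and line feed (10).
payload = bytes([0, 13, 10, 26, 255])

f = open(path, 'wb')       # binary mode: data is written unchanged
f.write(payload)
f.close()

f = open(path, 'rb')       # binary mode: data is read unchanged
restored = f.read()
f.close()

print(restored == payload, os.path.getsize(path))
```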
2446
2447 \subsection{Methods of File Objects \label{fileMethods}}
2448
2449 The rest of the examples in this section will assume that a file
2450 object called \code{f} has already been created.
2451
2452 To read a file's contents, call \code{f.read(\var{size})}, which reads
2453 some quantity of data and returns it as a string.  \var{size} is an
2454 optional numeric argument.  When \var{size} is omitted or negative,
2455 the entire contents of the file will be read and returned; it's your
2456 problem if the file is twice as large as your machine's memory.
2457 Otherwise, at most \var{size} bytes are read and returned.  If the end
2458 of the file has been reached, \code{f.read()} will return an empty
string (\code{""}).

2460 \begin{verbatim}
2461 >>> f.read()
2462 'This is the entire file.\012'
2463 >>> f.read()
2464 ''
2465 \end{verbatim}
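
The empty string that marks the end of the file makes it easy to read
a file in pieces.  A minimal sketch in Python~3 syntax, in which
\code{io.StringIO} stands in for a real file and the function name
\code{read_in_chunks} is made up:

```python
import io

def read_in_chunks(f, size):
    # Collect pieces of at most `size` characters until read()
    # returns the empty string, which signals the end of the file.
    chunks = []
    while 1:
        chunk = f.read(size)
        if chunk == '':
            break
        chunks.append(chunk)
    return chunks

f = io.StringIO('This is the entire file.\n')
print(read_in_chunks(f, 10))
```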
2466
2467 \code{f.readline()} reads a single line from the file; a newline
2468 character (\code{\e n}) is left at the end of the string, and is only
2469 omitted on the last line of the file if the file doesn't end in a
2470 newline.  This makes the return value unambiguous; if
2471 \code{f.readline()} returns an empty string, the end of the file has
2472 been reached, while a blank line is represented by \code{'\e n'}, a
2473 string containing only a single newline.
2474
2475 \begin{verbatim}
2476 >>> f.readline()
2477 'This is the first line of the file.\012'
2478 >>> f.readline()
2479 'Second line of the file\012'
2480 >>> f.readline()
2481 ''
2482 \end{verbatim}
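
Because only the end of the file yields an empty string, a loop over
\code{f.readline()} can tell blank lines from the end of the data.  A
sketch in Python~3 syntax, again with \code{io.StringIO} standing in
for a real file and a hypothetical helper \code{count_lines}:

```python
import io

def count_lines(f):
    # Returns (total number of lines, number of blank lines).
    total = 0
    blank = 0
    while 1:
        line = f.readline()
        if line == '':         # end of file
            break
        total = total + 1
        if line == '\n':       # a blank line still contains the newline
            blank = blank + 1
    return total, blank

f = io.StringIO('first\n\nlast line without newline')
print(count_lines(f))
```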
2483
2484 \code{f.readlines()} uses \code{f.readline()} repeatedly, and returns
2485 a list containing all the lines of data in the file.
2486
2487 \begin{verbatim}
2488 >>> f.readlines()
2489 ['This is the first line of the file.\012', 'Second line of the file\012']
2490 \end{verbatim}
2491
2492 \code{f.write(\var{string})} writes the contents of \var{string} to
2493 the file, returning \code{None}.
2494
2495 \begin{verbatim}
2496 >>> f.write('This is a test\n')
2497 \end{verbatim}
2498
2499 \code{f.tell()} returns an integer giving the file object's current
2500 position in the file, measured in bytes from the beginning of the
2501 file.  To change the file object's position, use
2502 \samp{f.seek(\var{offset}, \var{from_what})}.  The position is
2503 computed from adding \var{offset} to a reference point; the reference
2504 point is selected by the \var{from_what} argument.  A \var{from_what}
2505 value of 0 measures from the beginning of the file, 1 uses the current
2506 file position, and 2 uses the end of the file as the reference point.
2507 \var{from_what} can be omitted and defaults to 0, using the beginning
2508 of the file as the reference point.
2509
2510 \begin{verbatim}
>>> f=open('/tmp/workfile', 'r+')
>>> f.write('0123456789abcdef')
>>> f.seek(5)     # Go to the 6th byte in the file
>>> f.read(1)
'5'
>>> f.seek(-3, 2) # Go to the 3rd byte before the end
>>> f.read(1)
'd'
2519 \end{verbatim}
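
The three reference points can be compared side by side.  This sketch
uses Python~3 and an in-memory binary file (\code{io.BytesIO}) as a
stand-in for a file opened in binary mode, so the values read are
\code{bytes} objects:

```python
import io

f = io.BytesIO(b'0123456789abcdef')

f.seek(5)                  # offset 5 from the beginning (from_what = 0)
first = f.read(1)
position = f.tell()        # reading one byte moved the position to 6

f.seek(2, 1)               # 2 bytes forward from the current position
second = f.read(1)

f.seek(-3, 2)              # 3 bytes before the end of the file
third = f.read(1)

print(first, position, second, third)
```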
2520
2521 When you're done with a file, call \code{f.close()} to close it and
2522 free up any system resources taken up by the open file.  After calling
2523 \code{f.close()}, attempts to use the file object will automatically fail.
2524
2525 \begin{verbatim}
2526 >>> f.close()
2527 >>> f.read()
2528 Traceback (innermost last):
2529   File "<stdin>", line 1, in ?
2530 ValueError: I/O operation on closed file
2531 \end{verbatim}
2532
2533 File objects have some additional methods, such as \method{isatty()}
2534 and \method{truncate()} which are less frequently used; consult the
2535 Library Reference for a complete guide to file objects.
2536
2537 \subsection{The \module{pickle} Module \label{pickle}}
2538 \refstmodindex{pickle}
2539
2540 Strings can easily be written to and read from a file. Numbers take a
2541 bit more effort, since the \method{read()} method only returns
2542 strings, which will have to be passed to a function like
2543 \function{string.atoi()}, which takes a string like \code{'123'} and
2544 returns its numeric value 123.  However, when you want to save more
2545 complex data types like lists, dictionaries, or class instances,
2546 things get a lot more complicated.
2547
2548 Rather than have users be constantly writing and debugging code to
2549 save complicated data types, Python provides a standard module called
2550 \module{pickle}.  This is an amazing module that can take almost
2551 any Python object (even some forms of Python code!), and convert it to
2552 a string representation; this process is called \dfn{pickling}.
2553 Reconstructing the object from the string representation is called
\dfn{unpickling}.  Between pickling and unpickling, the string
representing the object may have been stored in a file or as data, or
sent over a network connection to some distant machine.
2557
2558 If you have an object \code{x}, and a file object \code{f} that's been
2559 opened for writing, the simplest way to pickle the object takes only
2560 one line of code:
2561
2562 \begin{verbatim}
2563 pickle.dump(x, f)
2564 \end{verbatim}
2565
2566 To unpickle the object again, if \code{f} is a file object which has
2567 been opened for reading:
2568
2569 \begin{verbatim}
2570 x = pickle.load(f)
2571 \end{verbatim}
2572
2573 (There are other variants of this, used when pickling many objects or
2574 when you don't want to write the pickled data to a file; consult the
2575 complete documentation for \module{pickle} in the Library Reference.)
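
A round trip might look like the following sketch.  It uses Python~3,
where pickled data is binary, so an in-memory \code{io.BytesIO} object
stands in for a file opened in binary mode; the sample dictionary is
invented for the example:

```python
import io
import pickle

record = {'name': 'Jack', 'phones': [4098, 4139], 'point': (1.5, -2)}

f = io.BytesIO()           # stands in for a file opened for writing
pickle.dump(record, f)

f.seek(0)                  # rewind, as if the file had been reopened
restored = pickle.load(f)

# dumps() and loads() are variants that do not involve a file at all.
as_string = pickle.dumps(record)
from_string = pickle.loads(as_string)

print(restored == record, restored is record)
```

The restored object is equal to the original but is a separate object.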
2576
2577 \module{pickle} is the standard way to make Python objects which can be
2578 stored and reused by other programs or by a future invocation of the
2579 same program; the technical term for this is a \dfn{persistent}
2580 object.  Because \module{pickle} is so widely used, many authors who
2581 write Python extensions take care to ensure that new data types such
2582 as matrices can be properly pickled and unpickled.
2583
2584
2585
2586 \chapter{Errors and Exceptions \label{errors}}
2587
2588 Until now error messages haven't been more than mentioned, but if you
2589 have tried out the examples you have probably seen some.  There are
2590 (at least) two distinguishable kinds of errors: \emph{syntax errors}
2591 and \emph{exceptions}.
2592
2593 \section{Syntax Errors \label{syntaxErrors}}
2594
2595 Syntax errors, also known as parsing errors, are perhaps the most common
2596 kind of complaint you get while you are still learning Python:
2597
2598 \begin{verbatim}
2599 >>> while 1 print 'Hello world'
2600   File "<stdin>", line 1
2601     while 1 print 'Hello world'
2602                 ^
2603 SyntaxError: invalid syntax
2604 \end{verbatim}
2605
2606 The parser repeats the offending line and displays a little `arrow'
2607 pointing at the earliest point in the line where the error was detected.
2608 The error is caused by (or at least detected at) the token
2609 \emph{preceding}
2610 the arrow: in the example, the error is detected at the keyword
2611 \keyword{print}, since a colon (\character{:}) is missing before it.
2612 File name and line number are printed so you know where to look in case
2613 the input came from a script.
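
The file name and line number are also available to a program.  As a
rough sketch in Python~3 syntax, the built-in \function{compile()}
function can be asked to parse the faulty line; the file name
\file{example.py} is made up, and the \keyword{try} statement used
here is explained later in this chapter:

```python
source = "while 1 print('Hello world')\n"

try:
    # compile() parses the source without running it.
    compile(source, 'example.py', 'exec')
except SyntaxError as err:
    report = (err.filename, err.lineno)
else:
    report = None

print(report)
```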
2614
2615 \section{Exceptions \label{exceptions}}
2616
2617 Even if a statement or expression is syntactically correct, it may
2618 cause an error when an attempt is made to execute it.
2619 Errors detected during execution are called \emph{exceptions} and are
2620 not unconditionally fatal: you will soon learn how to handle them in
2621 Python programs.  Most exceptions are not handled by programs,
2622 however, and result in error messages as shown here:
2623
2624 \begin{verbatim}
2625 >>> 10 * (1/0)
2626 Traceback (innermost last):
2627   File "<stdin>", line 1
2628 ZeroDivisionError: integer division or modulo
2629 >>> 4 + spam*3
2630 Traceback (innermost last):
2631   File "<stdin>", line 1
2632 NameError: spam
2633 >>> '2' + 2
2634 Traceback (innermost last):
2635   File "<stdin>", line 1
2636 TypeError: illegal argument type for built-in operation
2637 \end{verbatim}
2638
2639 The last line of the error message indicates what happened.
2640 Exceptions come in different types, and the type is printed as part of
2641 the message: the types in the example are
2642 \exception{ZeroDivisionError},
2643 \exception{NameError}
2644 and
2645 \exception{TypeError}.
The string printed as the exception type is the name of the built-in
exception that occurred.  This is true for all built-in
exceptions, but need not be true for user-defined exceptions (although
it is a useful convention).
Standard exception names are built-in identifiers (not reserved
keywords).
2652
The rest of the line is a detail whose interpretation depends on the
exception type.

The preceding part of the error message shows the context where the
exception happened, in the form of a stack backtrace.
In general it lists source lines; however,
it will not display lines read from standard input.
2660
2661 The Library Reference lists the built-in exceptions and their
2662 meanings.
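
The three exception types from the example can also be observed from
within a program.  The sketch below is written in Python~3 syntax and
uses the \keyword{try} statement described in the next section; the
helper \code{error_name} is hypothetical:

```python
def error_name(action):
    # Call action() and return the name of the exception it raises,
    # or None if it finishes normally.
    try:
        action()
    except Exception as exc:
        return type(exc).__name__
    return None

names = [
    error_name(lambda: 10 * (1 / 0)),
    error_name(lambda: 4 + spam * 3),   # 'spam' is deliberately undefined
    error_name(lambda: '2' + 2),
]
print(names)
```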
2663
2664 \section{Handling Exceptions \label{handling}}
2665
2666 It is possible to write programs that handle selected exceptions.
2667 Look at the following example, which prints a table of inverses of
2668 some floating point numbers:
2669
2670 \begin{verbatim}
2671 >>> numbers = [0.3333, 2.5, 0, 10]
2672 >>> for x in numbers:
2673 ...     print x,
2674 ...     try:
2675 ...         print 1.0 / x
2676 ...     except ZeroDivisionError:
2677 ...         print '*** has no inverse ***'
2678 ...
2679 0.3333 3.00030003
2680 2.5 0.4
2681 0 *** has no inverse ***
2682 10 0.1
2683 \end{verbatim}
2684
2685 The \keyword{try} statement works as follows.
2686 \begin{itemize}
2687 \item
2688 First, the \emph{try clause}
2689 (the statement(s) between the \keyword{try} and \keyword{except}
2690 keywords) is executed.
2691 \item
2692 If no exception occurs, the
2693 \emph{except\ clause}
2694 is skipped and execution of the \keyword{try} statement is finished.
2695 \item
If an exception occurs during execution of the try clause,
the rest of the clause is skipped.  Then, if its type matches the
exception named after the \keyword{except} keyword,
the except clause is executed, and then
execution continues after the \keyword{try} statement.
2701 \item
2702 If an exception occurs which does not match the exception named in the
2703 except clause, it is passed on to outer \keyword{try} statements; if
2704 no handler is found, it is an \emph{unhandled exception}
2705 and execution stops with a message as shown above.
2706 \end{itemize}
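
The steps above can be traced in a small sketch (Python~3 syntax; the
function names \code{invert} and \code{safe_invert} are made up):

```python
def invert(x):
    try:
        result = 1.0 / x          # the try clause
    except ZeroDivisionError:
        return None               # runs only for a matching exception
    return result                 # reached when no exception occurred

def safe_invert(x):
    try:
        return invert(x)
    except TypeError:
        # invert() has no handler for TypeError, so the exception is
        # passed on to this outer try statement.
        return 'not a number'

print(invert(4), invert(0), safe_invert('a'))
```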
2707 A \keyword{try} statement may have more than one except clause, to
2708 specify handlers for different exceptions.
2709 At most one handler will be executed.
2710 Handlers only handle exceptions that occur in the corresponding try
2711 clause, not in other handlers of the same \keyword{try} statement.
2712 An except clause may name multiple exceptions as a parenthesized list,
2713 e.g.:
2714
2715 \begin{verbatim}
2716 ... except (RuntimeError, TypeError, NameError):
2717 ...     pass
2718 \end{verbatim}
2719
2720 The last except clause may omit the exception name(s), to serve as a
2721 wildcard.
2722 Use this with extreme caution, since it is easy to mask a real
2723 programming error in this way!
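
A sketch combining several handlers, a parenthesized group of
exceptions and a final wildcard, in Python~3 syntax.  The function
\code{classify} and its return strings are invented for illustration:

```python
def classify(action):
    try:
        action()
    except ZeroDivisionError:
        return 'division'
    except (TypeError, NameError):
        return 'type or name'
    except:
        # Wildcard: catches everything else, so use it sparingly.
        return 'something else'
    return 'no error'

print(classify(lambda: 1 / 0))
print(classify(lambda: '2' + 2))
print(classify(lambda: [][0]))
print(classify(lambda: None))
```

At most one of the handlers runs for each call.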
2724
The \keyword{try} \ldots\ \keyword{except} statement has an optional
\emph{else clause}, which must follow all except clauses.  It is
useful for code that must be executed if the try clause does not
raise an exception.  For example:
2729
2730 \begin{verbatim}
2731 for arg in sys.argv[1:]:
2732     try:
2733         f = open(arg, 'r')
2734     except IOError:
2735         print 'cannot open', arg
2736     else:
2737         print arg, 'has', len(f.readlines()), 'lines'
2738         f.close()
2739 \end{verbatim}
2740
2741
When an exception occurs, it may have an associated value, also known as
the exception's \emph{argument}.
2744 The presence and type of the argument depend on the exception type.
2745 For exception types which have an argument, the except clause may
2746 specify a variable after the exception name (or list) to receive the
2747 argument's value, as follows:
2748
2749 \begin{verbatim}
2750 >>> try:
2751 ...     spam()
2752 ... except NameError, x:
2753 ...     print 'name', x, 'undefined'
2754 ...
2755 name spam undefined
2756 \end{verbatim}
2757
2758 If an exception has an argument, it is printed as the last part
2759 (`detail') of the message for unhandled exceptions.
2760
2761 Exception handlers don't just handle exceptions if they occur
2762 immediately in the try clause, but also if they occur inside functions
2763 that are called (even indirectly) in the try clause.
2764 For example:
2765
2766 \begin{verbatim}
2767 >>> def this_fails():
2768 ...     x = 1/0
2769 ...
2770 >>> try:
2771 ...     this_fails()
2772 ... except ZeroDivisionError, detail:
2773 ...     print 'Handling run-time error:', detail
2774 ...
2775 Handling run-time error: integer division or modulo
2776 \end{verbatim}
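
In present-day Python~3 the variable that receives the argument is
introduced with \code{as} instead of a comma.  A sketch under that
assumption, with made-up function names and sample data:

```python
def lookup(table, key):
    return table[key]             # raises KeyError for a missing key

def describe_failure():
    try:
        lookup({'Jack': 4098}, 'Guido')
    except KeyError as detail:
        # The exception's argument is available through detail.args.
        return 'no entry for %s' % detail.args[0]

print(describe_failure())
```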
2777
2778
2779 \section{Raising Exceptions \label{raising}}
2780
2781 The \keyword{raise} statement allows the programmer to force a
2782 specified exception to occur.
2783 For example:
2784
2785 \begin{verbatim}
2786 >>> raise NameError, 'HiThere'
2787 Traceback (innermost last):
2788   File "<stdin>", line 1
2789 NameError: HiThere
2790 \end{verbatim}
2791
2792 The first argument to \keyword{raise} names the exception to be
2793 raised.  The optional second argument specifies the exception's
2794 argument.
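
In Python~3 the argument is passed by calling the exception, as in
\code{raise NameError('HiThere')}, rather than after a comma.  A sketch
using that form; the function \code{check_positive} is hypothetical:

```python
def check_positive(n):
    if n <= 0:
        # The string becomes the exception's argument.
        raise ValueError('expected a positive number, got %d' % n)
    return n

try:
    check_positive(-3)
except ValueError as detail:
    message = str(detail)

print(message)
```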
2795
2796
2797 \section{User-defined Exceptions \label{userExceptions}}
2798
2799 Programs may name their own exceptions by assigning a string to a
2800 variable.
2801 For example:
2802
2803 \begin{verbatim}
2804 >>> my_exc = 'my_exc'
2805 >>> try:
2806 ...     raise my_exc, 2*2
2807 ... except my_exc, val:
2808 ...     print 'My exception occurred, value:', val
2809 ...
2810 My exception occurred, value: 4
2811 >>> raise my_exc, 1
2812 Traceback (innermost last):
2813   File "<stdin>", line 1
2814 my_exc: 1
2815 \end{verbatim}
2816
2817 Many standard modules use this to report errors that may occur in
2818 functions they define.
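
String exceptions like \code{my_exc} belong to the Python version
described here.  In present-day Python~3 an exception has to be a
class derived from \exception{BaseException}, so the same idea might be
sketched as follows.  The name \code{MyError} is hypothetical, and
classes themselves are introduced in a later chapter:

```python
class MyError(Exception):
    "A hypothetical user-defined exception"

def run():
    try:
        raise MyError(2 * 2)
    except MyError as val:
        # The value passed when raising is the exception's argument.
        return val.args[0]

print('My exception occurred, value:', run())
```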
2819
2820
2821 \section{Defining Clean-up Actions \label{cleanup}}
2822
2823 The \keyword{try} statement has another optional clause which is
2824 intended to define clean-up actions that must be executed under all
2825 circumstances.  For example:
2826
2827 \begin{verbatim}
2828 >>> try:
2829 ...     raise KeyboardInterrupt
2830 ... finally:
2831 ...     print 'Goodbye, world!'
2832 ...
2833 Goodbye, world!
2834 Traceback (innermost last):
2835   File "<stdin>", line 2
2836 KeyboardInterrupt
2837 \end{verbatim}
2838
2839 A \emph{finally clause} is executed whether or not an exception has
2840 occurred in the try clause.  When an exception has occurred, it is
2841 re-raised after the finally clause is executed.  The finally clause is
2842 also executed ``on the way out'' when the \keyword{try} statement is
2843 left via a \keyword{break} or \keyword{return} statement.
2844
2845 A \keyword{try} statement must either have one or more except clauses
2846 or one finally clause, but not both.
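
Both ways of leaving the try clause can be seen in this sketch
(Python~3 syntax; the names \code{work} and \code{log} are
illustrative only):

```python
log = []

def work(fail):
    try:
        if fail:
            raise KeyboardInterrupt
        return 'done'             # the finally clause still runs
    finally:
        log.append('cleanup')

result = work(False)

try:
    work(True)
except KeyboardInterrupt:
    # The exception was re-raised after the finally clause ran.
    log.append('re-raised')

print(result, log)
```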
2847
2848 \chapter{Classes \label{classes}}
2849
2850 Python's class mechanism adds classes to the language with a minimum
2851 of new syntax and semantics.  It is a mixture of the class mechanisms
2852 found in \Cpp{} and Modula-3.  As is true for modules, classes in Python
2853 do not put an absolute barrier between definition and user, but rather
2854 rely on the politeness of the user not to ``break into the
2855 definition.''  The most important features of classes are retained
2856 with full power, however: the class inheritance mechanism allows
2857 multiple base classes, a derived class can override any methods of its
base class or classes, and a method can call the method of a base class with the
2859 same name.  Objects can contain an arbitrary amount of private data.
2860
2861 In \Cpp{} terminology, all class members (including the data members) are
2862 \emph{public}, and all member functions are \emph{virtual}.  There are
2863 no special constructors or destructors.  As in Modula-3, there are no
2864 shorthands for referencing the object's members from its methods: the
2865 method function is declared with an explicit first argument
2866 representing the object, which is provided implicitly by the call.  As
2867 in Smalltalk, classes themselves are objects, albeit in the wider
2868 sense of the word: in Python, all data types are objects.  This
2869 provides semantics for importing and renaming.  But, just like in \Cpp{}
2870 or Modula-3, built-in types cannot be used as base classes for
2871 extension by the user.  Also, like in \Cpp{} but unlike in Modula-3, most
2872 built-in operators with special syntax (arithmetic operators,
2873 subscripting etc.) can be redefined for class instances.
2874
2875 \section{A Word About Terminology \label{terminology}}
2876
2877 Lacking universally accepted terminology to talk about classes, I will
2878 make occasional use of Smalltalk and \Cpp{} terms.  (I would use Modula-3
terms, since its object-oriented semantics are closer to those of
Python than those of \Cpp{} are, but I expect that few readers have heard of it.)
2881
2882 I also have to warn you that there's a terminological pitfall for
2883 object-oriented readers: the word ``object'' in Python does not
2884 necessarily mean a class instance.  Like \Cpp{} and Modula-3, and
2885 unlike Smalltalk, not all types in Python are classes: the basic
2886 built-in types like integers and lists are not, and even somewhat more
2887 exotic types like files aren't.  However, \emph{all} Python types
2888 share a little bit of common semantics that is best described by using
2889 the word object.
2890
2891 Objects have individuality, and multiple names (in multiple scopes)
2892 can be bound to the same object.  This is known as aliasing in other
2893 languages.  This is usually not appreciated on a first glance at
2894 Python, and can be safely ignored when dealing with immutable basic
2895 types (numbers, strings, tuples).  However, aliasing has an
2896 (intended!) effect on the semantics of Python code involving mutable
2897 objects such as lists, dictionaries, and most types representing
2898 entities outside the program (files, windows, etc.).  This is usually
2899 used to the benefit of the program, since aliases behave like pointers
2900 in some respects.  For example, passing an object is cheap since only
2901 a pointer is passed by the implementation; and if a function modifies
2902 an object passed as an argument, the caller will see the change --- this
2903 obviates the need for two different argument passing mechanisms as in
2904 Pascal.
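
A short sketch of aliasing with a mutable object (Python~3 syntax; the
function name \code{append_zero} is made up):

```python
def append_zero(items):
    # Modifies the list object that was passed in; no copy is made.
    items.append(0)

a = [1, 2]
b = a                  # b is a second name for the same list object
append_zero(b)

print(a, a is b)
```

The change made through \code{b} is visible through \code{a}, since
both names are bound to one object.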
2905
2906
2907 \section{Python Scopes and Name Spaces \label{scopes}}
2908
2909 Before introducing classes, I first have to tell you something about
2910 Python's scope rules.  Class definitions play some neat tricks with
2911 name spaces, and you need to know how scopes and name spaces work to
2912 fully understand what's going on.  Incidentally, knowledge about this
2913 subject is useful for any advanced Python programmer.
2914
2915 Let's begin with some definitions.
2916
2917 A \emph{name space} is a mapping from names to objects.  Most name
2918 spaces are currently implemented as Python dictionaries, but that's
2919 normally not noticeable in any way (except for performance), and it
2920 may change in the future.  Examples of name spaces are: the set of
2921 built-in names (functions such as \function{abs()}, and built-in exception
2922 names); the global names in a module; and the local names in a
function invocation.  In a sense the set of attributes of an object
also forms a name space.  The important thing to know about name
2925 spaces is that there is absolutely no relation between names in
2926 different name spaces; for instance, two different modules may both
2927 define a function ``maximize'' without confusion --- users of the
2928 modules must prefix it with the module name.
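
The standard modules \module{math} and \module{cmath} illustrate this:
both define a function called \function{sqrt()}.  A sketch in Python~3
syntax:

```python
import math
import cmath

# The same name in two name spaces refers to two unrelated functions.
real_root = math.sqrt(4)
complex_root = cmath.sqrt(-4)

print(real_root, complex_root)
```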
2929
2930 By the way, I use the word \emph{attribute} for any name following a
2931 dot --- for example, in the expression \code{z.real}, \code{real} is
2932 an attribute of the object \code{z}.  Strictly speaking, references to
2933 names in modules are attribute references: in the expression
2934 \code{modname.funcname}, \code{modname} is a module object and
2935 \code{funcname} is an attribute of it.  In this case there happens to
2936 be a straightforward mapping between the module's attributes and the
2937 global names defined in the module: they share the same name
2938 space!\footnote{
2939         Except for one thing.  Module objects have a secret read-only
2940         attribute called \code{__dict__} which returns the dictionary
2941         used to implement the module's name space; the name
2942         \code{__dict__} is an attribute but not a global name.
2943         Obviously, using this violates the abstraction of name space
2944         implementation, and should be restricted to things like
2945         post-mortem debuggers.
2946 }
2947
2948 Attributes may be read-only or writable.  In the latter case,
2949 assignment to attributes is possible.  Module attributes are writable:
2950 you can write \samp{modname.the_answer = 42}.  Writable attributes may
2951 also be deleted with the \keyword{del} statement, e.g.
2952 \samp{del modname.the_answer}.
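
This can be tried without touching a real module.  In the sketch below
(Python~3 syntax), \code{types.ModuleType} creates an empty module
object that plays the role of the hypothetical \code{modname}:

```python
import types

modname = types.ModuleType('modname')    # an empty stand-in module

modname.the_answer = 42                  # module attributes are writable
had_it = hasattr(modname, 'the_answer')

# The attribute lives in the dictionary implementing the name space.
in_dict = modname.__dict__['the_answer']

del modname.the_answer                   # and they can be deleted again
has_it_now = hasattr(modname, 'the_answer')

print(had_it, in_dict, has_it_now)
```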
2953
2954 Name spaces are created at different moments and have different
2955 lifetimes.  The name space containing the built-in names is created
2956 when the Python interpreter starts up, and is never deleted.  The
2957 global name space for a module is created when the module definition
2958 is read in; normally, module name spaces also last until the
2959 interpreter quits.  The statements executed by the top-level
2960 invocation of the interpreter, either read from a script file or
2961 interactively, are considered part of a module called
2962 \module{__main__}, so they have their own global name space.  (The
2963 built-in names actually also live in a module; this is called
2964 \module{__builtin__}.)
2965
2966 The local name space for a function is created when the function is
2967 called, and deleted when the function returns or raises an exception
that is not handled within the function.  (Actually, forgetting would
be a better way to describe what happens.)  Of course,
2970 recursive invocations each have their own local name space.
2971
2972 A \emph{scope} is a textual region of a Python program where a name space
2973 is directly accessible.  ``Directly accessible'' here means that an
2974 unqualified reference to a name attempts to find the name in the name
2975 space.
2976
2977 Although scopes are determined statically, they are used dynamically.
2978 At any time during execution, exactly three nested scopes are in use
2979 (i.e., exactly three name spaces are directly accessible): the
2980 innermost scope, which is searched first, contains the local names,
2981 the middle scope, searched next, contains the current module's global
2982 names, and the outermost scope (searched last) is the name space
2983 containing built-in names.
2984
2985 Usually, the local scope references the local names of the (textually)
2986 current function.  Outside of functions, the local scope references
2987 the same name space as the global scope: the module's name space.
2988 Class definitions place yet another name space in the local scope.
2989
2990 It is important to realize that scopes are determined textually: the
2991 global scope of a function defined in a module is that module's name
2992 space, no matter from where or by what alias the function is called.
2993 On the other hand, the actual search for names is done dynamically, at
2994 run time --- however, the language definition is evolving towards
2995 static name resolution, at ``compile'' time, so don't rely on dynamic
2996 name resolution!  (In fact, local variables are already determined
2997 statically.)
2998
2999 A special quirk of Python is that assignments always go into the
3000 innermost scope.  Assignments do not copy data --- they just
3001 bind names to objects.  The same is true for deletions: the statement
3002 \samp{del x} removes the binding of \code{x} from the name space
3003 referenced by the local scope.  In fact, all operations that introduce
3004 new names use the local scope: in particular, import statements and
3005 function definitions bind the module or function name in the local
3006 scope.  (The \keyword{global} statement can be used to indicate that
3007 particular variables live in the global scope.)
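
The difference between a plain assignment and one preceded by a
\keyword{global} statement shows up in this sketch.  It is meant to be
run as a script, so that \code{counter} is a global name of the
module; all names are invented for the example:

```python
counter = 0

def bump_local():
    counter = 1          # binds a new local name; the global is untouched
    return counter

def bump_global():
    global counter       # assignments now go to the module's name space
    counter = counter + 1
    return counter

local_result = bump_local()
after_local = counter

global_result = bump_global()
after_global = counter

print(local_result, after_local, global_result, after_global)
```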
3008
3009
3010 \section{A First Look at Classes \label{firstClasses}}
3011
3012 Classes introduce a little bit of new syntax, three new object types,
3013 and some new semantics.
3014
3015
3016 \subsection{Class Definition Syntax \label{classDefinition}}
3017
3018 The simplest form of class definition looks like this:
3019
3020 \begin{verbatim}
3021 class ClassName:
3022     <statement-1>
3023     .
3024     .
3025     .
3026     <statement-N>
3027 \end{verbatim}
3028
3029 Class definitions, like function definitions (\keyword{def}
3030 statements) must be executed before they have any effect.  (You could
3031 conceivably place a class definition in a branch of an \keyword{if}
3032 statement, or inside a function.)
3033
3034 In practice, the statements inside a class definition will usually be
3035 function definitions, but other statements are allowed, and sometimes
3036 useful --- we'll come back to this later.  The function definitions
3037 inside a class normally have a peculiar form of argument list,
3038 dictated by the calling conventions for methods --- again, this is
3039 explained later.
3040
3041 When a class definition is entered, a new name space is created, and
3042 used as the local scope --- thus, all assignments to local variables
3043 go into this new name space.  In particular, function definitions bind
3044 the name of the new function here.
3045
3046 When a class definition is left normally (via the end), a \emph{class
3047 object} is created.  This is basically a wrapper around the contents
3048 of the name space created by the class definition; we'll learn more
3049 about class objects in the next section.  The original local scope
(the one in effect just before the class definition was entered) is
3051 reinstated, and the class object is bound here to the class name given
3052 in the class definition header (\class{ClassName} in the example).
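
Since a class definition is an executable statement, it can appear
inside a function or in a branch of an \keyword{if} statement.  A
sketch in Python~3 syntax, with made-up names \code{make_class} and
\code{Greeting}:

```python
def make_class(formal):
    # Only one of the two class definitions is executed per call.
    if formal:
        class Greeting:
            text = 'Good day'
    else:
        class Greeting:
            text = 'Hi'
    # The class object was bound to the local name Greeting.
    return Greeting

Formal = make_class(1)
Casual = make_class(0)

print(Formal.text, Casual.text)
```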
3053
3054
3055 \subsection{Class Objects \label{classObjects}}
3056
3057 Class objects support two kinds of operations: attribute references
3058 and instantiation.
3059
3060 \emph{Attribute references} use the standard syntax used for all
3061 attribute references in Python: \code{obj.name}.  Valid attribute
3062 names are all the names that were in the class's name space when the
3063 class object was created.  So, if the class definition looked like
3064 this:
3065
3066 \begin{verbatim}
3067 class MyClass:
3068     "A simple example class"
3069     i = 12345
3070     def f(x):
3071         return 'hello world'
3072 \end{verbatim}
3073
3074 then \code{MyClass.i} and \code{MyClass.f} are valid attribute
3075 references, returning an integer and a function object, respectively.
3076 Class attributes can also be assigned to, so you can change the value
of \code{MyClass.i} by assignment.  \code{__doc__} is also a valid
attribute that's read-only, returning the docstring belonging to
the class: \code{"A simple example class"}.
3080
3081 Class \emph{instantiation} uses function notation.  Just pretend that
3082 the class object is a parameterless function that returns a new
instance of the class.  For example (assuming the above class):
3084
3085 \begin{verbatim}
3086 x = MyClass()
3087 \end{verbatim}
3088
3089 creates a new \emph{instance} of the class and assigns this object to
3090 the local variable \code{x}.
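
Both kinds of operation on a class object can be tried in a few lines.
This is a sketch in Python~3 syntax that repeats the
\class{MyClass} definition so that it runs on its own:

```python
class MyClass:
    "A simple example class"
    i = 12345
    def f(x):
        return 'hello world'

# Attribute references on the class object.
original = MyClass.i
doc = MyClass.__doc__

# Class attributes can be assigned to.
MyClass.i = 54321
changed = MyClass.i

# Instantiation uses function notation.
x = MyClass()

print(original, changed, doc, isinstance(x, MyClass))
```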
3091
3092
3093 \subsection{Instance Objects \label{instanceObjects}}
3094
3095 Now what can we do with instance objects?  The only operations
3096 understood by instance objects are attribute references.  There are
3097 two kinds of valid attribute names.
3098
3099 The first I'll call \emph{data attributes}.  These correspond to
3100 ``instance variables'' in Smalltalk, and to ``data members'' in
3101 \Cpp{}.  Data attributes need not be declared; like local variables,
3102 they spring into existence when they are first assigned to.  For
3103 example, if \code{x} is the instance of \class{MyClass} created above,
3104 the following piece of code will print the value \code{16}, without
3105 leaving a trace:
3106
3107 \begin{verbatim}
3108 x.counter = 1
3109 while x.counter < 10:
3110     x.counter = x.counter * 2
3111 print x.counter
3112 del x.counter
3113 \end{verbatim}
3114
The second kind of attribute reference understood by instance objects
is the \emph{method}.  A method is a function that ``belongs to'' an
3117 object.  (In Python, the term method is not unique to class instances:
3118 other object types can have methods as well, e.g., list objects have
3119 methods called append, insert, remove, sort, and so on.  However,
3120 below, we'll use the term method exclusively to mean methods of class
3121 instance objects, unless explicitly stated otherwise.)
3122
3123 Valid method names of an instance object depend on its class.  By
3124 definition, all attributes of a class that are (user-defined) function
3125 objects define corresponding methods of its instances.  So in our
3126 example, \code{x.f} is a valid method reference, since
3127 \code{MyClass.f} is a function, but \code{x.i} is not, since
3128 \code{MyClass.i} is not.  But \code{x.f} is not the same thing as
3129 \code{MyClass.f} --- it is a \emph{method object}, not a function
3130 object.%
3131 \obindex{method}
3132
3133
3134 \subsection{Method Objects \label{methodObjects}}
3135
3136 Usually, a method is called immediately, e.g.:
3137
3138 \begin{verbatim}
3139 x.f()
3140 \end{verbatim}
3141
3142 In our example, this will return the string \code{'hello world'}.
3143 However, it is not necessary to call a method right away:
3144 \code{x.f} is a method object, and can be stored away and called at a
3145 later time.  For example:
3146
3147 \begin{verbatim}
3148 xf = x.f
3149 while 1:
3150     print xf()
3151 \end{verbatim}
3152
3153 will continue to print \samp{hello world} until the end of time.
3154
3155 What exactly happens when a method is called?  You may have noticed
3156 that \code{x.f()} was called without an argument above, even though
3157 the function definition for \method{f} specified an argument.  What
3158 happened to the argument?  Surely Python raises an exception when a
3159 function that requires an argument is called without any --- even if
3160 the argument isn't actually used...
3161
3162 Actually, you may have guessed the answer: the special thing about
3163 methods is that the object is passed as the first argument of the
3164 function.  In our example, the call \code{x.f()} is exactly equivalent
3165 to \code{MyClass.f(x)}.  In general, calling a method with a list of
3166 \var{n} arguments is equivalent to calling the corresponding function
3167 with an argument list that is created by inserting the method's object
3168 before the first argument.
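
As a quick check of this equivalence, here is a minimal sketch.  It
assumes that \class{MyClass} looks like the definition given earlier
in this chapter; the class is repeated here, as an assumption, only to
make the fragment self-contained:

\begin{verbatim}
class MyClass:
    "A simple example class"
    i = 12345
    def f(self):
        return 'hello world'

x = MyClass()

# Calling the method supplies x as the first argument automatically...
via_method = x.f()

# ...so it gives the same result as calling the function with x
# passed explicitly.
via_function = MyClass.f(x)

assert via_method == via_function
\end{verbatim}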
3169
3170 If you still don't understand how methods work, a look at the
3171 implementation can perhaps clarify matters.  When an instance
3172 attribute is referenced that isn't a data attribute, its class is
3173 searched.  If the name denotes a valid class attribute that is a
3174 function object, a method object is created by packing (pointers to)
3175 the instance object and the function object just found together in an
3176 abstract object: this is the method object.  When the method object is
3177 called with an argument list, it is unpacked again, a new argument
3178 list is constructed from the instance object and the original argument
3179 list, and the function object is called with this new argument list.
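
The packing and unpacking just described can be imitated in Python
itself.  The following is only a sketch of the idea, not the
interpreter's actual implementation: the names \class{MethodSketch},
\class{Account} and \function{add()} are made up for illustration, and
the \code{*args} call syntax assumes a Python version that supports
it.

\begin{verbatim}
class MethodSketch:
    # Hypothetical stand-in for the interpreter's method object.
    def __init__(self, instance, function):
        self.instance = instance    # the object the method belongs to
        self.function = function    # the function found in the class

    def __call__(self, *args):
        # Build the new argument list (the instance first, then the
        # original arguments) and call the function with it.
        return self.function(self.instance, *args)

def add(self, amount):
    return self.total + amount

class Account:
    total = 10

acct = Account()
m = MethodSketch(acct, add)     # pack instance and function together
result = m(5)                   # same as add(acct, 5)
\end{verbatim}

Here \code{result} is \code{15}, just as if \function{add()} had been
called with \code{acct} inserted in front of the argument \code{5}.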
3180
3181
3182 \section{Random Remarks \label{remarks}}
3183
3184 [These should perhaps be placed more carefully...]
3185
3186
3187 Data attributes override method attributes with the same name; to
3188 avoid accidental name conflicts, which may cause hard-to-find bugs in
3189 large programs, it is wise to use some kind of convention that
3190 minimizes the chance of conflicts, e.g., capitalize method names,
3191 prefix data attribute names with a small unique string (perhaps just
3192 an underscore), or use verbs for methods and nouns for data attributes.
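
A small illustration of the first point, using a hypothetical
\class{Counter} class: once a data attribute named \code{value} is
assigned on the instance, it hides the method of the same name for
that instance.

\begin{verbatim}
class Counter:
    def value(self):
        return 0

c = Counter()
before = c.value()          # the method is found in the class

c.value = 42                # a data attribute with the same name
after = c.value             # now refers to the integer, not the method

# The function is still reachable through the class itself:
via_class = Counter.value(c)
\end{verbatim}

After the assignment, an attempt to call \code{c.value()} would fail,
since an integer is not callable.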
3193
3194
3195 Data attributes may be referenced by methods as well as by ordinary
3196 users (``clients'') of an object.  In other words, classes are not
3197 usable to implement pure abstract data types.  In fact, nothing in
3198 Python makes it possible to enforce data hiding --- it is all based
3199 upon convention.  (On the other hand, the Python implementation,
3200 written in C, can completely hide implementation details and control
3201 access to an object if necessary; this can be used by extensions to
3202 Python written in C.)
3203
3204
3205 Clients should use data attributes with care --- clients may mess up
3206 invariants maintained by the methods by stamping on their data
3207 attributes.  Note that clients may add data attributes of their own to
3208 an instance object without affecting the validity of the methods, as
3209 long as name conflicts are avoided --- again, a naming convention can
3210 save a lot of headaches here.
3211
3212
3213 There is no shorthand for referencing data attributes (or other
3214 methods!) from within methods.  I find that this actually increases
3215 the readability of methods: there is no chance of confusing local
3216 variables and instance variables when glancing through a method.
3217
3218
By convention, the first argument of a method is called
\code{self}.  This is nothing more than a convention: the name
3221 \code{self} has absolutely no special meaning to Python.  (Note,
3222 however, that by not following the convention your code may be less
3223 readable by other Python programmers, and it is also conceivable that
3224 a \emph{class browser} program be written which relies upon such a
3225 convention.)
3226
3227
3228 Any function object that is a class attribute defines a method for
instances of that class.  The function definition need not be
textually enclosed in the class definition: assigning a function
object to a local variable in the class also works.  For
3232 example:
3233
3234 \begin{verbatim}
3235 # Function defined outside the class
3236 def f1(self, x, y):
3237     return min(x, x+y)
3238
3239 class C:
3240     f = f1
3241     def g(self):
3242         return 'hello world'
3243     h = g
3244 \end{verbatim}
3245
3246 Now \code{f}, \code{g} and \code{h} are all attributes of class
3247 \class{C} that refer to function objects, and consequently they are all
3248 methods of instances of \class{C} --- \code{h} being exactly equivalent
3249 to \code{g}.  Note that this practice usually only serves to confuse
3250 the reader of a program.
3251
3252
3253 Methods may call other methods by using method attributes of the
3254 \code{self} argument, e.g.:
3255
3256 \begin{verbatim}
3257 class Bag:
3258     def empty(self):
3259         self.data = []
3260     def add(self, x):
3261         self.data.append(x)
3262     def addtwice(self, x):
3263         self.add(x)
3264         self.add(x)
3265 \end{verbatim}
3266
3267
3268 The instantiation operation (``calling'' a class object) creates an
3269 empty object.  Many classes like to create objects in a known initial
3270 state.  Therefore a class may define a special method named
3271 \method{__init__()}, like this:
3272
3273 \begin{verbatim}
3274     def __init__(self):
3275         self.empty()
3276 \end{verbatim}
3277
3278 When a class defines an \method{__init__()} method, class
3279 instantiation automatically invokes \method{__init__()} for the
3280 newly-created class instance.  So in the \class{Bag} example, a new
3281 and initialized instance can be obtained by:
3282
3283 \begin{verbatim}
3284 x = Bag()
3285 \end{verbatim}
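
Putting the pieces together, a complete version of the \class{Bag}
class might look as follows.  This is a sketch that simply combines
the fragments shown above; the sample values are arbitrary.

\begin{verbatim}
class Bag:
    def __init__(self):
        self.empty()            # start out in a known state
    def empty(self):
        self.data = []
    def add(self, x):
        self.data.append(x)
    def addtwice(self, x):
        self.add(x)
        self.add(x)

x = Bag()                       # __init__() has already called empty()
x.add('spam')
x.addtwice('eggs')
contents = x.data               # ['spam', 'eggs', 'eggs']
\end{verbatim}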
3286
3287 Of course, the \method{__init__()} method may have arguments for
3288 greater flexibility.  In that case, arguments given to the class
3289 instantiation operator are passed on to \method{__init__()}.  For
3290 example,
3291
3292 \begin{verbatim}
3293 >>> class Complex:
3294 ...     def __init__(self, realpart, imagpart):
3295 ...         self.r = realpart
3296 ...         self.i = imagpart
3297 ...
3298 >>> x = Complex(3.0,-4.5)
3299 >>> x.r, x.i
3300 (3.0, -4.5)
3301 \end{verbatim}
3302
3303 Methods may reference global names in the same way as ordinary
3304 functions.  The global scope associated with a method is the module
3305 containing the class definition.  (The class itself is never used as a
3306 global scope!)  While one rarely encounters a good reason for using
3307 global data in a method, there are many legitimate uses of the global
3308 scope: for one thing, functions and modules imported into the global
3309 scope can be used by methods, as well as functions and classes defined
3310 in it.  Usually, the class containing the method is itself defined in
3311 this global scope, and in the next section we'll find some good
3312 reasons why a method would want to reference its own class!
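
The following sketch illustrates the lookup; \class{Vector},
\function{hypot()} and \code{scale} are invented names.  Note that the
bare name \code{scale} inside the method refers to the module-level
variable, not to the class attribute of the same name.

\begin{verbatim}
import math

scale = 2.0                 # global data in the defining module

def hypot(a, b):            # global function in the defining module
    return math.sqrt(a*a + b*b)

class Vector:
    scale = 100.0           # class attribute; not visible as a bare name

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def length(self):
        # hypot and scale are looked up in the module's global scope,
        # so scale is 2.0 here, not Vector.scale.
        return scale * hypot(self.x, self.y)

v = Vector(3.0, 4.0)
result = v.length()         # 2.0 * 5.0
\end{verbatim}

To use the class attribute instead, the method would have to write
\code{self.scale} or \code{Vector.scale} explicitly.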
3313
3314
3315 \section{Inheritance \label{inheritance}}
3316
3317 Of course, a language feature would not be worthy of the name ``class''
3318 without supporting inheritance.  The syntax for a derived class
3319 definition looks as follows:
3320
3321 \begin{verbatim}
3322 class DerivedClassName(BaseClassName):
3323     <statement-1>
3324     .
3325     .
3326     .
3327     <statement-N>
3328 \end{verbatim}
3329
3330 The name \class{BaseClassName} must be defined in a scope containing
3331 the derived class definition.  Instead of a base class name, an
3332 expression is also allowed.  This is useful when the base class is
3333 defined in another module, e.g.,
3334
3335 \begin{verbatim}
3336 class DerivedClassName(modname.BaseClassName):
3337 \end{verbatim}
3338
3339 Execution of a derived class definition proceeds the same as for a
3340 base class.  When the class object is constructed, the base class is
3341 remembered.  This is used for resolving attribute references: if a
requested attribute is not found in the class, the search continues in
the base class.  This rule is applied recursively if the base class itself
3344 is derived from some other class.
3345
3346 There's nothing special about instantiation of derived classes:
3347 \code{DerivedClassName()} creates a new instance of the class.  Method
3348 references are resolved as follows: the corresponding class attribute
is searched for, descending the chain of base classes if necessary,
3350 and the method reference is valid if this yields a function object.
3351
3352 Derived classes may override methods of their base classes.  Because
3353 methods have no special privileges when calling other methods of the
3354 same object, a method of a base class that calls another method
3355 defined in the same base class, may in fact end up calling a method of
3356 a derived class that overrides it.  (For \Cpp{} programmers: all methods
3357 in Python are ``virtual functions''.)
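
For instance, in the following sketch (the class and method names are
hypothetical), \method{describe()} is defined only in the base class,
yet it ends up calling the \method{name()} method of the derived class
when invoked on a derived instance.

\begin{verbatim}
class Base:
    def describe(self):
        # name() is looked up on self, so a derived class can
        # replace it.
        return 'I am ' + self.name()
    def name(self):
        return 'Base'

class Derived(Base):
    def name(self):
        return 'Derived'

b = Base().describe()       # 'I am Base'
d = Derived().describe()    # 'I am Derived'
\end{verbatim}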
3358
3359 An overriding method in a derived class may in fact want to extend
3360 rather than simply replace the base class method of the same name.
3361 There is a simple way to call the base class method directly: just
3362 call \samp{BaseClassName.methodname(self, arguments)}.  This is
3363 occasionally useful to clients as well.  (Note that this only works if
3364 the base class is defined or imported directly in the global scope.)
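
A minimal sketch of extending a base class method in this way, using
the made-up classes \class{Logger} and \class{PrefixLogger}:

\begin{verbatim}
class Logger:
    def __init__(self):
        self.lines = []
    def log(self, message):
        self.lines.append(message)

class PrefixLogger(Logger):
    def log(self, message):
        # Extend, rather than replace, the base class behaviour by
        # calling the base class function with self passed explicitly.
        Logger.log(self, 'note: ' + message)

p = PrefixLogger()
p.log('started')
recorded = p.lines          # ['note: started']
\end{verbatim}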
3365
3366
3367 \subsection{Multiple Inheritance \label{multiple}}
3368
3369 Python supports a limited form of multiple inheritance as well.  A
3370 class definition with multiple base classes looks as follows:
3371
3372 \begin{verbatim}
3373 class DerivedClassName(Base1, Base2, Base3):
3374     <statement-1>
3375     .
3376     .
3377     .
3378     <statement-N>
3379 \end{verbatim}
3380
3381 The only rule necessary to explain the semantics is the resolution
3382 rule used for class attribute references.  This is depth-first,
3383 left-to-right.  Thus, if an attribute is not found in
\class{DerivedClassName}, it is searched for in \class{Base1}, then
(recursively) in the base classes of \class{Base1}, and only if it is
not found there is it searched for in \class{Base2}, and so on.
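
The following sketch shows the rule at work, with hypothetical
classes.  The attribute \code{origin} is defined both in a base class
of \class{Base1} and in \class{Base2}; the depth-first search reaches
the former before it ever looks at \class{Base2}.

\begin{verbatim}
class Top:
    origin = 'Top'

class Base1(Top):
    pass                    # inherits origin from Top

class Base2:
    origin = 'Base2'

class Derived(Base1, Base2):
    pass

# Search order: Derived, Base1, Top, and only then Base2.
found = Derived.origin      # 'Top'
\end{verbatim}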
3387
3388 (To some people breadth first --- searching \class{Base2} and
3389 \class{Base3} before the base classes of \class{Base1} --- looks more
3390 natural.  However, this would require you to know whether a particular
3391 attribute of \class{Base1} is actually defined in \class{Base1} or in
3392 one of its base classes before you can figure out the consequences of
3393 a name conflict with an attribute of \class{Base2}.  The depth-first
rule makes no distinction between direct and inherited attributes of
3395 \class{Base1}.)
3396
3397 It is clear that indiscriminate use of multiple inheritance is a
3398 maintenance nightmare, given the reliance in Python on conventions to
3399 avoid accidental name conflicts.  A well-known problem with multiple
3400 inheritance is a class derived from two classes that happen to have a
3401 common base class.  While it is easy enough to figure out what happens
3402 in this case (the instance will have a single copy of ``instance
3403 variables'' or data attributes used by the common base class), it is
3404 not clear that these semantics are in any way useful.
3405
3406
3407 \section{Private Variables \label{private}}
3408
3409 There is limited support for class-private
3410 identifiers.  Any identifier of the form \code{__spam} (at least two
3411 leading underscores, at most one trailing underscore) is now textually
3412 replaced with \code{_classname__spam}, where \code{classname} is the
3413 current class name with leading underscore(s) stripped.  This mangling
is done without regard to the syntactic position of the identifier, so
3415 it can be used to define class-private instance and class variables,
3416 methods, as well as globals, and even to store instance variables
3417 private to this class on instances of \emph{other} classes.  Truncation
3418 may occur when the mangled name would be longer than 255 characters.
3419 Outside classes, or when the class name consists of only underscores,
3420 no mangling occurs.
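
A short sketch of the mangling in action, with hypothetical class
names.  Both classes assign to \code{self.__count}, but the two
assignments end up in differently named instance attributes, so
neither disturbs the other.

\begin{verbatim}
class Base:
    def __init__(self):
        self.__count = 1        # stored as _Base__count

class _Derived(Base):
    def __init__(self):
        Base.__init__(self)
        # The leading underscore of the class name is stripped, so
        # this is stored as _Derived__count.
        self.__count = 2

d = _Derived()
names = list(d.__dict__.keys())
names.sort()
# names is now ['_Base__count', '_Derived__count']
\end{verbatim}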
3421
3422 Name mangling is intended to give classes an easy way to define
3423 ``private'' instance variables and methods, without having to worry
3424 about instance variables defined by derived classes, or mucking with
3425 instance variables by code outside the class.  Note that the mangling
3426 rules are designed mostly to avoid accidents; it still is possible for
3427 a determined soul to access or modify a variable that is considered
private.  This can even be useful, e.g., for the debugger, and that's
3429 one reason why this loophole is not closed.  (Buglet: derivation of a
3430 class with the same name as the base class makes use of private
3431 variables of the base class possible.)
3432
3433 Notice that code passed to \code{exec}, \code{eval()} or
\code{execfile()} does not consider the classname of the invoking
3435 class to be the current class; this is similar to the effect of the
3436 \code{global} statement, the effect of which is likewise restricted to
3437 code that is byte-compiled together.  The same restriction applies to
3438 \code{getattr()}, \code{setattr()} and \code{delattr()}, as well as
3439 when referencing \code{__dict__} directly.
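
The restriction on \function{getattr()} can be seen in the following
sketch (the class \class{Safe} and its method \method{fetch()} are
invented for this example).  The string passed to \function{getattr()}
is taken literally, even though the call appears inside the class.

\begin{verbatim}
class Safe:
    def __init__(self):
        self.__code = 1234      # compiled as self._Safe__code

    def fetch(self, name):
        # The string in name is not mangled; return None when no
        # attribute of exactly that name exists.
        try:
            return getattr(self, name)
        except AttributeError:
            return None

s = Safe()
plain = s.fetch('__code')           # None: no attribute by that name
mangled = s.fetch('_Safe__code')    # 1234
\end{verbatim}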
3440
3441 Here's an example of a class that implements its own
3442 \code{__getattr__} and \code{__setattr__} methods and stores all
3443 attributes in a private variable, in a way that works in Python 1.4 as
3444 well as in previous versions:
3445
3446 \begin{verbatim}
3447 class VirtualAttributes:
3448     __vdict = None
3449     __vdict_name = locals().keys()[0]
3450
3451     def __init__(self):
3452         self.__dict__[self.__vdict_name] = {}
3453
3454     def __getattr__(self, name):
3455         return self.__vdict[name]
3456
3457     def __setattr__(self, name, value):
3458         self.__vdict[name] = value
3459 \end{verbatim}
3460
3461 %\emph{Warning: this is an experimental feature.}  To avoid all
3462 %potential problems, refrain from using identifiers starting with
3463 %double underscore except for predefined uses like \code{__init__}.  To
3464 %use private names while maintaining future compatibility: refrain from
3465 %using the same private name in classes related via subclassing; avoid
3466 %explicit (manual) mangling/unmangling; and assume that at some point
3467 %in the future, leading double underscore will revert to being just a
3468 %naming convention.  Discussion on extensive compile-time declarations
3469 %are currently underway, and it is impossible to predict what solution
3470 %will eventually be chosen for private names.  Double leading
3471 %underscore is still a candidate, of course --- just not the only one.
3472 %It is placed in the distribution in the belief that it is useful, and
3473 %so that widespread experience with its use can be gained.  It will not
3474 %be removed without providing a better solution and a migration path.
3475
3476 \section{Odds and Ends \label{odds}}
3477
3478 Sometimes it is useful to have a data type similar to the Pascal
3479 ``record'' or C ``struct'', bundling together a couple of named data
3480 items.  An empty class definition will do nicely, e.g.:
3481
3482 \begin{verbatim}
3483 class Employee:
3484     pass
3485
3486 john = Employee() # Create an empty employee record
3487
3488 # Fill the fields of the record
3489 john.name = 'John Doe'
3490 john.dept = 'computer lab'
3491 john.salary = 1000
3492 \end{verbatim}
3493
3494
3495 A piece of Python code that expects a particular abstract data type
3496 can often be passed a class that emulates the methods of that data
3497 type instead.  For instance, if you have a function that formats some
3498 data from a file object, you can define a class with methods
3499 \method{read()} and \method{readline()} that gets the data from a string
3500 buffer instead, and pass it as an argument.%  (Unfortunately, this
3501 %technique has its limitations: a class can't define operations that
3502 %are accessed by special syntax such as sequence subscripting or
3503 %arithmetic operators, and assigning such a ``pseudo-file'' to
3504 %\code{sys.stdin} will not cause the interpreter to read further input
3505 %from it.)
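
As a sketch of this technique, the hypothetical class
\class{StringFile} below supplies \method{read()} and
\method{readline()} on top of a string, and the (equally hypothetical)
function \function{number_lines()} only ever calls \method{readline()}
on its argument, so it cannot tell the difference.

\begin{verbatim}
class StringFile:
    def __init__(self, text):
        self.text = text
        self.pos = 0            # index of the next unread character

    def read(self):
        # Return everything that has not been read yet.
        rest = self.text[self.pos:]
        self.pos = len(self.text)
        return rest

    def readline(self):
        # Return the next line including its newline, or '' at the end.
        start = self.pos
        while self.pos < len(self.text):
            self.pos = self.pos + 1
            if self.text[self.pos - 1] == '\n':
                break
        return self.text[start:self.pos]

def number_lines(f):
    # Works with any object that has a readline() method.
    result = []
    n = 0
    while 1:
        line = f.readline()
        if not line:
            break
        n = n + 1
        result.append('%d: %s' % (n, line))
    return result

numbered = number_lines(StringFile('spam\neggs\n'))
# numbered is now ['1: spam\n', '2: eggs\n']
\end{verbatim}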
3506
3507
3508 Instance method objects have attributes, too: \code{m.im_self} is the
instance object to which the method belongs, and \code{m.im_func} is the
3510 function object corresponding to the method.
3511
3512 \subsection{Exceptions Can Be Classes \label{exceptionClasses}}
3513
3514 User-defined exceptions are no longer limited to being string objects
3515 --- they can be identified by classes as well.  Using this mechanism it
3516 is possible to create extensible hierarchies of exceptions.
3517
3518 There are two new valid (semantic) forms for the raise statement:
3519
3520 \begin{verbatim}
3521 raise Class, instance
3522
3523 raise instance
3524 \end{verbatim}
3525
3526 In the first form, \code{instance} must be an instance of \class{Class}
3527 or of a class derived from it.  The second form is a shorthand for
3528
3529 \begin{verbatim}
3530 raise instance.__class__, instance
3531 \end{verbatim}
3532
3533 An except clause may list classes as well as string objects.  A class
3534 in an except clause is compatible with an exception if it is the same
3535 class or a base class thereof (but not the other way around --- an
3536 except clause listing a derived class is not compatible with a base
3537 class).  For example, the following code will print B, C, D in that
3538 order:
3539
3540 \begin{verbatim}
3541 class B:
3542     pass
3543 class C(B):
3544     pass
3545 class D(C):
3546     pass
3547
3548 for c in [B, C, D]:
3549     try:
3550         raise c()
3551     except D:
3552         print "D"
3553     except C:
3554         print "C"
3555     except B:
3556         print "B"
3557 \end{verbatim}
3558
3559 Note that if the except clauses were reversed (with
3560 \samp{except B} first), it would have printed B, B, B --- the first
3561 matching except clause is triggered.
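
The effect of the clause order can be checked with a variation of the
example that records which clause fired instead of printing.  This is
a sketch: the classes are derived from \class{Exception} here, an
assumption made so that the fragment also runs on Python versions that
require exception classes to do so.

\begin{verbatim}
class B(Exception):
    pass
class C(B):
    pass
class D(C):
    pass

fired = []
for c in [B, C, D]:
    try:
        raise c()
    except B:               # matches B and everything derived from it
        fired.append("B")
    except C:               # never reached
        fired.append("C")
    except D:               # never reached
        fired.append("D")

# fired is now ['B', 'B', 'B']
\end{verbatim}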
3562
3563 When an error message is printed for an unhandled exception which is a
3564 class, the class name is printed, then a colon and a space, and
3565 finally the instance converted to a string using the built-in function
3566 \function{str()}.
3567
3568
3569 \chapter{What Now? \label{whatNow}}
3570
3571 Hopefully reading this tutorial has reinforced your interest in using
3572 Python.  Now what should you do?
3573
3574 You should read, or at least page through, the Library Reference,
3575 which gives complete (though terse) reference material about types,
3576 functions, and modules that can save you a lot of time when writing
3577 Python programs.  The standard Python distribution includes a
3578 \emph{lot} of code in both C and Python; there are modules to read
3579 \UNIX{} mailboxes, retrieve documents via HTTP, generate random
3580 numbers, parse command-line options, write CGI programs, compress
3581 data, and a lot more; skimming through the Library Reference will give
3582 you an idea of what's available.
3583
3584 The major Python Web site is \url{http://www.python.org}; it contains
3585 code, documentation, and pointers to Python-related pages around the
3586 Web.  This web site is mirrored in various places around the
3587 world, such as Europe, Japan, and Australia; a mirror may be faster
3588 than the main site, depending on your geographical location.  A more
3589 informal site is \url{http://starship.skyport.net}, which contains a
3590 bunch of Python-related personal home pages; many people have
3591 downloadable software here.
3592
3593 For Python-related questions and problem reports, you can post to the
3594 newsgroup \newsgroup{comp.lang.python}, or send them to the mailing
3595 list at \email{python-list@cwi.nl}.  The newsgroup and mailing list
3596 are gatewayed, so messages posted to one will automatically be
3597 forwarded to the other.  There are around 35--45 postings a day,
3598 % Postings figure based on average of last six months activity as
3599 % reported by www.findmail.com; Oct. '97 - Mar. '98:  7480 msgs / 182
3600 % days = 41.1 msgs / day.
3601 asking (and answering) questions, suggesting new features, and
3602 announcing new modules.  Before posting, be sure to check the list of
3603 Frequently Asked Questions (also called the FAQ), at
3604 \url{http://www.python.org/doc/FAQ.html}, or look for it in the
3605 \file{Misc/} directory of the Python source distribution.  The FAQ
3606 answers many of the questions that come up again and again, and may
3607 already contain the solution for your problem.
3608
3609 You can support the Python community by joining the Python Software
3610 Activity, which runs the python.org web, ftp and email servers, and
3611 organizes Python workshops.  See \url{http://www.python.org/psa/} for
3612 information on how to join.
3613
3614
3615 \appendix
3616
3617 \chapter{Interactive Input Editing and History Substitution
3618          \label{interacting}}
3619
3620 Some versions of the Python interpreter support editing of the current
3621 input line and history substitution, similar to facilities found in
3622 the Korn shell and the GNU Bash shell.  This is implemented using the
3623 \emph{GNU Readline} library, which supports Emacs-style and vi-style
3624 editing.  This library has its own documentation which I won't
3625 duplicate here; however, the basics are easily explained.  The
3626 interactive editing and history described here are optionally
3627 available in the \UNIX{} and CygWin versions of the interpreter.
3628
3629 This chapter does \emph{not} document the editing facilities of Mark
3630 Hammond's PythonWin package or the Tk-based environment, IDLE,
3631 distributed with Python.  The command line history recall which
3632 operates within DOS boxes on NT and some other DOS and Windows flavors
3633 is yet another beast.
3634
3635 \section{Line Editing \label{lineEditing}}
3636
3637 If supported, input line editing is active whenever the interpreter
3638 prints a primary or secondary prompt.  The current line can be edited
3639 using the conventional Emacs control characters.  The most important
3640 of these are: C-A (Control-A) moves the cursor to the beginning of the
3641 line, C-E to the end, C-B moves it one position to the left, C-F to
3642 the right.  Backspace erases the character to the left of the cursor,
3643 C-D the character to its right.  C-K kills (erases) the rest of the
3644 line to the right of the cursor, C-Y yanks back the last killed
3645 string.  C-underscore undoes the last change you made; it can be
3646 repeated for cumulative effect.
3647
3648 \section{History Substitution \label{history}}
3649
3650 History substitution works as follows.  All non-empty input lines
3651 issued are saved in a history buffer, and when a new prompt is given
3652 you are positioned on a new line at the bottom of this buffer.  C-P
3653 moves one line up (back) in the history buffer, C-N moves one down.
3654 Any line in the history buffer can be edited; an asterisk appears in
3655 front of the prompt to mark a line as modified.  Pressing the Return
3656 key passes the current line to the interpreter.  C-R starts an
3657 incremental reverse search; C-S starts a forward search.
3658
3659 \section{Key Bindings \label{keyBindings}}
3660
3661 The key bindings and some other parameters of the Readline library can
3662 be customized by placing commands in an initialization file called
3663 \file{\$HOME/.inputrc}.  Key bindings have the form
3664
3665 \begin{verbatim}
3666 key-name: function-name
3667 \end{verbatim}
3668
3669 or
3670
3671 \begin{verbatim}
3672 "string": function-name
3673 \end{verbatim}
3674
3675 and options can be set with
3676
3677 \begin{verbatim}
3678 set option-name value
3679 \end{verbatim}
3680
3681 For example:
3682
3683 \begin{verbatim}
3684 # I prefer vi-style editing:
3685 set editing-mode vi
3686 # Edit using a single line:
3687 set horizontal-scroll-mode On
3688 # Rebind some keys:
3689 Meta-h: backward-kill-word
3690 "\C-u": universal-argument
3691 "\C-x\C-r": re-read-init-file
3692 \end{verbatim}
3693
3694 Note that the default binding for TAB in Python is to insert a TAB
3695 instead of Readline's default filename completion function.  If you
3696 insist, you can override this by putting
3697
3698 \begin{verbatim}
3699 TAB: complete
3700 \end{verbatim}
3701
3702 in your \file{\$HOME/.inputrc}.  (Of course, this makes it hard to type
3703 indented continuation lines...)
3704
3705 Automatic completion of variable and module names is optionally
3706 available.  To enable it in the interpreter's interactive mode, add
3707 the following to your \file{\$HOME/.pythonrc.py} file:% $ <- bow to font-lock
3708 \indexii{.pythonrc.py}{file}
3709 \refstmodindex{rlcompleter}
3710 \refbimodindex{readline}
3711
3712 \begin{verbatim}
3713 import rlcompleter, readline
3714 readline.parse_and_bind('tab: complete')
3715 \end{verbatim}
3716
3717 This binds the TAB key to the completion function, so hitting the TAB
3718 key twice suggests completions; it looks at Python statement names,
3719 the current local variables, and the available module names.  For
dotted expressions such as \code{string.a}, it will evaluate the
3721 expression up to the final \character{.} and then suggest completions
3722 from the attributes of the resulting object.  Note that this may
3723 execute application-defined code if an object with a
3724 \method{__getattr__()} method is part of the expression.
3725
3726
3727 \section{Commentary \label{commentary}}
3728
3729 This facility is an enormous step forward compared to previous
3730 versions of the interpreter; however, some wishes are left: It would
3731 be nice if the proper indentation were suggested on continuation lines
3732 (the parser knows if an indent token is required next).  The
3733 completion mechanism might use the interpreter's symbol table.  A
3734 command to check (or even suggest) matching parentheses, quotes etc.
3735 would also be useful.
3736
3737 % XXX Lele Gaifax's readline module, which adds name completion...
3738
3739 \end{document}