Doc/whatsnew/whatsnew20.tex

   1 \documentclass{howto}
   2
   3 % $Id$
   4
   5 \title{What's New in Python 2.0}
   6 \release{0.05}
   7 \author{A.M. Kuchling and Moshe Zadka}
   8 \authoraddress{\email{amk1@bigfoot.com}, \email{moshez@math.huji.ac.il} }
   9 \begin{document}
  10 \maketitle\tableofcontents
  11
  12 \section{Introduction}
  13
  14 {\large This is a draft document; please report inaccuracies and
  15 omissions to the authors.  This document should not be treated as
  16 definitive; features described here might be removed or changed during
  17 the beta cycle before the final release of Python 2.0.
  18 }
  19
  20 A new version of Python, 2.0, will be released some time this
  21 autumn.  Beta versions are already available from
  22 \url{http://www.pythonlabs.com/products/python2.0/}.  This article
  23 covers the exciting new features in 2.0, highlights some other useful
  24 changes, and points out a few incompatible changes that may require
  25 rewriting code.
  26
  27 Python's development never completely stops between releases, and a
  28 steady flow of bug fixes and improvements is always being submitted.
  29 A host of minor fixes, a few optimizations, additional docstrings, and
  30 better error messages went into 2.0; to list them all would be
  31 impossible, but they're certainly significant.  Consult the
  32 publicly-available CVS logs if you want to see the full list.
  33
  34 % ======================================================================
  35 \section{What About Python 1.6?}
  36
  37 Python 1.6 can be thought of as the Contractual Obligations Python
  38 release.  After the core development team left CNRI in May 2000, CNRI
  39 requested that a 1.6 release be created, containing all the work on
  40 Python that had been performed at CNRI.  Python 1.6 therefore
  41 represents the state of the CVS tree as of May 2000, with the most
  42 significant new feature being Unicode support.  Development continued
  43 after May, of course, so the 1.6 tree received a few fixes to ensure
  44 that it's forward-compatible with Python 2.0.  Python 1.6 is therefore part
  45 of Python's evolution, and not a side branch.
  46
  47 So, should you take much interest in Python 1.6?  Probably not.  The
  48 1.6final and 2.0beta1 releases were made on the same day (September 5,
  49 2000), the plan being to finalize Python 2.0 within a month or so.  If
  50 you have applications to maintain, there seems little point in
  51 breaking things by moving to 1.6, fixing them, and then having another
  52 round of breakage within a month by moving to 2.0; you're better off
  53 just going straight to 2.0.  Most of the really interesting features
  54 described in this document are only in 2.0, because a lot of work was
  55 done between May and September.
  56
  57 % ======================================================================
  58 \section{New Development Process}
  59
  60 The most important change in Python 2.0 may not be to the code at all,
  61 but to how Python is developed.
  62
  63 In May of 2000, the Python CVS tree was moved to SourceForge.
  64 Previously, only about 7 people had write access to
  65 the CVS tree, and all patches had to be inspected and checked in by
  66 one of the people on this short list.  Obviously, this wasn't very
  67 scalable.  By moving the CVS tree to SourceForge, it became possible
  68 to grant write access to more people; as of September 2000 there were
  69 27 people able to check in changes, a fourfold increase.  This makes
  70 possible large-scale changes that wouldn't be attempted if they had
  71 to be filtered through the small group of core developers.  For
  72 example, one day Peter Schneider-Kamp took it into his head to drop
  73 K\&R C compatibility and convert the C source for Python to ANSI
  74 C. After getting approval on the python-dev mailing list, he launched
  75 into a flurry of checkins that lasted about a week, other developers
  76 joined in to help, and the job was done.  If there were only 5 people
  77 with write access, probably that task would have been viewed as
  78 ``nice, but not worth the time and effort needed'' and it would
  79 never have gotten done.
  80
  81 SourceForge also provides tools for tracking bug and patch
  82 submissions, and in combination with the public CVS tree, they've
  83 resulted in a remarkable increase in the speed of development.
  84 Patches now get submitted, commented on, revised by people other than
  85 the original submitter, and bounced back and forth between people
  86 until the patch is deemed worth checking in.  This didn't come without
  87 a cost: developers now have more e-mail to deal with, more mailing
  88 lists to follow, and special tools had to be written for the new
  89 environment.  For example, SourceForge sends default patch and bug
  90 notification e-mail messages that are completely unhelpful, so Ka-Ping
  91 Yee wrote an HTML screen-scraper that sends more useful messages.
  92
  93 The ease of adding code caused a few initial growing pains, such as
  94 code being checked in before it was ready or without getting clear
  95 agreement from the developer group.  The approval process that has
  96 emerged is somewhat similar to that used by the Apache group.
  97 Developers can vote +1, +0, -0, or -1 on a patch; +1 and -1 denote
  98 acceptance or rejection, while +0 and -0 mean the developer is mostly
  99 indifferent to the change, though with a slight positive or negative
 100 slant.  The most significant change from the Apache model is that
 101 Guido van Rossum, who has Benevolent Dictator For Life status, can
 102 ignore the votes of the other developers and approve or reject a
 103 change, effectively giving him a +Infinity / -Infinity vote.
 104
 105 Producing an actual patch is the last step in adding a new feature,
 106 and is usually easy compared to the earlier task of coming up with a
 107 good design.  Discussions of new features can often explode into
 108 lengthy mailing list threads, making the discussion hard to follow,
 109 and no one can read every posting to python-dev.  Therefore, a
 110 relatively formal process has been set up to write Python Enhancement
 111 Proposals (PEPs), modelled on the Internet RFC process.  PEPs are
 112 draft documents that describe a proposed new feature, and are
 113 continually revised until the community reaches a consensus, either
 114 accepting or rejecting the proposal.  Quoting from the introduction to
 115 PEP 1, ``PEP Purpose and Guidelines'':
 116
 117 \begin{quotation}
 118     PEP stands for Python Enhancement Proposal.  A PEP is a design
 119     document providing information to the Python community, or
 120     describing a new feature for Python.  The PEP should provide a
 121     concise technical specification of the feature and a rationale for
 122     the feature.
 123
 124     We intend PEPs to be the primary mechanisms for proposing new
 125     features, for collecting community input on an issue, and for
 126     documenting the design decisions that have gone into Python.  The
 127     PEP author is responsible for building consensus within the
 128     community and documenting dissenting opinions.
 129 \end{quotation}
 130
 131 Read the rest of PEP 1 for the details of the PEP editorial process,
 132 style, and format.  PEPs are kept in the Python CVS tree on
 133 SourceForge, though they're not part of the Python 2.0 distribution,
 134 and are also available in HTML form from
 135 \url{http://python.sourceforge.net/peps/}.  As of September 2000,
 136 there are 25 PEPs, ranging from PEP 201, ``Lockstep Iteration'', to
 137 PEP 225, ``Elementwise/Objectwise Operators''.
 138
 139 To report bugs or submit patches for Python 2.0, use the bug tracking
 140 and patch manager tools available from the SourceForge project page,
 141 at \url{http://sourceforge.net/projects/python/}.
 142
 143 % ======================================================================
 144 \section{Unicode}
 145
 146 The largest new feature in Python 2.0 is a new fundamental data type:
 147 Unicode strings.  Unicode uses 16-bit numbers to represent characters
 148 instead of the 8-bit number used by ASCII, meaning that 65,536
 149 distinct characters can be supported.
 150
 151 The final interface for Unicode support was arrived at through
 152 countless often-stormy discussions on the python-dev mailing list, and
 153 mostly implemented by Marc-Andr\'e Lemburg, based on a Unicode string
 154 type implementation by Fredrik Lundh.  A detailed explanation of the
 155 interface is in the file \file{Misc/unicode.txt} in the Python source
 156 distribution; it's also available on the Web at
 157 \url{http://starship.python.net/crew/lemburg/unicode-proposal.txt}.
 158 This article will simply cover the most significant points from the
 159 full interface.
 160
 161 In Python source code, Unicode strings are written as
 162 \code{u"string"}.  Arbitrary Unicode characters can be written using a
 163 new escape sequence, \code{\e u\var{HHHH}}, where \var{HHHH} is a
 164 4-digit hexadecimal number from 0000 to FFFF.  The existing
 165 \code{\e x\var{HHHH}} escape sequence can also be used, and octal
 166 escapes can be used for characters up to U+01FF, which is represented
 167 by \code{\e 777}.
 168
 169 Unicode strings, just like regular strings, are an immutable sequence
 170 type.  They can be indexed and sliced, but not modified in place.
 171 Unicode strings have an \method{encode( \optional{encoding} )} method
 172 that returns an 8-bit string in the desired encoding.  Encodings are
 173 named by strings, such as \code{'ascii'}, \code{'utf-8'},
 174 \code{'iso-8859-1'}, or whatever.  A codec API is defined for
 175 implementing and registering new encodings that are then available
 176 throughout a Python program.  If an encoding isn't specified, the
 177 default encoding is usually 7-bit ASCII, though it can be changed for
 178 your Python installation by calling the
 179 \function{sys.setdefaultencoding(\var{encoding})} function in a
 180 customised version of \file{site.py}.
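
As a small sketch of \method{encode()} (the sample text is an arbitrary
choice, and the snippet is written so that it also runs on later Python
versions), the same four-character Unicode string takes up a different
number of bytes depending on the encoding requested:

\begin{verbatim}
text = u'caf\u00e9'                  # 4 characters; the last is U+00E9

utf8 = text.encode('utf-8')          # U+00E9 becomes two bytes
latin1 = text.encode('iso-8859-1')   # U+00E9 becomes one byte

print(len(text))      # 4
print(len(utf8))      # 5
print(len(latin1))    # 4
\end{verbatim}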
 181
 182 Combining 8-bit and Unicode strings always coerces to Unicode, using
 183 the default ASCII encoding; the result of \code{'a' + u'bc'} is
 184 \code{u'abc'}.
 185
 186 New built-in functions have been added, and existing built-ins
 187 modified to support Unicode:
 188
 189 \begin{itemize}
 190 \item \code{unichr(\var{ch})} returns a Unicode string 1 character
 191 long, containing the character whose code is the integer \var{ch}.
 192
 193 \item \code{ord(\var{u})}, where \var{u} is a 1-character regular or Unicode string, returns the number of the character as an integer.
 194
 195 \item \code{unicode(\var{string} \optional{, \var{encoding}}
 196 \optional{, \var{errors}} ) } creates a Unicode string from an 8-bit
 197 string.  \code{encoding} is a string naming the encoding to use.
 198 The \code{errors} parameter specifies the treatment of characters that
 199 are invalid for the current encoding; passing \code{'strict'} as the
 200 value causes an exception to be raised on any encoding error, while
 201 \code{'ignore'} causes errors to be silently ignored and
 202 \code{'replace'} uses U+FFFD, the official replacement character, in
 203 case of any problems.
 204
 205 \item The \keyword{exec} statement, and various built-ins such as
 206 \code{eval()}, \code{getattr()}, and \code{setattr()}, will now
 207 accept Unicode strings as well as regular strings.  (It's possible
 208 that the process of fixing this missed some built-ins; if you find a
 209 built-in function that accepts strings but doesn't accept Unicode
 210 strings at all, please report it as a bug.)
 211
 212 \end{itemize}
 213
 214 A new module, \module{unicodedata}, provides an interface to Unicode
 215 character properties.  For example, \code{unicodedata.category(u'A')}
 216 returns the 2-character string 'Lu', the 'L' denoting it's a letter,
 217 and 'u' meaning that it's uppercase.
 218 \code{unicodedata.bidirectional(u'\e u0660')} returns 'AN', meaning that U+0660 is
 219 an Arabic number.
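
Both lookups can be tried directly; the following is only a minimal
sketch using the two characters mentioned above:

\begin{verbatim}
import unicodedata

print(unicodedata.category(u'A'))              # Lu
print(unicodedata.bidirectional(u'\u0660'))    # AN
\end{verbatim}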
 220
 221 The \module{codecs} module contains functions to look up existing encodings
 222 and register new ones.  Unless you want to implement a
 223 new encoding, you'll most often use the
 224 \function{codecs.lookup(\var{encoding})} function, which returns a
 225 4-element tuple: \code{(\var{encode_func},
 226 \var{decode_func}, \var{stream_reader}, \var{stream_writer})}.
 227
 228 \begin{itemize}
 229 \item \var{encode_func} is a function that takes a Unicode string, and
 230 returns a 2-tuple \code{(\var{string}, \var{length})}.  \var{string}
 231 is an 8-bit string containing a portion (perhaps all) of the Unicode
 232 string converted into the given encoding, and \var{length} tells you
 233 how much of the Unicode string was converted.
 234
 235 \item \var{decode_func} is the opposite of \var{encode_func}, taking
 236 an 8-bit string and returning a 2-tuple \code{(\var{ustring},
 237 \var{length})}, consisting of the resulting Unicode string
 238 \var{ustring} and the integer \var{length} telling how much of the
 239 8-bit string was consumed.
 240
 241 \item \var{stream_reader} is a class that supports decoding input from
 242 a stream.  \code{\var{stream_reader}(\var{file_obj})} returns an object that
 243 supports the \method{read()}, \method{readline()}, and
 244 \method{readlines()} methods.  These methods will all translate from
 245 the given encoding and return Unicode strings.
 246
 247 \item \var{stream_writer}, similarly, is a class that supports
 248 encoding output to a stream.  \code{\var{stream_writer}(\var{file_obj})}
 249 returns an object that supports the \method{write()} and
 250 \method{writelines()} methods.  These methods expect Unicode strings,
 251 translating them to the given encoding on output.
 252 \end{itemize}
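
The first two entries of the tuple can also be called directly.  The
following sketch (the sample string is an arbitrary choice) encodes a
short Unicode string as UTF-8 and decodes it again, showing what the
\var{length} values count in each direction:

\begin{verbatim}
import codecs

encode_func, decode_func, stream_reader, stream_writer = \
    codecs.lookup('utf-8')

data, consumed = encode_func(u'\u0660ab')
print(consumed)       # 3 characters were converted
print(len(data))      # 4 bytes: U+0660 needs two bytes in UTF-8

ustring, used = decode_func(data)
print(used)                      # 4 bytes were consumed
print(ustring == u'\u0660ab')    # the round trip preserves the text
\end{verbatim}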
 253
 254 For example, the following code writes a Unicode string into a file,
 255 encoding it as UTF-8:
 256
 257 \begin{verbatim}
 258 import codecs
 259
 260 unistr = u'\u0660\u2000ab ...'
 261
 262 (UTF8_encode, UTF8_decode,
 263  UTF8_streamreader, UTF8_streamwriter) = codecs.lookup('UTF-8')
 264
 265 output = UTF8_streamwriter( open( '/tmp/output', 'wb') )
 266 output.write( unistr )
 267 output.close()
 268 \end{verbatim}
 269
 270 The following code would then read UTF-8 input from the file:
 271
 272 \begin{verbatim}
 273 input = UTF8_streamreader( open( '/tmp/output', 'rb') )
 274 print repr(input.read())
 275 input.close()
 276 \end{verbatim}
 277
 278 Unicode-aware regular expressions are available through the
 279 \module{re} module, which has a new underlying implementation called
 280 SRE written by Fredrik Lundh of Secret Labs AB.
 281
 282 A \code{-U} command line option was added which causes the Python
 283 compiler to interpret all string literals as Unicode string literals.
 284 This is intended to be used in testing and future-proofing your Python
 285 code, since some future version of Python may drop support for 8-bit
 286 strings and provide only Unicode strings.
 287
 288 % ======================================================================
 289 \section{List Comprehensions}
 290
 291 Lists are a workhorse data type in Python, and many programs
 292 manipulate a list at some point.  Two common operations on lists are
 293 to loop over them, and either pick out the elements that meet a
 294 certain criterion, or apply some function to each element.  For
 295 example, given a list of strings, you might want to pull out all the
 296 strings containing a given substring, or strip off trailing whitespace
 297 from each line.
 298
 299 The existing \function{map()} and \function{filter()} functions can be
 300 used for this purpose, but they require a function as one of their
 301 arguments.  This is fine if there's an existing built-in function that
 302 can be passed directly, but if there isn't, you have to create a
 303 little function to do the required work, and Python's scoping rules
 304 make the result ugly if the little function needs additional
 305 information.  Take the first example in the previous paragraph,
 306 finding all the strings in the list containing a given substring.  You
 307 could write the following to do it:
 308
 309 \begin{verbatim}
 310 # Given the list L, make a list of all strings
 311 # containing the substring S (this needs: import string).
 312 sublist = filter( lambda s, substring=S:
 313                      string.find(s, substring) != -1,
 314                   L)
 315 \end{verbatim}
 316
 317 Because of Python's scoping rules, a default argument is used so that
 318 the anonymous function created by the \keyword{lambda} expression knows
 319 what substring is being searched for.  List comprehensions make this
 320 cleaner:
 321
 322 \begin{verbatim}
 323 sublist = [ s for s in L if string.find(s, S) != -1 ]
 324 \end{verbatim}
 325
 326 List comprehensions have the form:
 327
 328 \begin{verbatim}
 329 [ expression for expr1 in sequence1
 330              for expr2 in sequence2 ...
 331              for exprN in sequenceN
 332              if condition ]
 333 \end{verbatim}
 334
 335 The \keyword{for}...\keyword{in} clauses contain the sequences to be
 336 iterated over.  The sequences do not have to be the same length,
 337 because they are \emph{not} iterated over in parallel, but
 338 from left to right; this is explained more clearly in the following
 339 paragraphs.  The elements of the generated list will be the successive
 340 values of \var{expression}.  The final \keyword{if} clause is
 341 optional; if present, \var{expression} is only evaluated and added to
 342 the result if \var{condition} is true.
 343
 344 To make the semantics very clear, a list comprehension is equivalent
 345 to the following Python code:
 346
 347 \begin{verbatim}
 348 for expr1 in sequence1:
 349     for expr2 in sequence2:
 350     ...
 351         for exprN in sequenceN:
 352              if (condition):
 353                   # Append the value of
 354                   # the expression to the
 355                   # resulting list.
 356 \end{verbatim}
 357
 358 This means that when there are multiple \keyword{for}...\keyword{in}
 359 clauses and no \keyword{if} clause, the length of the resulting list is
 360 the product of the lengths of all the sequences.  If you have two lists
 361 of length 3, the output list is 9 elements long:
 362
 363 \begin{verbatim}
 364 >>> seq1 = 'abc'
 365 >>> seq2 = (1,2,3)
 366 >>> [ (x,y) for x in seq1 for y in seq2]
 367 [('a', 1), ('a', 2), ('a', 3), ('b', 1), ('b', 2), ('b', 3), ('c', 1),
 368 ('c', 2), ('c', 3)]
 369 \end{verbatim}
 370
 371 To avoid introducing an ambiguity into Python's grammar, if
 372 \var{expression} is creating a tuple, it must be surrounded with
 373 parentheses.  The first list comprehension below is a syntax error,
 374 while the second one is correct:
 375
 376 \begin{verbatim}
 377 # Syntax error
 378 [ x,y for x in seq1 for y in seq2]
 379 # Correct
 380 [ (x,y) for x in seq1 for y in seq2]
 381 \end{verbatim}
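
The following sketch pulls these pieces together, using invented sample
data.  The first two comprehensions perform the stripping and substring
selection mentioned at the start of this section; the last one combines
two \keyword{for} clauses with an \keyword{if} clause and is checked
against the equivalent nested loops:

\begin{verbatim}
lines = ['spam  \n', 'eggs\n', 'spam and eggs \t\n']

# Apply an operation to every element.
stripped = [line.rstrip() for line in lines]

# Keep only the elements that meet a condition.
matches = [line for line in stripped if line.find('spam') != -1]
print(matches)        # ['spam', 'spam and eggs']

# Two for clauses plus an if clause ...
pairs = [(x, y) for x in 'abc' for y in (1, 2, 3) if y != 2]

# ... and the equivalent nested loops.
expected = []
for x in 'abc':
    for y in (1, 2, 3):
        if y != 2:
            expected.append((x, y))

print(pairs == expected)   # both approaches build the same list
print(len(pairs))          # 6
\end{verbatim}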
 382
 383 The idea of list comprehensions originally comes from the functional
 384 programming language Haskell (\url{http://www.haskell.org}).  Greg
 385 Ewing argued most effectively for adding them to Python and wrote the
 386 initial list comprehension patch, which was then discussed for a
 387 seemingly endless time on the python-dev mailing list and kept
 388 up-to-date by Skip Montanaro.
 389
 390 % ======================================================================
 391 \section{Augmented Assignment}
 392
 393 Augmented assignment operators, another long-requested feature, have
 394 been added to Python 2.0.  Augmented assignment operators include
 395 \code{+=}, \code{-=}, \code{*=}, and so forth.  For example, the
 396 statement \code{a += 2} increments the value of the variable
 397 \code{a} by 2, equivalent to the slightly lengthier \code{a = a + 2}.
 398
 399 The full list of supported assignment operators is \code{+=},
 400 \code{-=}, \code{*=}, \code{/=}, \code{\%=}, \code{**=}, \code{\&=},
 401 \code{|=}, \verb|^=|, \code{>>=}, and \code{<<=}.  Python classes can
 402 override the augmented assignment operators by defining methods named
 403 \method{__iadd__}, \method{__isub__}, etc.  For example, the following
 404 \class{Number} class stores a number and supports using += to create a
 405 new instance with an incremented value.
 406
 407 \begin{verbatim}
 408 class Number:
 409     def __init__(self, value):
 410         self.value = value
 411     def __iadd__(self, increment):
 412         return Number( self.value + increment)
 413
 414 n = Number(5)
 415 n += 3
 416 print n.value
 417 \end{verbatim}
 418
 419 The \method{__iadd__} special method is called with the value of the
 420 increment, and should return a new instance with an appropriately
 421 modified value; this return value is bound as the new value of the
 422 variable on the left-hand side.
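
Because whatever \method{__iadd__} returns is what gets bound to the
name, a class can also choose to modify itself and return the same
object.  The following is a sketch of that variation; the
\class{Accumulator} class is hypothetical and not part of Python:

\begin{verbatim}
class Accumulator:
    def __init__(self):
        self.items = []
    def __iadd__(self, item):
        self.items.append(item)   # modify the object in place
        return self               # the result is rebound to the name

acc = Accumulator()
alias = acc
acc += 'a'
acc += 'b'
print(acc.items)      # ['a', 'b']
print(alias is acc)   # still one object, since __iadd__ returned self
\end{verbatim}

Forgetting the \code{return self} would bind \code{acc} to \code{None}
after the first \code{+=}.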
 423
 424 Augmented assignment operators were first introduced in the C
 425 programming language, and most C-derived languages, such as
 426 \program{awk}, C++, Java, Perl, and PHP, also support them.  The augmented
 427 assignment patch was implemented by Thomas Wouters.
 428
 429 % ======================================================================
 430 \section{String Methods}
 431
 432 Until now string-manipulation functionality was in the \module{string}
 433 module, which was usually a front-end for the \module{strop}
 434 module written in C.  The addition of Unicode posed a difficulty for
 435 the \module{strop} module, because the functions would all need to be
 436 rewritten in order to accept either 8-bit or Unicode strings.  For
 437 functions such as \function{string.replace()}, which takes 3 string
 438 arguments, that means eight possible permutations, and correspondingly
 439 complicated code.
 440
 441 Instead, Python 2.0 pushes the problem onto the string type, making
 442 string manipulation functionality available through methods on both
 443 8-bit strings and Unicode strings.
 444
 445 \begin{verbatim}
 446 >>> 'andrew'.capitalize()
 447 'Andrew'
 448 >>> 'hostname'.replace('os', 'linux')
 449 'hlinuxtname'
 450 >>> 'moshe'.find('sh')
 451 2
 452 \end{verbatim}
 453
 454 One thing that hasn't changed, a noteworthy April Fools' joke
 455 notwithstanding, is that Python strings are immutable. Thus, the
 456 string methods return new strings, and do not modify the string on
 457 which they operate.
 458
 459 The old \module{string} module is still around for backwards
 460 compatibility, but it mostly acts as a front-end to the new string
 461 methods.
 462
 463 Two methods which have no parallel in pre-2.0 versions, although they
 464 did exist in JPython for quite some time, are \method{startswith()}
 465 and \method{endswith()}.  \code{s.startswith(t)} is equivalent to \code{s[:len(t)]
 466 == t}, while \code{s.endswith(t)} is equivalent to \code{s[-len(t):] == t}.
 467
 468 One other method which deserves special mention is \method{join}.  The
 469 \method{join} method of a string receives one parameter, a sequence of
 470 strings, and is equivalent to the \function{string.join} function from
 471 the old \module{string} module, with the arguments reversed. In other
 472 words, \code{s.join(seq)} is equivalent to the old
 473 \code{string.join(seq, s)}.
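
A brief sketch of these three methods, using made-up sample values:

\begin{verbatim}
words = ['What', 'is', 'new']
print(' '.join(words))        # What is new
print('-'.join(words))        # What-is-new

name = 'whatsnew20.tex'
print(name.startswith('whats'))   # a true value
print(name.endswith('.tex'))      # a true value
print(name.endswith('.txt'))      # a false value
\end{verbatim}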
 474
 475 % ======================================================================
 476 \section{Optional Collection of Cycles}
 477
 478 The C implementation of Python uses reference counting to implement
 479 garbage collection.  Every Python object maintains a count of the
 480 number of references pointing to itself, and adjusts the count as
 481 references are created or destroyed.  Once the reference count reaches
 482 zero, the object is no longer accessible, since you need to have a
 483 reference to an object to access it, and if the count is zero, no
 484 references exist any longer.
 485
 486 Reference counting has some pleasant properties: it's easy to
 487 understand and implement, and the resulting implementation is
 488 portable, fairly fast, and interacts well with other libraries that
 489 implement their own memory handling schemes.  The major problem with
 490 reference counting is that it sometimes doesn't realise that objects
 491 are no longer accessible, resulting in a memory leak.  This happens
 492 when there are cycles of references.
 493
 494 Consider the simplest possible cycle,
 495 a class instance which has a reference to itself:
 496
 497 \begin{verbatim}
 498 instance = SomeClass()
 499 instance.myself = instance
 500 \end{verbatim}
 501
 502 After the above two lines of code have been executed, the reference
 503 count of \code{instance} is 2; one reference is from the variable
 504 named \samp{instance}, and the other is from the \samp{myself}
 505 attribute of the instance.
 506
 507 If the next line of code is \code{del instance}, what happens?  The
 508 reference count of \code{instance} is decreased by 1, so it has a
 509 reference count of 1; the reference in the \samp{myself} attribute
 510 still exists.  Yet the instance is no longer accessible through Python
 511 code, and it could be deleted.  Several objects can participate in a
 512 cycle if they have references to each other, causing all of the
 513 objects to be leaked.
 514
 515 An experimental step has been made toward fixing this problem.  When
 516 compiling Python, the \verb|--with-cycle-gc| option can be specified.
 517 This causes a cycle detection algorithm to be periodically executed,
 518 which looks for inaccessible cycles and deletes the objects involved.
 519 A new \module{gc} module provides functions to perform a garbage
 520 collection, obtain debugging statistics, and tuning the collector's parameters.
 521
 522 Why isn't cycle detection enabled by default?  Running the cycle detection
 523 algorithm takes some time, and some tuning will be required to
 524 minimize the overhead cost.  It's not yet obvious how much performance
 525 is lost, because benchmarking this is tricky and depends crucially
 526 on how often the program creates and destroys objects.
 527
 528 Several people tackled this problem and contributed to a solution.  An
 529 early implementation of the cycle detection approach was written by
 530 Toby Kelsey.  The current algorithm was suggested by Eric Tiedemann
 531 during a visit to CNRI, and Guido van Rossum and Neil Schemenauer
 532 wrote two different implementations, which were later integrated by
 533 Neil.  Lots of other people offered suggestions along the way; the
 534 March 2000 archives of the python-dev mailing list contain most of the
 535 relevant discussion, especially in the threads titled ``Reference
 536 cycle collection for Python'' and ``Finalization again''.
 537
 538 % ======================================================================
 539 \section{Other Core Changes}
 540
 541 Various minor changes have been made to Python's syntax and built-in
 542 functions.  None of the changes are very far-reaching, but they're
 543 handy conveniences.
 544
 545 \subsection{Minor Language Changes}
 546
 547 A new syntax makes it more convenient to call a given function
 548 with a tuple of arguments and/or a dictionary of keyword arguments.
 549 In Python 1.5 and earlier, you'd use the \function{apply()}
 550 built-in function: \code{apply(f, \var{args}, \var{kw})} calls the
 551 function \function{f()} with the argument tuple \var{args} and the
 552 keyword arguments in the dictionary \var{kw}.  \function{apply()}
 553 is the same in 2.0, but thanks to a patch from
 554 Greg Ewing, \code{f(*\var{args}, **\var{kw})} is available as a shorter
 555 and clearer way to achieve the same effect.  This syntax is
 556 symmetrical with the syntax for defining functions:
 557
 558 \begin{verbatim}
 559 def f(*args, **kw):
 560     # args is a tuple of positional args,
 561     # kw is a dictionary of keyword args
 562     ...
 563 \end{verbatim}
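
On the calling side, the new syntax might be used as in the following
sketch; the \function{report()} function and its arguments are invented
for the illustration:

\begin{verbatim}
def report(*args, **kw):
    return 'args=%r kw=%r' % (args, kw)

args = (1, 2)
kw = {'sep': '-'}

print(report(*args, **kw))      # args=(1, 2) kw={'sep': '-'}
print(report(1, 2, sep='-'))    # the same call written out in full
\end{verbatim}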
 564
 565 The \keyword{print} statement can now have its output directed to a
 566 file-like object by following the \keyword{print} with
 567 \verb|>> file|, similar to the redirection operator in Unix shells.
 568 Previously you'd either have to use the \method{write()} method of the
 569 file-like object, which lacks the convenience and simplicity of
 570 \keyword{print}, or you could assign a new value to
 571 \code{sys.stdout} and then restore the old value.  For sending output to standard error,
 572 it's much easier to write this:
 573
 574 \begin{verbatim}
 575 print >> sys.stderr, "Warning: action field not supplied"
 576 \end{verbatim}
 577
 578 Modules can now be renamed on importing them, using the syntax
 579 \code{import \var{module} as \var{name}} or \code{from \var{module}
 580 import \var{name} as \var{othername}}.  The patch was submitted by
 581 Thomas Wouters.
 582
 583 A new format style is available when using the \code{\%} operator;
 584 '\%r' will insert the \function{repr()} of its argument.  This was
 585 also added from symmetry considerations, this time for symmetry with
 586 the existing '\%s' format style, which inserts the \function{str()} of
 587 its argument.  For example, \code{'\%r \%s' \% ('abc', 'abc')} returns a
 588 string containing \verb|'abc' abc|.
 589
 590 Previously there was no way to implement a class that overrode
 591 Python's built-in \keyword{in} operator and implemented a custom
 592 version.  \code{\var{obj} in \var{seq}} returns true if \var{obj} is
 593 present in the sequence \var{seq}; Python computes this by simply
 594 trying every index of the sequence until either \var{obj} is found or
 595 an \exception{IndexError} is encountered.  Moshe Zadka contributed a
 596 patch which adds a \method{__contains__} magic method for providing a
 597 custom implementation for \keyword{in}. Additionally, new built-in
 598 objects written in C can define what \keyword{in} means for them via a
 599 new slot in the sequence protocol.
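
As a minimal sketch, a class can answer \keyword{in} tests without
storing any elements at all; the \class{EvenNumbers} class below is
hypothetical:

\begin{verbatim}
class EvenNumbers:
    def __contains__(self, value):
        # Called for "value in instance".
        return value % 2 == 0

evens = EvenNumbers()
print(4 in evens)     # a true value
print(7 in evens)     # a false value
\end{verbatim}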
 600
 601 Earlier versions of Python used a recursive algorithm for deleting
 602 objects.  Deeply nested data structures could cause the interpreter to
 603 fill up the C stack and crash; Christian Tismer rewrote the deletion
 604 logic to fix this problem.  On a related note, comparing recursive
 605 objects recursed infinitely and crashed; Jeremy Hylton rewrote the
 606 code to no longer crash, producing a useful result instead.  For
 607 example, after this code:
 608
 609 \begin{verbatim}
 610 a = []
 611 b = []
 612 a.append(a)
 613 b.append(b)
 614 \end{verbatim}
 615
 616 The comparison \code{a==b} returns true, because the two recursive
 617 data structures are isomorphic. \footnote{See the thread ``trashcan
 618 and PR\#7'' in the April 2000 archives of the python-dev mailing list
 619 for the discussion leading up to this implementation, and some useful
 620 relevant links.
 621 %http://www.python.org/pipermail/python-dev/2000-April/004834.html
 622 }
 623
 624 Work has been done on porting Python to 64-bit Windows on the Itanium
 625 processor, mostly by Trent Mick of ActiveState.  (Confusingly,
 626 \code{sys.platform} is still \code{'win32'} on Win64 because it seems
 627 that for ease of porting, MS Visual C++ treats code as 32 bit on Itanium.)
 628 PythonWin also supports Windows CE; see the Python CE page at
 629 \url{http://starship.python.net/crew/mhammond/ce/} for more
 630 information.
 631
 632 An attempt has been made to alleviate one of Python's warts, the
 633 often-confusing \exception{NameError} exception when code refers to a
 634 local variable before the variable has been assigned a value.  For
 635 example, the following code raises an exception on the \keyword{print}
 636 statement in both 1.5.2 and 2.0; in 1.5.2 a \exception{NameError}
 637 exception is raised, while 2.0 raises a new
 638 \exception{UnboundLocalError} exception.
 639 \exception{UnboundLocalError} is a subclass of \exception{NameError},
 640 so any existing code that expects \exception{NameError} to be raised
 641 should still work.
 642
 643 \begin{verbatim}
 644 def f():
 645     print "i=",i
 646     i = i + 1
 647 f()
 648 \end{verbatim}
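
The subclass relationship can be checked directly.  The following is a
minimal sketch rather than code from the Python distribution; it uses
the parenthesized \code{print()} call and the \code{except ... as}
spelling so that it also runs on later Python versions, which is an
assumption that goes beyond the syntax 2.0 itself accepts.

```python
def f():
    # The assignment below makes 'i' local to f(), so reading it first fails.
    print("i=", i)
    i = i + 1

try:
    f()
except NameError as exc:
    # UnboundLocalError lands here because it is a subclass of NameError.
    caught = exc

print(type(caught).__name__)
print(issubclass(UnboundLocalError, NameError))
```

Code that needs to tell the two cases apart can list an
\code{except UnboundLocalError} clause before the more general
\code{except NameError} clause.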
 649
 650 Two new exceptions, \exception{TabError} and
 651 \exception{IndentationError}, have been introduced.  They're both
 652 subclasses of \exception{SyntaxError}, and are raised when Python code
 653 is found to be improperly indented.
 654
 655 \subsection{Changes to Built-in Functions}
 656
 657 A new built-in, \function{zip(\var{seq1}, \var{seq2}, ...)}, has been
 658 added.  \function{zip()} returns a list of tuples where each tuple
 659 contains the i-th element from each of the argument sequences.  The
 660 difference between \function{zip()} and \code{map(None, \var{seq1},
 661 \var{seq2})} is that \function{map()} pads the sequences with
 662 \code{None} if the sequences aren't all of the same length, while
 663 \function{zip()} truncates the returned list to the length of the
 664 shortest argument sequence.
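
As a small illustration of the truncating behaviour, here is a sketch
with two made-up sequences of different lengths.  The variable names
are hypothetical, and the \code{list()} wrapper is only there so the
sketch behaves the same on later Python versions, where
\function{zip()} is assumed to return an iterator instead of a list.

```python
letters = ['a', 'b', 'c']
numbers = [1, 2]

# The third letter has no partner in 'numbers', so it is dropped.
pairs = list(zip(letters, numbers))
print(pairs)
```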
 665
 666 The \function{int()} and \function{long()} functions now accept an
 667 optional ``base'' parameter when the first argument is a string.
 668 \code{int('123', 10)} returns 123, while \code{int('123', 16)} returns
 669 291.  \code{int(123, 16)} raises a \exception{TypeError} exception
 670 with the message ``can't convert non-string with explicit base''.
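
The following sketch exercises the base parameter with a few sample
strings.  The variable names are illustrative only, and the
\code{except ... as} spelling is used so that the sketch also runs on
later Python versions.

```python
decimal_value = int('123', 10)
hex_value = int('123', 16)      # 1*256 + 2*16 + 3
binary_value = int('1010', 2)

try:
    int(123, 16)                # a base is only allowed with a string
except TypeError as exc:
    error_text = str(exc)

print(decimal_value, hex_value, binary_value)
```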
 671
 672 A new variable holding more detailed version information has been
 673 added to the \module{sys} module.  \code{sys.version_info} is a tuple
\code{(\var{major}, \var{minor}, \var{micro}, \var{level},
\var{serial})}.  For example, in a hypothetical 2.0.1beta1,
 676 \code{sys.version_info} would be \code{(2, 0, 1, 'beta', 1)}.
 677 \var{level} is a string such as \code{"alpha"}, \code{"beta"}, or
 678 \code{"final"} for a final release.
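
Because \code{sys.version_info} is a tuple, a version check can be
written as an ordinary tuple comparison.  This is a minimal sketch;
the name \code{at_least_2_0} is made up for the example, and the slice
is only a precaution in case an interpreter supplies extra fields.

```python
import sys

major, minor, micro, level, serial = sys.version_info[:5]

# Tuples compare element by element, so one comparison covers
# both the major and the minor version number.
at_least_2_0 = (major, minor) >= (2, 0)

print(major, minor, micro, level, serial)
```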
 679
 680 Dictionaries have an odd new method, \method{setdefault(\var{key},
 681 \var{default})}, which behaves similarly to the existing
 682 \method{get()} method.  However, if the key is missing,
 683 \method{setdefault()} both returns the value of \var{default} as
 684 \method{get()} would do, and also inserts it into the dictionary as
 685 the value for \var{key}.  Thus, the following lines of code:
 686
 687 \begin{verbatim}
 688 if dict.has_key( key ): return dict[key]
 689 else:
 690     dict[key] = []
 691     return dict[key]
 692 \end{verbatim}
 693
 694 can be reduced to a single \code{return dict.setdefault(key, [])} statement.
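
A common use is building a dictionary of lists.  The sketch below
groups some sample words by their first letter; the data and the name
\code{index} are hypothetical.

```python
index = {}
for word in ['apple', 'avocado', 'banana']:
    # The first word for a letter inserts the empty list;
    # later words get the list that is already stored.
    index.setdefault(word[0], []).append(word)

print(index)
```

Note that the default value is evaluated on every call, whether or
not the key is already present.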
 695
 696 The interpreter sets a maximum recursion depth in order to catch
 697 runaway recursion before filling the C stack and causing a core dump
or GPF.  Previously this limit was fixed when you compiled Python,
 699 but in 2.0 the maximum recursion depth can be read and modified using
 700 \function{sys.getrecursionlimit} and \function{sys.setrecursionlimit}.
 701 The default value is 1000, and a rough maximum value for a given
 702 platform can be found by running a new script,
 703 \file{Misc/find_recursionlimit.py}.
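
Reading and changing the limit might look like the following sketch.
The increment of 500 is an arbitrary value chosen for illustration,
not a recommendation; a safe value depends on the platform's C stack.

```python
import sys

old_limit = sys.getrecursionlimit()

# Raise the limit temporarily, then restore the previous value.
sys.setrecursionlimit(old_limit + 500)
try:
    raised_limit = sys.getrecursionlimit()
finally:
    sys.setrecursionlimit(old_limit)

print(old_limit, raised_limit)
```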
 704
 705 % ======================================================================
 706 \section{Porting to 2.0}
 707
 708 New Python releases try hard to be compatible with previous releases,
 709 and the record has been pretty good.  However, some changes are
considered useful enough to justify breaking backward compatibility,
usually because they fix initial design decisions that turned out to
be actively mistaken.  This section lists the changes in Python 2.0
that may cause old Python code to break.
 714
 715 The change which will probably break the most code is tightening up
 716 the arguments accepted by some methods.  Some methods would take
 717 multiple arguments and treat them as a tuple, particularly various
 718 list methods such as \method{.append()} and \method{.insert()}.
 719 In earlier versions of Python, if \code{L} is a list, \code{L.append(
 720 1,2 )} appends the tuple \code{(1,2)} to the list.  In Python 2.0 this
 721 causes a \exception{TypeError} exception to be raised, with the
 722 message: 'append requires exactly 1 argument; 2 given'.  The fix is to
 723 simply add an extra set of parentheses to pass both values as a tuple:
 724 \code{L.append( (1,2) )}.
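
The difference between the two call forms can be seen in this short
sketch.  The flag \code{rejected} is a hypothetical name used only to
record what happened.

```python
L = []
L.append((1, 2))        # one argument: the tuple (1, 2)

try:
    L.append(1, 2)      # two arguments: a TypeError from 2.0 onwards
    rejected = False
except TypeError:
    rejected = True

print(L, rejected)
```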
 725
 726 The earlier versions of these methods were more forgiving because they
 727 used an old function in Python's C interface to parse their arguments;
 728 2.0 modernizes them to use \function{PyArg_ParseTuple}, the current
 729 argument parsing function, which provides more helpful error messages
 730 and treats multi-argument calls as errors.  If you absolutely must use
 731 2.0 but can't fix your code, you can edit \file{Objects/listobject.c}
 732 and define the preprocessor symbol \code{NO_STRICT_LIST_APPEND} to
 733 preserve the old behaviour; this isn't recommended.
 734
 735 Some of the functions in the \module{socket} module are still
 736 forgiving in this way.  For example, \function{socket.connect(
 737 ('hostname', 25) )} is the correct form, passing a tuple representing
a host and port, but \function{socket.connect( 'hostname', 25 )} also
 739 works. \function{socket.connect_ex()} and \function{socket.bind()} are
 740 similarly easy-going.  2.0alpha1 tightened these functions up, but
 741 because the documentation actually used the erroneous multiple
 742 argument form, many people wrote code which would break with the
 743 stricter checking.  GvR backed out the changes in the face of public
 744 reaction, so for the \module{socket} module, the documentation was
 745 fixed and the multiple argument form is simply marked as deprecated;
 746 it \emph{will} be tightened up again in a future Python version.
 747
 748 The \code{\e x} escape in string literals now takes exactly 2 hex
 749 digits.  Previously it would consume all the hex digits following the
 750 'x' and take the lowest 8 bits of the result, so \code{\e x123456} was
 751 equivalent to \code{\e x56}.
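
Under the new rule the same literal is read quite differently, as this
small sketch shows.  The string is the one from the paragraph above;
the comparison spells out how it is now split.

```python
s = '\x123456'

# Only '12' belongs to the escape; '3456' are four ordinary characters.
print(len(s))
print(s == chr(0x12) + '3456')
```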
 752
 753 The \exception{AttributeError} exception has a more friendly error message,
 754 whose text will be something like \code{'Spam' instance has no attribute 'eggs'}.
 755 Previously the error message was just the missing attribute name \code{eggs}, and
 756 code written to take advantage of this fact will break in 2.0.
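
Code that inspected the message text can usually be rewritten to ask
about the attribute directly.  The following is a sketch under that
assumption, using a made-up \class{Spam} class; the exact wording of
the message is deliberately not relied upon, since it may differ
between Python versions.

```python
class Spam:
    pass

obj = Spam()

try:
    obj.eggs
except AttributeError as exc:
    # The text is meant for people; do not compare it to a fixed string.
    message = str(exc)

# Two ways to test for an attribute without parsing any message.
eggs = getattr(obj, 'eggs', None)
has_eggs = hasattr(obj, 'eggs')

print(message)
```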
 757
 758 Some work has been done to make integers and long integers a bit more
 759 interchangeable.  In 1.5.2, large-file support was added for Solaris,
 760 to allow reading files larger than 2Gb; this made the \method{tell()}
 761 method of file objects return a long integer instead of a regular
 762 integer.  Some code would subtract two file offsets and attempt to use
 763 the result to multiply a sequence or slice a string, but this raised a
 764 \exception{TypeError}.  In 2.0, long integers can be used to multiply
 765 or slice a sequence, and it'll behave as you'd intuitively expect it
 766 to; \code{3L * 'abc'} produces 'abcabcabc', and \code{
 767 (0,1,2,3)[2L:4L]} produces (2,3). Long integers can also be used in
 768 various contexts where previously only integers were accepted, such
 769 as in the \method{seek()} method of file objects, and in the formats
 770 supported by the \verb|%| operator (\verb|%d|, \verb|%i|, \verb|%x|,
 771 etc.).  For example, \code{"\%d" \% 2L**64} will produce the string
 772 \samp{18446744073709551616}.
 773
 774 The subtlest long integer change of all is that the \function{str()}
 775 of a long integer no longer has a trailing 'L' character, though
 776 \function{repr()} still includes it.  The 'L' annoyed many people who
 777 wanted to print long integers that looked just like regular integers,
 778 since they had to go out of their way to chop off the character.  This
is no longer a problem in 2.0, but code that does
\code{str(longval)[:-1]} on the assumption that the 'L' is there will
now lose the final digit.
 781
 782 Taking the \function{repr()} of a float now uses a different
formatting precision than \function{str()}.  \function{repr()} uses the
\code{\%.17g} format string for C's \function{sprintf()}, while
\function{str()} uses \code{\%.12g} as before.  The effect is that
\function{repr()} may occasionally show more decimal places than
\function{str()}, for certain numbers.
For example, the number 8.1 can't be represented exactly in binary, so
\code{repr(8.1)} is \code{'8.0999999999999996'}, while \code{str(8.1)} is
 790 \code{'8.1'}.
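
The two precisions can be compared by applying the format strings
directly with the \code{\%} operator.  This sketch avoids calling
\function{repr()} and \function{str()} themselves, because later
Python versions are known to format floats differently; the variable
names are illustrative.

```python
x = 8.1

long_form = '%.17g' % x     # the precision repr() uses in 2.0
short_form = '%.12g' % x    # the precision str() uses

print(long_form)
print(short_form)
```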
 791
 792 The \code{-X} command-line option, which turned all standard
 793 exceptions into strings instead of classes, has been removed; the
 794 standard exceptions will now always be classes.  The
 795 \module{exceptions} module containing the standard exceptions was
 796 translated from Python to a built-in C module, written by Barry Warsaw
 797 and Fredrik Lundh.
 798
 799 % Commented out for now -- I don't think anyone will care.
 800 %The pattern and match objects provided by SRE are C types, not Python
 801 %class instances as in 1.5.  This means you can no longer inherit from
 802 %\class{RegexObject} or \class{MatchObject}, but that shouldn't be much
 803 %of a problem since no one should have been doing that in the first
 804 %place.
 805
 806 % ======================================================================
 807 \section{Extending/Embedding Changes}
 808
 809 Some of the changes are under the covers, and will only be apparent to
 810 people writing C extension modules or embedding a Python interpreter
 811 in a larger application.  If you aren't dealing with Python's C API,
 812 you can safely skip this section.
 813
 814 The version number of the Python C API was incremented, so C
 815 extensions compiled for 1.5.2 must be recompiled in order to work with
 816 2.0.  On Windows, attempting to import a third party extension built
 817 for Python 1.5.x usually results in an immediate crash; there's not
 818 much we can do about this.  (Here's Mark Hammond's explanation of the
 819 reasons for the crash.  The 1.5 module is linked against
\file{Python15.dll}.  When \file{Python.exe}, linked against
 821 \file{Python16.dll}, starts up, it initializes the Python data
 822 structures in \file{Python16.dll}.  When Python then imports the
 823 module \file{foo.pyd} linked against \file{Python15.dll}, it
 824 immediately tries to call the functions in that DLL.  As Python has
 825 not been initialized in that DLL, the program immediately crashes.)
 826
 827 Users of Jim Fulton's ExtensionClass module will be pleased to find
 828 out that hooks have been added so that ExtensionClasses are now
 829 supported by \function{isinstance()} and \function{issubclass()}.
 830 This means you no longer have to remember to write code such as
 831 \code{if type(obj) == myExtensionClass}, but can use the more natural
 832 \code{if isinstance(obj, myExtensionClass)}.
 833
 834 The \file{Python/importdl.c} file, which was a mass of \#ifdefs to
 835 support dynamic loading on many different platforms, was cleaned up
 836 and reorganised by Greg Stein.  \file{importdl.c} is now quite small,
 837 and platform-specific code has been moved into a bunch of
 838 \file{Python/dynload_*.c} files.  Another cleanup: there were also a
number of \file{my*.h} files in the \file{Include/} directory that held
 840 various portability hacks; they've been merged into a single file,
 841 \file{Include/pyport.h}.
 842
 843 Vladimir Marangozov's long-awaited malloc restructuring was completed,
 844 to make it easy to have the Python interpreter use a custom allocator
 845 instead of C's standard \function{malloc()}.  For documentation, read
 846 the comments in \file{Include/pymem.h} and
 847 \file{Include/objimpl.h}.  For the lengthy discussions during which
 848 the interface was hammered out, see the Web archives of the 'patches'
 849 and 'python-dev' lists at python.org.
 850
 851 Recent versions of the GUSI development environment for MacOS support
 852 POSIX threads.  Therefore, Python's POSIX threading support now works
 853 on the Macintosh.  Threading support using the user-space GNU \texttt{pth}
 854 library was also contributed.
 855
 856 Threading support on Windows was enhanced, too.  Windows supports
 857 thread locks that use kernel objects only in case of contention; in
 858 the common case when there's no contention, they use simpler functions
 859 which are an order of magnitude faster.  A threaded version of Python
 860 1.5.2 on NT is twice as slow as an unthreaded version; with the 2.0
 861 changes, the difference is only 10\%.  These improvements were
 862 contributed by Yakov Markovitch.
 863
 864 Python 2.0's source now uses only ANSI C prototypes, so compiling Python now
 865 requires an ANSI C compiler, and can no longer be done using a compiler that
 866 only supports K\&R C.
 867
 868 Previously the Python virtual machine used 16-bit numbers in its
 869 bytecode, limiting the size of source files.  In particular, this
 870 affected the maximum size of literal lists and dictionaries in Python
 871 source; occasionally people who are generating Python code would run
 872 into this limit.  A patch by Charles G. Waldman raises the limit from
$2^{16}$ to $2^{32}$.
 874
 875 Three new convenience functions intended for adding constants to a
 876 module's dictionary at module initialization time were added:
 877 \function{PyModule_AddObject()}, \function{PyModule_AddIntConstant()},
 878 and \function{PyModule_AddStringConstant()}.  Each of these functions
 879 takes a module object, a null-terminated C string containing the name
 880 to be added, and a third argument for the value to be assigned to the
 881 name.  This third argument is, respectively, a Python object, a C
 882 long, or a C string.
 883
 884 A wrapper API was added for Unix-style signal handlers.
 885 \function{PyOS_getsig()} gets a signal handler and
 886 \function{PyOS_setsig()} will set a new handler.
 887
 888 % ======================================================================
 889 \section{Distutils: Making Modules Easy to Install}
 890
Before Python 2.0, installing modules was a tedious affair -- there
was no way to figure out automatically where Python was installed, or
what compiler options to use for extension modules.  Software authors
had to go through an arduous ritual of editing Makefiles and
configuration files, which only really worked on Unix and left Windows
and MacOS unsupported.  Python users faced wildly differing
installation instructions which varied between different extension
packages, which made administering a Python installation something of a
chore.
 900
The SIG for distribution utilities, shepherded by Greg Ward, has
created the Distutils, a system to make package installation much
easier.  They form the \module{distutils} package, a new part of
Python's standard library.  In the best case, installing a Python
module from source will require the same steps: first you simply
unpack the tarball or zip archive, and then run ``\code{python setup.py
install}''.  The platform will be automatically detected, the compiler
will be recognized, C extension modules will be compiled, and the
distribution installed into the proper directory.  Optional
command-line arguments provide more control over the installation
process, and the \module{distutils} package offers many places to
override defaults -- separating the build from the install, building
or installing in non-default directories, and more.
 914
 915 In order to use the Distutils, you need to write a \file{setup.py}
 916 script.  For the simple case, when the software contains only .py
 917 files, a minimal \file{setup.py} can be just a few lines long:
 918
 919 \begin{verbatim}
 920 from distutils.core import setup
 921 setup (name = "foo", version = "1.0",
 922        py_modules = ["module1", "module2"])
 923 \end{verbatim}
 924
 925 The \file{setup.py} file isn't much more complicated if the software
 926 consists of a few packages:
 927
 928 \begin{verbatim}
 929 from distutils.core import setup
 930 setup (name = "foo", version = "1.0",
 931        packages = ["package", "package.subpackage"])
 932 \end{verbatim}
 933
 934 A C extension can be the most complicated case; here's an example taken from
 935 the PyXML package:
 936
 937
 938 \begin{verbatim}
 939 from distutils.core import setup, Extension
 940
 941 expat_extension = Extension('xml.parsers.pyexpat',
 942         define_macros = [('XML_NS', None)],
 943         include_dirs = [ 'extensions/expat/xmltok',
 944                          'extensions/expat/xmlparse' ],
 945         sources = [ 'extensions/pyexpat.c',
 946                     'extensions/expat/xmltok/xmltok.c',
 947                     'extensions/expat/xmltok/xmlrole.c',
 948                   ]
 949        )
 950 setup (name = "PyXML", version = "0.5.4",
 951        ext_modules =[ expat_extension ] )
 952
 953 \end{verbatim}
 954
 955 The Distutils can also take care of creating source and binary
distributions.  The ``sdist'' command, run by ``\code{python setup.py
sdist}'', builds a source distribution such as \file{foo-1.0.tar.gz}.
Adding new commands isn't difficult; \code{bdist_rpm} and
\code{bdist_wininst} commands have already been contributed to create an
 960 RPM distribution and a Windows installer for the software,
 961 respectively.  Commands to create other distribution formats such as
 962 Debian packages and Solaris \file{.pkg} files are in various stages of
 963 development.
 964
 965 All this is documented in a new manual, \textit{Distributing Python
 966 Modules}, that joins the basic set of Python documentation.
 967
 968 % ======================================================================
 969 %\section{New XML Code}
 970
 971 %XXX write this section...
 972
 973 % ======================================================================
 974 \section{Module changes}
 975
 976 Lots of improvements and bugfixes were made to Python's extensive
 977 standard library; some of the affected modules include
 978 \module{readline}, \module{ConfigParser}, \module{cgi},
\module{calendar}, \module{posix}, \module{xmllib},
\module{aifc}, \module{chunk}, \module{wave}, \module{random},
\module{shelve}, and \module{nntplib}.  Consult the CVS logs for the exact
 982 patch-by-patch details.
 983
 984 Brian Gallew contributed OpenSSL support for the \module{socket}
 985 module.  OpenSSL is an implementation of the Secure Socket Layer,
 986 which encrypts the data being sent over a socket.  When compiling
 987 Python, you can edit \file{Modules/Setup} to include SSL support,
 988 which adds an additional function to the \module{socket} module:
 989 \function{socket.ssl(\var{socket}, \var{keyfile}, \var{certfile})},
 990 which takes a socket object and returns an SSL socket.  The
 991 \module{httplib} and \module{urllib} modules were also changed to
 992 support ``https://'' URLs, though no one has implemented FTP or SMTP
 993 over SSL.
 994
 995 The \module{httplib} module has been rewritten by Greg Stein to
 996 support HTTP/1.1.  Backward compatibility with the 1.5 version of
 997 \module{httplib} is provided, though using HTTP/1.1 features such as
 998 pipelining will require rewriting code to use a different set of
 999 interfaces.
1000
1001 The \module{Tkinter} module now supports Tcl/Tk version 8.1, 8.2, or
1002 8.3, and support for the older 7.x versions has been dropped.  The
\module{Tkinter} module now supports displaying Unicode strings in Tk widgets.
1004 Also, Fredrik Lundh contributed an optimization which makes operations
1005 like \code{create_line} and \code{create_polygon} much faster,
1006 especially when using lots of coordinates.
1007
1008 The \module{curses} module has been greatly extended, starting from
1009 Oliver Andrich's enhanced version, to provide many additional
1010 functions from ncurses and SYSV curses, such as colour, alternative
1011 character set support, pads, and mouse support.  This means the module
1012 is no longer compatible with operating systems that only have BSD
1013 curses, but there don't seem to be any currently maintained OSes that
1014 fall into this category.
1015
1016 As mentioned in the earlier discussion of 2.0's Unicode support, the
1017 underlying implementation of the regular expressions provided by the
1018 \module{re} module has been changed.  SRE, a new regular expression
1019 engine written by Fredrik Lundh and partially funded by Hewlett
1020 Packard, supports matching against both 8-bit strings and Unicode
1021 strings.
1022
1023 % ======================================================================
1024 \section{New modules}
1025
1026 A number of new modules were added.  We'll simply list them with brief
1027 descriptions; consult the 2.0 documentation for the details of a
1028 particular module.
1029
1030 \begin{itemize}
1031
\item{\module{atexit}:}
1033 For registering functions to be called before the Python interpreter exits.
1034 Code that currently sets
1035 \code{sys.exitfunc} directly should be changed to
1036 use the \module{atexit} module instead, importing \module{atexit}
1037 and calling \function{atexit.register()} with
1038 the function to be called on exit.
1039 (Contributed by Skip Montanaro.)
1040
1041 \item{\module{codecs}, \module{encodings}, \module{unicodedata}:}  Added as part of the new Unicode support.
1042
1043 \item{\module{filecmp}:} Supersedes the old \module{cmp}, \module{cmpcache} and
1044 \module{dircmp} modules, which have now become deprecated.
1045 (Contributed by Gordon MacMillan and Moshe Zadka.)
1046
1047 \item{\module{linuxaudiodev}:} Support for the \file{/dev/audio}
1048 device on Linux, a twin to the existing \module{sunaudiodev} module.
1049 (Contributed by Peter Bosch.)
1050
1051 \item{\module{mmap}:} An interface to memory-mapped files on both
Windows and Unix.  A file's contents can be mapped directly into
memory, at which point the mapping behaves like a mutable string, so
its contents can be read and modified.  The mapping can even be passed to
1055 functions that expect ordinary strings, such as the \module{re}
1056 module. (Contributed by Sam Rushing, with some extensions by
1057 A.M. Kuchling.)
1058
1059 \item{\module{pyexpat}:} An interface to the Expat XML parser.
1060 (Contributed by Paul Prescod.)
1061
1062 \item{\module{robotparser}:} Parse a \file{robots.txt} file, which is
1063 used for writing Web spiders that politely avoid certain areas of a
1064 Web site.  The parser accepts the contents of a \file{robots.txt} file,
1065 builds a set of rules from it, and can then answer questions about
1066 the fetchability of a given URL.  (Contributed by Skip Montanaro.)
1067
1068 \item{\module{tabnanny}:} A module/script to
1069 check Python source code for ambiguous indentation.
1070 (Contributed by Tim Peters.)
1071
1072 \item{\module{UserString}:} A base class useful for deriving objects that behave like strings.
1073
1074 \item{\module{webbrowser}:} A module that provides a platform independent
1075 way to launch a web browser on a specific URL. For each platform, various
1076 browsers are tried in a specific order. The user can alter which browser
1077 is launched by setting the \var{BROWSER} environment variable.
1078 (Originally inspired by Eric S. Raymond's patch to \module{urllib}
1079 which added similar functionality, but
1080 the final module comes from code originally
1081 implemented by Fred Drake as \file{Tools/idle/BrowserControl.py},
1082 and adapted for the standard library by Fred.)
1083
1084 \item{\module{_winreg}:} An interface to the
1085 Windows registry.  \module{_winreg} is an adaptation of functions that
1086 have been part of PythonWin since 1995, but has now been added to the core
1087 distribution, and enhanced to support Unicode.
1088 \module{_winreg} was written by Bill Tutt and Mark Hammond.
1089
1090 \item{\module{zipfile}:} A module for reading and writing ZIP-format
1091 archives.  These are archives produced by \program{PKZIP} on
1092 DOS/Windows or \program{zip} on Unix, not to be confused with
1093 \program{gzip}-format files (which are supported by the \module{gzip}
module).
1095 (Contributed by James C. Ahlstrom.)
1096
1097 \item{\module{imputil}:} A module that provides a simpler way for
1098 writing customised import hooks, in comparison to the existing
1099 \module{ihooks} module.  (Implemented by Greg Stein, with much
1100 discussion on python-dev along the way.)
1101
1102 \end{itemize}
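
To give a flavour of one of these modules, here is a sketch of the
\module{mmap} behaviour described in the list: the mapping is sliced,
modified in place, and searched with the \module{re} module.  It is
written to run on a current Python interpreter, which is an assumption
on our part: it uses byte strings, \function{tempfile.mkstemp()} for a
scratch file, and a length of 0 to map the whole file.  Details such
as the constructor arguments may differ in 2.0 itself.

```python
import mmap
import os
import re
import tempfile

# Create a small scratch file to map (hypothetical sample data).
fd, path = tempfile.mkstemp()
os.write(fd, b'hello world\n')
os.close(fd)

f = open(path, 'r+b')
m = mmap.mmap(f.fileno(), 0)        # length 0: map the whole file

first_word = m[0:5]                 # slicing reads from the file
m[0:5] = b'HELLO'                   # same-length assignment writes to it
found = re.search(b'w[a-z]+', m).group(0)

m.close()
f.close()

f = open(path, 'rb')
contents = f.read()
f.close()
os.remove(path)

print(first_word, found, contents)
```

Slice assignment must not change the length of the mapping, which is
why the replacement has the same five bytes as the original word.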
1103
1104 % ======================================================================
1105 \section{IDLE Improvements}
1106
1107 IDLE is the official Python cross-platform IDE, written using Tkinter.
1108 Python 2.0 includes IDLE 0.6, which adds a number of new features and
1109 improvements.  A partial list:
1110
1111 \begin{itemize}
1112 \item  UI improvements and optimizations,
1113 especially in the area of syntax highlighting and auto-indentation.
1114
\item The class browser now shows more information, such as the
top-level functions in a module.
1117
\item Tab width is now a user-settable option. When opening an existing Python
1119 file, IDLE automatically detects the indentation conventions, and adapts.
1120
1121 \item There is now support for calling browsers on various platforms,
1122 used to open the Python documentation in a browser.
1123
1124 \item IDLE now has a command line, which is largely similar to
1125 the vanilla Python interpreter.
1126
1127 \item Call tips were added in many places.
1128
1129 \item IDLE can now be installed as a package.
1130
1131 \item In the editor window, there is now a line/column bar at the bottom.
1132
1133 \item Three new keystroke commands: Check module (Alt-F5), Import
1134 module (F5) and Run script (Ctrl-F5).
1135
1136 \end{itemize}
1137
1138 % ======================================================================
1139 \section{Deleted and Deprecated Modules}
1140
1141 A few modules have been dropped because they're obsolete, or because
1142 there are now better ways to do the same thing.  The \module{stdwin}
1143 module is gone; it was for a platform-independent windowing toolkit
1144 that's no longer developed.
1145
1146 A number of modules have been moved to the
1147 \file{lib-old} subdirectory:
1148 \module{cmp}, \module{cmpcache}, \module{dircmp}, \module{dump},
1149 \module{find}, \module{grep}, \module{packmail},
1150 \module{poly}, \module{util}, \module{whatsound}, \module{zmod}.
If you have code which relies on a module that's been moved to
\file{lib-old}, you can simply add that directory to \code{sys.path}
to get it back, but you're encouraged to update any code that uses
1154 these modules.
1155
1156 \section{Acknowledgements}
1157
1158 The authors would like to thank the following people for offering
1159 suggestions on drafts of this article: Mark Hammond, Gregg Hauser,
1160 Fredrik Lundh, Detlef Lannert, Skip Montanaro, Vladimir Marangozov,
1161 Guido van Rossum, and Neil Schemenauer.
1162
1163 \end{document}