\title{What's New in Python 2.3}
\author{A.M.\ Kuchling}
\authoraddress{\email{amk@amk.ca}}

This article explains the new features in Python 2.3. Python 2.3 was
released on July 29, 2003.

The main themes for Python 2.3 are polishing some of the features
added in 2.2, adding various small but useful enhancements to the core
language, and expanding the standard library. The new object model
introduced in the previous version has benefited from 18 months of
bugfixes and from optimization efforts that have improved the
performance of new-style classes. A few new built-in functions have
been added such as \function{sum()} and \function{enumerate()}. The
\keyword{in} operator can now be used for substring searches (e.g.
\code{"ab" in "abc"} returns \constant{True}).

Some of the many new library features include Boolean, set, heap, and
date/time data types, the ability to import modules from ZIP-format
archives, metadata support for the long-awaited Python catalog, an
updated version of IDLE, and modules for logging messages, wrapping
text, parsing CSV files, processing command-line options, using BerkeleyDB
databases... the list of new and enhanced modules is lengthy.

This article doesn't attempt to provide a complete specification of
the new features, but instead provides a convenient overview. For
full details, you should refer to the documentation for Python 2.3,
such as the \citetitle[../lib/lib.html]{Python Library Reference} and
the \citetitle[../ref/ref.html]{Python Reference Manual}. If you want
to understand the complete implementation and design rationale,
refer to the PEP for a particular new feature.

%======================================================================
\section{PEP 218: A Standard Set Datatype}

The new \module{sets} module contains an implementation of a set
datatype. The \class{Set} class is for mutable sets, sets that can
have members added and removed. The \class{ImmutableSet} class is for
sets that can't be modified, and instances of \class{ImmutableSet} can
therefore be used as dictionary keys. Sets are built on top of
dictionaries, so the elements within a set must be hashable.

Here's a simple example:

\begin{verbatim}
>>> import sets
>>> S = sets.Set([1, 2, 3])
\end{verbatim}

The union and intersection of sets can be computed with the
\method{union()} and \method{intersection()} methods; an alternative
notation uses the bitwise operators \code{|} and \code{\&}, respectively.
Mutable sets also have in-place versions of these methods,
\method{union_update()} and \method{intersection_update()}.

\begin{verbatim}
>>> S1 = sets.Set([1, 2, 3])
>>> S2 = sets.Set([4, 5, 6])
>>> S1.union(S2)
Set([1, 2, 3, 4, 5, 6])
>>> S1 | S2                  # Alternative notation
Set([1, 2, 3, 4, 5, 6])
>>> S1.intersection(S2)
Set([])
>>> S1 & S2                  # Alternative notation
Set([])
>>> S1.union_update(S2)
>>> S1
Set([1, 2, 3, 4, 5, 6])
\end{verbatim}

It's also possible to take the symmetric difference of two sets. This
is the set of all elements in the union that aren't in the
intersection. Another way of putting it is that the symmetric
difference contains all elements that are in exactly one
set. Again, there's an alternative notation (\code{\^}), and an
in-place version with the ungainly name
\method{symmetric_difference_update()}.

\begin{verbatim}
>>> S1 = sets.Set([1, 2, 3, 4])
>>> S2 = sets.Set([3, 4, 5, 6])
>>> S1.symmetric_difference(S2)
Set([1, 2, 5, 6])
\end{verbatim}

There are also \method{issubset()} and \method{issuperset()} methods
for checking whether one set is a subset or superset of another:

\begin{verbatim}
>>> S1 = sets.Set([1, 2, 3])
>>> S2 = sets.Set([2, 3])
>>> S2.issubset(S1)
True
>>> S1.issuperset(S2)
True
\end{verbatim}
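
The operations described in this section can be collected into one short
script. This is only a sketch: the \code{try}/\code{except} import is an
assumption added so that the same code also runs on later interpreters in
which a built-in \code{set} type replaced the \module{sets} module; under
Python 2.3 only the first branch is taken.

```python
# Sketch only: on Python 2.3 the first import succeeds; the fallback to the
# built-in set type is an assumption for interpreters without a sets module.
try:
    from sets import Set
except ImportError:
    Set = set

S1 = Set([1, 2, 3, 4])
S2 = Set([3, 4, 5, 6])

union = S1 | S2            # same as S1.union(S2)
common = S1 & S2           # same as S1.intersection(S2)
exactly_one = S1 ^ S2      # same as S1.symmetric_difference(S2)

# Subset and superset tests
small = Set([3, 4])
is_sub = small.issubset(S1)
is_super = S1.issuperset(small)
```

The operator forms and the method forms produce equal sets, so the choice
between them is purely a matter of readability.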

\begin{seealso}

\seepep{218}{Adding a Built-In Set Object Type}{PEP written by Greg V. Wilson.
Implemented by Greg V. Wilson, Alex Martelli, and GvR.}

\end{seealso}

%======================================================================
\section{PEP 255: Simple Generators\label{section-generators}}

In Python 2.2, generators were added as an optional feature, to be
enabled by a \code{from __future__ import generators} directive. In
2.3 generators no longer need to be specially enabled, and are now
always present; this means that \keyword{yield} is now always a
keyword. The rest of this section is a copy of the description of
generators from the ``What's New in Python 2.2'' document; if you read
it back when Python 2.2 came out, you can skip the rest of this section.

You're doubtless familiar with how function calls work in Python or C.
When you call a function, it gets a private namespace where its local
variables are created. When the function reaches a \keyword{return}
statement, the local variables are destroyed and the resulting value
is returned to the caller. A later call to the same function will get
a fresh new set of local variables. But, what if the local variables
weren't thrown away on exiting a function? What if you could later
resume the function where it left off? This is what generators
provide; they can be thought of as resumable functions.

Here's the simplest example of a generator function:

\begin{verbatim}
def generate_ints(N):
    for i in range(N):
        yield i
\end{verbatim}

A new keyword, \keyword{yield}, was introduced for generators. Any
function containing a \keyword{yield} statement is a generator
function; this is detected by Python's bytecode compiler which
compiles the function specially as a result.

When you call a generator function, it doesn't return a single value;
instead it returns a generator object that supports the iterator
protocol. On executing the \keyword{yield} statement, the generator
outputs the value of \code{i}, similar to a \keyword{return}
statement. The big difference between \keyword{yield} and a
\keyword{return} statement is that on reaching a \keyword{yield} the
generator's state of execution is suspended and local variables are
preserved. On the next call to the generator's \code{.next()} method,
the function will resume executing immediately after the
\keyword{yield} statement. (For complicated reasons, the
\keyword{yield} statement isn't allowed inside the \keyword{try} block
of a \keyword{try}...\keyword{finally} statement; read \pep{255} for a full
explanation of the interaction between \keyword{yield} and
exceptions.)

Here's a sample usage of the \function{generate_ints()} generator:

\begin{verbatim}
>>> gen = generate_ints(3)
>>> gen
<generator object at 0x8117f90>
>>> gen.next()
0
>>> gen.next()
1
>>> gen.next()
2
>>> gen.next()
Traceback (most recent call last):
  File "stdin", line 1, in ?
  File "stdin", line 2, in generate_ints
StopIteration
\end{verbatim}

You could equally write \code{for i in generate_ints(5)}, or
\code{a,b,c = generate_ints(3)}.
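
The two alternatives just mentioned can be tried directly. The following is
a minimal sketch that repeats the definition of \function{generate_ints()}
so that it stands alone; it deliberately avoids calling \code{.next()} so
that it does not depend on any particular interpreter session.

```python
def generate_ints(N):
    for i in range(N):
        yield i

# A for loop drives the generator until it is exhausted.
collected = []
for i in generate_ints(5):
    collected.append(i)

# Tuple unpacking consumes exactly as many values as there are targets.
a, b, c = generate_ints(3)

# A generator object can only be consumed once.
gen = generate_ints(3)
first_pass = list(gen)
second_pass = list(gen)   # empty: the generator is already exhausted
```

Note that each call to \function{generate_ints()} creates a fresh generator
object; it is the individual object, not the function, that becomes
exhausted.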

Inside a generator function, the \keyword{return} statement can only
be used without a value, and signals the end of the procession of
values; afterwards the generator cannot return any further values.
\keyword{return} with a value, such as \code{return 5}, is a syntax
error inside a generator function. The end of the generator's results
can also be indicated by raising \exception{StopIteration} manually,
or by just letting the flow of execution fall off the bottom of the
function.

You could achieve the effect of generators manually by writing your
own class and storing all the local variables of the generator as
instance variables. For example, returning a list of integers could
be done by setting \code{self.count} to 0, and having the
\method{next()} method increment \code{self.count} and return it.
However, for a moderately complicated generator, writing a
corresponding class would be much messier.
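
To make the comparison concrete, here is a sketch of such a hand-written
class next to the equivalent generator. The class name \class{IntCounter}
is made up for this illustration, and the \method{__next__} alias is an
assumption added so that the sketch also runs on later Python versions that
renamed the method; Python 2.3 itself only looks for \method{next()}.

```python
class IntCounter:
    """Hand-written iterator producing 0, 1, ..., N-1 (hypothetical example)."""

    def __init__(self, N):
        self.N = N
        self.count = 0          # the generator's local state, stored by hand

    def __iter__(self):
        return self

    def next(self):
        if self.count >= self.N:
            raise StopIteration
        value = self.count
        self.count += 1
        return value

    __next__ = next             # assumption: alias for later Python versions


def generate_ints(N):
    # Generator version: the loop variable is preserved automatically.
    for i in range(N):
        yield i


from_class = list(IntCounter(4))
from_generator = list(generate_ints(4))
```

Even for this trivial case the class needs explicit bookkeeping for the
state and the end condition, which the generator gets for free.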

\file{Lib/test/test_generators.py} contains a number of more
interesting examples. The simplest one implements an in-order
traversal of a tree using generators recursively.

\begin{verbatim}
# A recursive generator that generates Tree leaves in in-order.
def inorder(t):
    if t:
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x
\end{verbatim}

Two other examples in \file{Lib/test/test_generators.py} produce
solutions for the N-Queens problem (placing $N$ queens on an $N\times N$
chessboard so that no queen threatens another) and the Knight's Tour
(a route that takes a knight to every square of an $N\times N$ chessboard
without visiting any square twice).

The idea of generators comes from other programming languages,
especially Icon (\url{http://www.cs.arizona.edu/icon/}), where the
idea of generators is central. In Icon, every
expression and function call behaves like a generator. One example
from ``An Overview of the Icon Programming Language'' at
\url{http://www.cs.arizona.edu/icon/docs/ipd266.htm} gives an idea of
what this looks like:

\begin{verbatim}
sentence := "Store it in the neighboring harbor"
if (i := find("or", sentence)) > 5 then write(i)
\end{verbatim}

In Icon the \function{find()} function returns the indexes at which the
substring ``or'' is found: 3, 23, 33. In the \keyword{if} statement,
\code{i} is first assigned a value of 3, but 3 is less than 5, so the
comparison fails, and Icon retries it with the second value of 23. 23
is greater than 5, so the comparison now succeeds, and the code prints
the value 23 to the screen.

Python doesn't go nearly as far as Icon in adopting generators as a
central concept. Generators are considered part of the core
Python language, but learning or using them isn't compulsory; if they
don't solve any problems that you have, feel free to ignore them.
One novel feature of Python's interface as compared to
Icon's is that a generator's state is represented as a concrete object
(the iterator) that can be passed around to other functions or stored
in a data structure.

\begin{seealso}

\seepep{255}{Simple Generators}{Written by Neil Schemenauer, Tim
Peters, Magnus Lie Hetland. Implemented mostly by Neil Schemenauer
and Tim Peters, with other fixes from the Python Labs crew.}

\end{seealso}

%======================================================================
\section{PEP 263: Source Code Encodings\label{section-encodings}}

Python source files can now be declared as being in different
character set encodings. Encodings are declared by including a
specially formatted comment in the first or second line of the source
file. For example, a UTF-8 file can be declared with:

\begin{verbatim}
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
\end{verbatim}

Without such an encoding declaration, the default encoding used is
7-bit ASCII. Executing or importing modules that contain string
literals with 8-bit characters and have no encoding declaration will result
in a \exception{DeprecationWarning} being signalled by Python 2.3; in
2.4 this will be a syntax error.

The encoding declaration only affects Unicode string literals, which
will be converted to Unicode using the specified encoding. Note that
Python identifiers are still restricted to ASCII characters, so you
can't have variable names that use characters outside of the usual
alphanumerics.

\begin{seealso}

\seepep{263}{Defining Python Source Code Encodings}{Written by
Marc-Andr\'e Lemburg and Martin von~L\"owis; implemented by Suzuki
Hisao and Martin von~L\"owis.}

\end{seealso}

%======================================================================
\section{PEP 273: Importing Modules from Zip Archives}

The new \module{zipimport} module adds support for importing
modules from a ZIP-format archive. You don't need to import the
module explicitly; it will be automatically imported if a ZIP
archive's filename is added to \code{sys.path}. For example:

\begin{verbatim}
amk@nyman:~/src/python$ unzip -l /tmp/example.zip
Archive:  /tmp/example.zip
  Length     Date   Time    Name
 --------    ----   ----    ----
     8467  11-26-02 22:30   jwzthreading.py
amk@nyman:~/src/python$ ./python
Python 2.3 (#1, Aug 1 2003, 19:54:32)
>>> import sys
>>> sys.path.insert(0, '/tmp/example.zip')  # Add .zip file to front of path
>>> import jwzthreading
>>> jwzthreading.__file__
'/tmp/example.zip/jwzthreading.py'
\end{verbatim}

An entry in \code{sys.path} can now be the filename of a ZIP archive.
The ZIP archive can contain any kind of files, but only files named
\file{*.py}, \file{*.pyc}, or \file{*.pyo} can be imported. If an
archive only contains \file{*.py} files, Python will not attempt to
modify the archive by adding the corresponding \file{*.pyc} file, meaning
that if a ZIP archive doesn't contain \file{*.pyc} files, importing may be
rather slow.

A path within the archive can also be specified to only import from a
subdirectory; for example, the path \file{/tmp/example.zip/lib/}
would only import from the \file{lib/} subdirectory within the
archive.
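
The whole round trip, from creating an archive to importing from it, can be
sketched in a few lines. The module name \code{zipdemo_mod}, its contents,
and the use of a temporary directory are all made up for this illustration;
only \module{zipfile}, \module{tempfile} and \code{sys.path} are standard.

```python
import os
import sys
import tempfile
import zipfile

# Build a throwaway archive containing one module.  The module name
# "zipdemo_mod" and its contents are hypothetical.
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, 'example.zip')
zf = zipfile.ZipFile(archive, 'w')
zf.writestr('zipdemo_mod.py', 'ANSWER = 42\n')
zf.close()

# Adding the archive's filename to sys.path is all that is needed;
# zipimport is used behind the scenes.
sys.path.insert(0, archive)
import zipdemo_mod

value = zipdemo_mod.ANSWER
location = zipdemo_mod.__file__   # a path that points inside the archive
```

Because the archive holds only a \file{*.py} file, the module is compiled
on every import, which is the slowdown described above.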

\begin{seealso}

\seepep{273}{Import Modules from Zip Archives}{Written by James C. Ahlstrom,
who also provided an implementation.
Python 2.3 follows the specification in \pep{273},
but uses an implementation written by Just van~Rossum
that uses the import hooks described in \pep{302}.
See section~\ref{section-pep302} for a description of the new import hooks.}

\end{seealso}

%======================================================================
\section{PEP 277: Unicode file name support for Windows NT}

On Windows NT, 2000, and XP, the system stores file names as Unicode
strings. Traditionally, Python has represented file names as byte
strings, which is inadequate because it renders some file names
inaccessible.

Python now allows using arbitrary Unicode strings (within the
limitations of the file system) for all functions that expect file
names, most notably the \function{open()} built-in function. If a Unicode
string is passed to \function{os.listdir()}, Python now returns a list
of Unicode strings. A new function, \function{os.getcwdu()}, returns
the current directory as a Unicode string.

Byte strings still work as file names, and on Windows Python will
transparently convert them to Unicode using the \code{mbcs} encoding.

Other systems also allow Unicode strings as file names but convert
them to byte strings before passing them to the system, which can
cause a \exception{UnicodeError} to be raised. Applications can test
whether arbitrary Unicode strings are supported as file names by
checking \member{os.path.supports_unicode_filenames}, a Boolean value.

Under MacOS, \function{os.listdir()} may now return Unicode filenames.

\begin{seealso}

\seepep{277}{Unicode file name support for Windows NT}{Written by Neil
Hodgson; implemented by Neil Hodgson, Martin von~L\"owis, and Mark
Hammond.}

\end{seealso}

%======================================================================
\section{PEP 278: Universal Newline Support}

The three major operating systems used today are Microsoft Windows,
Apple's Macintosh OS, and the various \UNIX\ derivatives. A minor
irritation of cross-platform work
is that these three platforms all use different characters
to mark the ends of lines in text files. \UNIX\ uses the linefeed
(ASCII character 10), MacOS uses the carriage return (ASCII
character 13), and Windows uses a two-character sequence of a
carriage return plus a linefeed.

Python's file objects can now support end of line conventions other
than the one followed by the platform on which Python is running.
Opening a file with the mode \code{'U'} or \code{'rU'} will open a file
for reading in universal newline mode. All three line ending
conventions will be translated to a \character{\e n} in the strings
returned by the various file methods such as \method{read()} and
\method{readline()}.

Universal newline support is also used when importing modules and when
executing a file with the \function{execfile()} function. This means
that Python modules can be shared between all three operating systems
without needing to convert the line-endings.
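
The translation can be observed by writing a file that mixes all three
conventions and reading it back. This sketch assumes a later interpreter:
it uses \function{io.open()} with \code{newline=None}, which is how later
Python versions request universal newline translation, whereas under
Python 2.3 the equivalent call would be \code{open(path, 'rU')}. The file
contents are made up for the illustration.

```python
import io
import os
import tempfile

# Write one file containing all three line-ending conventions.
fd, path = tempfile.mkstemp()
os.write(fd, b'unix\nmac\rwindows\r\nlast')
os.close(fd)

# newline=None requests universal newline translation on reading
# (Python 2.3 spells this open(path, 'rU')).
f = io.open(path, 'r', newline=None)
lines = f.readlines()
f.close()
os.remove(path)
```

Every line ending comes back as a single \character{\e n}, regardless of
the convention used in the file.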

This feature can be disabled when compiling Python by specifying
the \longprogramopt{without-universal-newlines} switch when running Python's
\program{configure} script.

\begin{seealso}

\seepep{278}{Universal Newline Support}{Written
and implemented by Jack Jansen.}

\end{seealso}

%======================================================================
\section{PEP 279: enumerate()\label{section-enumerate}}

A new built-in function, \function{enumerate()}, will make
certain loops a bit clearer. \code{enumerate(thing)}, where
\var{thing} is either an iterator or a sequence, returns an iterator
that will return \code{(0, \var{thing}[0])}, \code{(1,
\var{thing}[1])}, \code{(2, \var{thing}[2])}, and so forth.

A common idiom to change every element of a list looks like this:

\begin{verbatim}
for i in range(len(L)):
    item = L[i]
    # ... compute some result based on item ...
    L[i] = result
\end{verbatim}

This can be rewritten using \function{enumerate()} as:

\begin{verbatim}
for i, item in enumerate(L):
    # ... compute some result based on item ...
    L[i] = result
\end{verbatim}
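
A runnable version of the two loops, side by side, may make the equivalence
easier to see. The list contents and the choice of \method{upper()} as the
``result'' are arbitrary stand-ins for whatever computation a real loop
would perform.

```python
L = ['alpha', 'beta', 'gamma']

# Index-based version of the idiom.
by_index = list(L)
for i in range(len(by_index)):
    item = by_index[i]
    by_index[i] = item.upper()

# The same loop written with enumerate().
by_enumerate = list(L)
for i, item in enumerate(by_enumerate):
    by_enumerate[i] = item.upper()

# enumerate() itself just pairs each element with its position.
pairs = list(enumerate(L))
```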

\begin{seealso}

\seepep{279}{The enumerate() built-in function}{Written
and implemented by Raymond D. Hettinger.}

\end{seealso}

%======================================================================
\section{PEP 282: The logging Package}

A standard package for writing logs, \module{logging}, has been added
to Python 2.3. It provides a powerful and flexible mechanism for
generating logging output which can then be filtered and processed in
various ways. A configuration file written in a standard format can
be used to control the logging behavior of a program. Python
includes handlers that will write log records to
standard error or to a file or socket, send them to the system log, or
even e-mail them to a particular address; of course, it's also
possible to write your own handler classes.

The \class{Logger} class is the primary class.
Most application code will deal with one or more \class{Logger}
objects, each one used by a particular subsystem of the application.
Each \class{Logger} is identified by a name, and names are organized
into a hierarchy using \samp{.} as the component separator. For
example, you might have \class{Logger} instances named \samp{server},
\samp{server.auth} and \samp{server.network}. The latter two
instances are below \samp{server} in the hierarchy. This means that
if you turn up the verbosity for \samp{server} or direct \samp{server}
messages to a different handler, the changes will also apply to
records logged to \samp{server.auth} and \samp{server.network}.
There's also a root \class{Logger} that's the parent of all other
loggers.

For simple uses, the \module{logging} package contains some
convenience functions that always use the root log:

\begin{verbatim}
import logging

logging.debug('Debugging information')
logging.info('Informational message')
logging.warning('Warning:config file %s not found', 'server.conf')
logging.error('Error occurred')
logging.critical('Critical error -- shutting down')
\end{verbatim}

This produces the following output:

\begin{verbatim}
WARNING:root:Warning:config file server.conf not found
ERROR:root:Error occurred
CRITICAL:root:Critical error -- shutting down
\end{verbatim}

In the default configuration, informational and debugging messages are
suppressed and the output is sent to standard error. You can enable
the display of informational and debugging messages by calling the
\method{setLevel()} method on the root logger.

Notice the \function{warning()} call's use of string formatting
operators; all of the functions for logging messages take the
arguments \code{(\var{msg}, \var{arg1}, \var{arg2}, ...)} and log the
string resulting from \code{\var{msg} \% (\var{arg1}, \var{arg2},
...)}.

There's also an \function{exception()} function that records the most
recent traceback. Any of the other functions will also record the
traceback if you specify a true value for the keyword argument
\var{exc_info}.

\begin{verbatim}
def f():
    try:    1/0
    except: logging.exception('Problem recorded')

f()
\end{verbatim}

This produces the following output:

\begin{verbatim}
ERROR:root:Problem recorded
Traceback (most recent call last):
  File "t.py", line 6, in f
    1/0
ZeroDivisionError: integer division or modulo by zero
\end{verbatim}

Slightly more advanced programs will use a logger other than the root
logger. The \function{getLogger(\var{name})} function is used to get
a particular log, creating it if it doesn't exist yet.
\function{getLogger(None)} returns the root logger.

\begin{verbatim}
log = logging.getLogger('server')
 ...
log.info('Listening on port %i', port)
 ...
log.critical('Disk full')
 ...
\end{verbatim}

Log records are usually propagated up the hierarchy, so a message
logged to \samp{server.auth} is also seen by \samp{server} and
\samp{root}, but a \class{Logger} can prevent this by setting its
\member{propagate} attribute to \constant{False}.
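
The effect of the hierarchy and of \member{propagate} can be seen with a
handler that simply collects messages in a list. This is a sketch under
stated assumptions: the \class{ListHandler} class and the message texts are
made up for the illustration, and the format string is chosen to mimic the
default output shown earlier.

```python
import logging

class ListHandler(logging.Handler):
    """Hypothetical handler that stores formatted messages in a list."""

    def __init__(self):
        logging.Handler.__init__(self)
        self.messages = []

    def emit(self, record):
        self.messages.append(self.format(record))

handler = ListHandler()
handler.setFormatter(logging.Formatter('%(levelname)s:%(name)s:%(message)s'))

server = logging.getLogger('server')
server.setLevel(logging.INFO)
server.addHandler(handler)
server.propagate = False     # keep this demonstration away from the root logger

auth = logging.getLogger('server.auth')
auth.info('Login from %s', 'alice')    # propagates up to the 'server' handler
auth.debug('Below the INFO level')     # filtered by the level inherited from 'server'

auth.propagate = False
auth.info('Second login')              # no longer reaches the 'server' handler
```

Only the first message ends up in \code{handler.messages}: the second is
below the inherited level, and the third is logged after propagation has
been switched off.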

There are more classes provided by the \module{logging} package that
can be customized. When a \class{Logger} instance is told to log a
message, it creates a \class{LogRecord} instance that is sent to any
number of different \class{Handler} instances. Loggers and handlers
can also have an attached list of filters, and each filter can cause
the \class{LogRecord} to be ignored or can modify the record before
passing it along. When they're finally output, \class{LogRecord}
instances are converted to text by a \class{Formatter} class. All of
these classes can be replaced by your own specially-written classes.

With all of these features the \module{logging} package should provide
enough flexibility for even the most complicated applications. This
is only an incomplete overview of its features, so please see the
\ulink{package's reference documentation}{../lib/module-logging.html}
for all of the details. Reading \pep{282} will also be helpful.

\begin{seealso}

\seepep{282}{A Logging System}{Written by Vinay Sajip and Trent Mick;
implemented by Vinay Sajip.}

\end{seealso}

%======================================================================
\section{PEP 285: A Boolean Type\label{section-bool}}

A Boolean type was added to Python 2.3. Two new constants were added
to the \module{__builtin__} module, \constant{True} and
\constant{False}. (\constant{True} and
\constant{False} constants were added to the built-ins
in Python 2.2.1, but the 2.2.1 versions are simply set to integer values of
1 and 0 and aren't a different type.)

The type object for this new type is named
\class{bool}; the constructor for it takes any Python value and
converts it to \constant{True} or \constant{False}.

Most of the standard library modules and built-in functions have been
changed to return Booleans.

\begin{verbatim}
>>> obj = []
>>> hasattr(obj, 'append')
True
>>> isinstance(obj, list)
True
>>> isinstance(obj, tuple)
False
\end{verbatim}

Python's Booleans were added with the primary goal of making code
clearer. For example, if you're reading a function and encounter the
statement \code{return 1}, you might wonder whether the \code{1}
represents a Boolean truth value, an index, or a
coefficient that multiplies some other quantity. If the statement is
\code{return True}, however, the meaning of the return value is quite
clear.

Python's Booleans were \emph{not} added for the sake of strict
type-checking. A very strict language such as Pascal would also
prevent you performing arithmetic with Booleans, and would require
that the expression in an \keyword{if} statement always evaluate to a
Boolean result. Python is not this strict and never will be, as
\pep{285} explicitly says. This means you can still use any
expression in an \keyword{if} statement, even ones that evaluate to a
list or tuple or some random object. The Boolean type is a
subclass of the \class{int} class so that arithmetic using a Boolean
still works.

To sum up \constant{True} and \constant{False} in a sentence: they're
alternative ways to spell the integer values 1 and 0, with the single
difference that \function{str()} and \function{repr()} return the
strings \code{'True'} and \code{'False'} instead of \code{'1'} and
\code{'0'}.
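
The points made in this section can be checked with a few expressions. The
values below are arbitrary; the sketch only illustrates the conversion
rules, the \class{int} subclassing, and the string forms described above.

```python
# bool() converts any value to True or False.
converted = [bool(1), bool(0), bool([]), bool((1,))]

# bool is a subclass of int, so arithmetic still works.
total = True + 1
product = False * 85
is_int = isinstance(True, int)

# The visible difference from 1 and 0 is in str() and repr().
names = (str(True), repr(False))
```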

\begin{seealso}

\seepep{285}{Adding a bool type}{Written and implemented by GvR.}

\end{seealso}

%======================================================================
\section{PEP 293: Codec Error Handling Callbacks}

When encoding a Unicode string into a byte string, unencodable
characters may be encountered. So far, Python has allowed specifying
the error processing as either ``strict'' (raising
\exception{UnicodeError}), ``ignore'' (skipping the character), or
``replace'' (using a question mark in the output string), with
``strict'' being the default behavior. It may be desirable to specify
alternative processing of such errors, such as inserting an XML
character reference or HTML entity reference into the converted
string.

Python now has a flexible framework to add different processing
strategies. New error handlers can be added with
\function{codecs.register_error}, and codecs then can access the error
handler with \function{codecs.lookup_error}. An equivalent C API has
been added for codecs written in C. The error handler gets the
necessary state information such as the string being converted, the
position in the string where the error was detected, and the target
encoding. The handler can then either raise an exception or return a
replacement string.

Two additional error handlers have been implemented using this
framework: ``backslashreplace'' uses Python backslash quoting to
represent unencodable characters and ``xmlcharrefreplace'' emits
XML character references.
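
As an illustration, the following sketch encodes a short string with the
two new handlers and then registers a custom one. The handler name
\code{'bracket-codepoint'}, the function \function{bracket_codepoint()} and
the sample text are invented for this example; only
\function{codecs.register_error} and the two built-in handler names come
from the framework described above.

```python
import codecs

text = u'caf\u00e9 \u20ac5'

# The two new built-in handlers.
as_xml = text.encode('ascii', 'xmlcharrefreplace')
as_backslash = text.encode('ascii', 'backslashreplace')

# A custom handler receives the exception instance and returns a
# (replacement, resume_position) tuple.  The name 'bracket-codepoint'
# is hypothetical.
def bracket_codepoint(exc):
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    replacement = u''.join([u'[U+%04X]' % ord(ch) for ch in bad])
    return (replacement, exc.end)

codecs.register_error('bracket-codepoint', bracket_codepoint)
as_custom = text.encode('ascii', 'bracket-codepoint')
```

The handler is looked up by name at encoding time, so once registered it
can be passed as the \var{errors} argument anywhere a built-in handler
name is accepted.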

\begin{seealso}

\seepep{293}{Codec Error Handling Callbacks}{Written and implemented by
Walter D\"orwald.}

\end{seealso}

%======================================================================
\section{PEP 301: Package Index and Metadata for
Distutils\label{section-pep301}}

Support for the long-requested Python catalog makes its first
appearance in 2.3.

The heart of the catalog is the new Distutils \command{register} command.
Running \code{python setup.py register} will collect the metadata
describing a package, such as its name, version, maintainer,
description, \&c., and send it to a central catalog server. The
resulting catalog is available from \url{http://www.python.org/pypi}.

To make the catalog a bit more useful, a new optional
\var{classifiers} keyword argument has been added to the Distutils
\function{setup()} function. A list of
\ulink{Trove}{http://catb.org/\textasciitilde esr/trove/}-style
strings can be supplied to help classify the software.

Here's an example \file{setup.py} with classifiers, written to be compatible
with older versions of the Distutils:

\begin{verbatim}
from distutils import core
kw = {'name': "Quixote",
      'description': "A highly Pythonic Web application framework",
      # ...
      }

# Only pass classifiers if this version of the Distutils supports them.
if (hasattr(core, 'setup_keywords') and
    'classifiers' in core.setup_keywords):
    kw['classifiers'] = \
        ['Topic :: Internet :: WWW/HTTP :: Dynamic Content',
         'Environment :: No Input/Output (Daemon)',
         'Intended Audience :: Developers']

core.setup(**kw)
\end{verbatim}

The full list of classifiers can be obtained by running
\verb|python setup.py register --list-classifiers|.

\begin{seealso}

\seepep{301}{Package Index and Metadata for Distutils}{Written and
implemented by Richard Jones.}

\end{seealso}

%======================================================================
\section{PEP 302: New Import Hooks \label{section-pep302}}

While it's been possible to write custom import hooks ever since the
\module{ihooks} module was introduced in Python 1.3, no one has ever
been really happy with it because writing new import hooks is
difficult and messy. There have been various proposed alternatives
such as the \module{imputil} and \module{iu} modules, but none of them
has ever gained much acceptance, and none of them was easily usable
from C code.

\pep{302} borrows ideas from its predecessors, especially from
Gordon McMillan's \module{iu} module. Three new items
are added to the \module{sys} module:

\begin{itemize}
  \item \code{sys.path_hooks} is a list of callable objects; most
  often they'll be classes. Each callable takes a string containing a
  path and either returns an importer object that will handle imports
  from this path or raises an \exception{ImportError} exception if it
  can't handle this path.

  \item \code{sys.path_importer_cache} caches importer objects for
  each path, so \code{sys.path_hooks} will only need to be traversed
  once for each path.

  \item \code{sys.meta_path} is a list of importer objects that will
  be traversed before \code{sys.path} is checked. This list is
  initially empty, but user code can add objects to it. Additional
  built-in and frozen modules can be imported by an object added to
  this list.

\end{itemize}

Importer objects must have a single method,
\method{find_module(\var{fullname}, \var{path}=None)}. \var{fullname}
will be a module or package name, e.g. \samp{string} or
\samp{distutils.core}. \method{find_module()} must return a loader object
that has a single method, \method{load_module(\var{fullname})}, that
creates and returns the corresponding module object.

Pseudo-code for Python's new import logic, therefore, looks something
like this (simplified a bit; see \pep{302} for the full details):

\begin{verbatim}
for mp in sys.meta_path:
    loader = mp.find_module(fullname)
    if loader is not None:
        <module> = loader.load_module(fullname)

for path in sys.path:
    for hook in sys.path_hooks:
        try:
            importer = hook(path)
        except ImportError:
            # ImportError, so try the other path hooks
            pass
        else:
            loader = importer.find_module(fullname)
            <module> = loader.load_module(fullname)

# Not found!
raise ImportError
\end{verbatim}

\begin{seealso}

\seepep{302}{New Import Hooks}{Written by Just van~Rossum and Paul Moore.
Implemented by Just van~Rossum.}

\end{seealso}

%======================================================================
\section{PEP 305: Comma-separated Files \label{section-pep305}}

Comma-separated files are a format frequently used for exporting data
from databases and spreadsheets. Python 2.3 adds a parser for
comma-separated files.

Comma-separated format is deceptively simple at first glance:

\begin{verbatim}
Costs,150,200,3.95
\end{verbatim}

Read a line and call \code{line.split(',')}: what could be simpler?
But toss in string data that can contain commas, and things get more
complicated:

\begin{verbatim}
"Costs",150,200,3.95,"Includes taxes, shipping, and sundry items"
\end{verbatim}

A big ugly regular expression can parse this, but using the new
\module{csv} package is much simpler:

\begin{verbatim}
import csv

input = open('datafile', 'rb')
reader = csv.reader(input)
for line in reader:
    print line
\end{verbatim}

The \function{reader} function takes a number of different options.
The field separator isn't limited to the comma and can be changed to
any character, and so can the quoting and line-ending characters.

Different dialects of comma-separated files can be defined and
registered; currently there are two dialects, both used by Microsoft Excel.
A separate \class{csv.writer} class will generate comma-separated files
from a succession of tuples or lists, quoting strings that contain the
delimiter.
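
The difference between naive splitting and the \module{csv} parser is easy
to demonstrate without a data file, because \function{csv.reader} accepts
any iterable that yields lines. In this sketch a list of strings stands in
for the open file, and the semicolon-separated line is an invented example
of changing the field separator.

```python
import csv

# A list of strings stands in for an open file here.
lines = [
    'Costs,150,200,3.95',
    '"Costs",150,200,3.95,"Includes taxes, shipping, and sundry items"',
]

naive = lines[1].split(',')      # also splits inside the quoted field
rows = list(csv.reader(lines))   # respects the quoting

# The field separator can be changed with the delimiter option.
semicolon_rows = list(csv.reader(['a;b;c'], delimiter=';'))
```

The naive split yields seven pieces for the second line, while the parser
returns the intended five fields with the quotes removed.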

\begin{seealso}

\seepep{305}{CSV File API}{Written and implemented
by Kevin Altis, Dave Cole, Andrew McNamara, Skip Montanaro, Cliff Wells.}

\end{seealso}

%======================================================================
\section{PEP 307: Pickle Enhancements \label{section-pep307}}

The \module{pickle} and \module{cPickle} modules received some
attention during the 2.3 development cycle. In 2.2, new-style classes
could be pickled without difficulty, but they weren't pickled very
compactly; \pep{307} quotes a trivial example where a new-style class
results in a pickled string three times longer than that for a classic
class.

The solution was to invent a new pickle protocol. The
\function{pickle.dumps()} function has supported a text-or-binary flag
for a long time. In 2.3, this flag is redefined from a Boolean to an
integer: 0 is the old text-mode pickle format, 1 is the old binary
format, and now 2 is a new 2.3-specific format. A new constant,
\constant{pickle.HIGHEST_PROTOCOL}, can be used to select the fanciest
protocol available.
913 Unpickling is no longer considered a safe operation. 2.2's
914 \module{pickle} provided hooks for trying to prevent unsafe classes
915 from being unpickled (specifically, a
916 \member{__safe_for_unpickling__} attribute), but none of this code
917 was ever audited and therefore it's all been ripped out in 2.3. You
918 should not unpickle untrusted data in any version of Python.
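The protocol numbers described earlier in this section can be tried
out with a short sketch such as the one below. It assumes a current
Python~3 interpreter, where protocols 0, 1, and 2 are still accepted;
the \code{record} dictionary is purely illustrative data, and the
pickles are loaded only because this script created them itself.

```python
import pickle

# Plain built-in data keeps the sketch self-contained.
record = {"name": "example", "values": list(range(20))}

sizes = {}
for protocol in (0, 1, 2):
    data = pickle.dumps(record, protocol)
    # Only ever load pickles that come from a trusted source.
    assert pickle.loads(data) == record
    sizes[protocol] = len(data)

# HIGHEST_PROTOCOL selects the newest format the running interpreter knows.
newest = pickle.dumps(record, pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(newest)

print(sizes)
```

The exact sizes depend on the data and on the interpreter version, so
treat the printed numbers as an illustration rather than a benchmark.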
920 To reduce the pickling overhead for new-style classes, a new interface
921 for customizing pickling was added using three special methods:
922 \method{__getstate__}, \method{__setstate__}, and
923 \method{__getnewargs__}. Consult \pep{307} for the full semantics
926 As a way to compress pickles yet further, it's now possible to use
927 integer codes instead of long strings to identify pickled classes.
928 The Python Software Foundation will maintain a list of standardized
929 codes; there's also a range of codes for private use. Currently no
930 codes have been specified.
934 \seepep{307}{Extensions to the pickle protocol}{Written and implemented
935 by Guido van Rossum and Tim Peters.}
939 %======================================================================
940 \section{Extended Slices\label{section-slices}}
942 Ever since Python 1.4, the slicing syntax has supported an optional
943 third ``step'' or ``stride'' argument. For example, these are all
944 legal Python syntax: \code{L[1:10:2]}, \code{L[:-1:1]},
945 \code{L[::-1]}. This was added to Python at the request of
946 the developers of Numerical Python, which uses the third argument
947 extensively. However, Python's built-in list, tuple, and string
948 sequence types have never supported this feature, raising a
949 \exception{TypeError} if you tried it. Michael Hudson contributed a
950 patch to fix this shortcoming.
952 For example, you can now easily extract the elements of a list that
961 Negative values also work to make a copy of the same list in reverse
966 [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
969 This also works for tuples, arrays, and strings:
If you have a mutable sequence such as a list or an array, you can
980 assign to or delete an extended slice, but there are some differences
981 between assignment to extended and regular slices. Assignment to a
982 regular slice can be used to change the length of the sequence:
988 >>> a[1:3] = [4, 5, 6]
993 Extended slices aren't this flexible. When assigning to an extended
994 slice, the list on the right hand side of the statement must contain
995 the same number of items as the slice it is replacing:
1003 >>> a[::2] = [0, -1]
1006 >>> a[::2] = [0,1,2]
1007 Traceback (most recent call last):
1008 File "<stdin>", line 1, in ?
1009 ValueError: attempt to assign sequence of size 3 to extended slice of size 2
1012 Deletion is more straightforward:
1025 One can also now pass slice objects to the
1026 \method{__getitem__} methods of the built-in sequences:
1029 >>> range(10).__getitem__(slice(0, 5, 2))
1033 Or use slice objects directly in subscripts:
1036 >>> range(10)[slice(0, 5, 2)]
1040 To simplify implementing sequences that support extended slicing,
1041 slice objects now have a method \method{indices(\var{length})} which,
1042 given the length of a sequence, returns a \code{(\var{start},
1043 \var{stop}, \var{step})} tuple that can be passed directly to
1045 \method{indices()} handles omitted and out-of-bounds indices in a
1046 manner consistent with regular slices (and this innocuous phrase hides
1047 a welter of confusing details!). The method is intended to be used
1053 def calc_item(self, i):
1055 def __getitem__(self, item):
1056 if isinstance(item, slice):
1057 indices = item.indices(len(self))
1058 return FakeSeq([self.calc_item(i) for i in range(*indices)])
            return self.calc_item(item)
1063 From this example you can also see that the built-in \class{slice}
1064 object is now the type object for the slice type, and is no longer a
1065 function. This is consistent with Python 2.2, where \class{int},
1066 \class{str}, etc., underwent the same change.
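To pull the pieces of this section together, here is a minimal sketch
of a sequence class that supports extended slicing through
\method{indices()}, followed by an extended-slice assignment. The
\class{Squares} class and its behaviour are hypothetical, invented for
this illustration, and the code assumes a current Python~3 interpreter
rather than 2.3.

```python
class Squares:
    """Hypothetical read-only sequence whose item i is i*i."""

    def __init__(self, length):
        self._length = length

    def __len__(self):
        return self._length

    def __getitem__(self, item):
        if isinstance(item, slice):
            # indices() clips omitted or out-of-range values to the length.
            start, stop, step = item.indices(len(self))
            return [i * i for i in range(start, stop, step)]
        if item < 0:
            item += len(self)
        if not 0 <= item < len(self):
            raise IndexError("index out of range")
        return item * item


sq = Squares(10)
evens = sq[::2]                          # every second item
backwards = sq[::-3]                     # negative step walks from the end
clipped = slice(2, 100).indices(10)      # stop is clipped to the length

# Extended-slice assignment needs exactly as many values as slots.
a = list(range(4))
a[::2] = [0, -1]
try:
    a[::2] = [0, 1, 2]
    size_mismatch = False
except ValueError:
    size_mismatch = True
```

Returning a plain list from a slice is a simplification; a real
sequence type would more likely return an instance of its own class.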
1069 %======================================================================
1070 \section{Other Language Changes}
1072 Here are all of the changes that Python 2.3 makes to the core Python
1076 \item The \keyword{yield} statement is now always a keyword, as
1077 described in section~\ref{section-generators} of this document.
1079 \item A new built-in function \function{enumerate()}
1080 was added, as described in section~\ref{section-enumerate} of this
\item Two new constants, \constant{True} and \constant{False}, were
1084 added along with the built-in \class{bool} type, as described in
1085 section~\ref{section-bool} of this document.
1087 \item The \function{int()} type constructor will now return a long
1088 integer instead of raising an \exception{OverflowError} when a string
1089 or floating-point number is too large to fit into an integer. This
1090 can lead to the paradoxical result that
1091 \code{isinstance(int(\var{expression}), int)} is false, but that seems
1092 unlikely to cause problems in practice.
1094 \item Built-in types now support the extended slicing syntax,
1095 as described in section~\ref{section-slices} of this document.
1097 \item A new built-in function, \function{sum(\var{iterable}, \var{start}=0)},
1098 adds up the numeric items in the iterable object and returns their sum.
1099 \function{sum()} only accepts numbers, meaning that you can't use it
1100 to concatenate a bunch of strings. (Contributed by Alex
1103 \item \code{list.insert(\var{pos}, \var{value})} used to
1104 insert \var{value} at the front of the list when \var{pos} was
1105 negative. The behaviour has now been changed to be consistent with
1106 slice indexing, so when \var{pos} is -1 the value will be inserted
1107 before the last element, and so forth.
1109 \item \code{list.index(\var{value})}, which searches for \var{value}
1110 within the list and returns its index, now takes optional
1111 \var{start} and \var{stop} arguments to limit the search to
1112 only part of the list.
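A small sketch of the two list changes just described, negative
positions for \method{insert()} and the optional \var{start} and
\var{stop} arguments to \method{index()}. The list contents are
arbitrary sample data, and the code assumes a current Python~3
interpreter.

```python
letters = ["a", "b", "c"]
# A position of -1 now means "before the last element".
letters.insert(-1, "x")

values = [10, 20, 10, 30, 10]
first = values.index(10)          # search the whole list
later = values.index(10, 1)       # start searching at index 1
bounded = values.index(10, 3, 5)  # search only indices 3 and 4

print(letters, first, later, bounded)
```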
1114 \item Dictionaries have a new method, \method{pop(\var{key}\optional{,
1115 \var{default}})}, that returns the value corresponding to \var{key}
1116 and removes that key/value pair from the dictionary. If the requested
1117 key isn't present in the dictionary, \var{default} is returned if it's
1118 specified and \exception{KeyError} raised if it isn't.
1125 Traceback (most recent call last):
  File "<stdin>", line 1, in ?
1131 Traceback (most recent call last):
  File "<stdin>", line 1, in ?
1133 KeyError: 'pop(): dictionary is empty'
1139 There's also a new class method,
1140 \method{dict.fromkeys(\var{iterable}, \var{value})}, that
1141 creates a dictionary with keys taken from the supplied iterator
1142 \var{iterable} and all values set to \var{value}, defaulting to
1145 (Patches contributed by Raymond Hettinger.)
1147 Also, the \function{dict()} constructor now accepts keyword arguments to
1148 simplify creating small dictionaries:
1151 >>> dict(red=1, blue=2, green=3, black=4)
1152 {'blue': 2, 'black': 4, 'green': 3, 'red': 1}
1155 (Contributed by Just van~Rossum.)
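The dictionary additions can be exercised together in a few lines.
This is only a sketch with made-up keys and values, written for a
current Python~3 interpreter.

```python
stock = {"apples": 3, "pears": 5}

taken = stock.pop("apples")       # returns the value and removes the key
fallback = stock.pop("kiwis", 0)  # missing key, so the default is returned

try:
    stock.pop("kiwis")            # missing key and no default
    raised = False
except KeyError:
    raised = True

# fromkeys() builds a dictionary from an iterable of keys.
counts = dict.fromkeys(["red", "green", "blue"], 0)
unset = dict.fromkeys("ab")       # the value defaults to None

# Keyword arguments give a compact way to spell small dictionaries.
small = dict(red=1, blue=2)
```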
1157 \item The \keyword{assert} statement no longer checks the \code{__debug__}
1158 flag, so you can no longer disable assertions by assigning to \code{__debug__}.
1159 Running Python with the \programopt{-O} switch will still generate
1160 code that doesn't execute any assertions.
1162 \item Most type objects are now callable, so you can use them
1163 to create new objects such as functions, classes, and modules. (This
1164 means that the \module{new} module can be deprecated in a future
1165 Python version, because you can now use the type objects available in
1166 the \module{types} module.)
1167 % XXX should new.py use PendingDeprecationWarning?
1168 For example, you can create a new module object with the following code:
1172 >>> m = types.ModuleType('abc','docstring')
1174 <module 'abc' (built-in)>
A new warning, \exception{PendingDeprecationWarning}, was added to
1181 indicate features which are in the process of being
1182 deprecated. The warning will \emph{not} be printed by default. To
1183 check for use of features that will be deprecated in the future,
1184 supply \programopt{-Walways::PendingDeprecationWarning::} on the
1185 command line or use \function{warnings.filterwarnings()}.
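As an illustration of the \function{warnings.filterwarnings()} route,
the sketch below turns the warning on and records it. The
\function{old_helper()} function is hypothetical, and
\code{warnings.catch_warnings()} is a convenience from later Python
versions (it did not exist in 2.3), used here only so the example can
inspect what was emitted.

```python
import warnings


def old_helper():
    # Hypothetical function that announces its future deprecation.
    warnings.warn("old_helper() will be deprecated",
                  PendingDeprecationWarning, stacklevel=2)
    return 42


with warnings.catch_warnings(record=True) as caught:
    # Equivalent in spirit to the command-line option mentioned above.
    warnings.filterwarnings("always", category=PendingDeprecationWarning)
    result = old_helper()

categories = [w.category for w in caught]
print(categories)
```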
1187 \item The process of deprecating string-based exceptions, as
1188 in \code{raise "Error occurred"}, has begun. Raising a string will
1189 now trigger \exception{PendingDeprecationWarning}.
1191 \item Using \code{None} as a variable name will now result in a
1192 \exception{SyntaxWarning} warning. In a future version of Python,
1193 \code{None} may finally become a keyword.
1195 \item The \method{xreadlines()} method of file objects, introduced in
1196 Python 2.1, is no longer necessary because files now behave as their
1197 own iterator. \method{xreadlines()} was originally introduced as a
1198 faster way to loop over all the lines in a file, but now you can
1199 simply write \code{for line in file_obj}. File objects also have a
1200 new read-only \member{encoding} attribute that gives the encoding used
1201 by the file; Unicode strings written to the file will be automatically
1202 converted to bytes using the given encoding.
1204 \item The method resolution order used by new-style classes has
1205 changed, though you'll only notice the difference if you have a really
1206 complicated inheritance hierarchy. Classic classes are unaffected by
1207 this change. Python 2.2 originally used a topological sort of a
1208 class's ancestors, but 2.3 now uses the C3 algorithm as described in
1209 the paper \ulink{``A Monotonic Superclass Linearization for
1210 Dylan''}{http://www.webcom.com/haahr/dylan/linearization-oopsla96.html}.
1211 To understand the motivation for this change,
1212 read Michele Simionato's article
1213 \ulink{``Python 2.3 Method Resolution Order''}
1214 {http://www.python.org/2.3/mro.html}, or
1215 read the thread on python-dev starting with the message at
1216 \url{http://mail.python.org/pipermail/python-dev/2002-October/029035.html}.
1217 Samuele Pedroni first pointed out the problem and also implemented the
1218 fix by coding the C3 algorithm.
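For a concrete feel of what the C3 algorithm produces, the following
sketch prints the resolution order of a small diamond hierarchy and
then shows a hierarchy that C3 rejects as inconsistent. The class
names are arbitrary, and the code assumes a current Python~3
interpreter, where every class is a new-style class.

```python
class A:
    def who(self):
        return "A"

class B(A):
    pass

class C(A):
    def who(self):
        return "C"

class D(B, C):
    pass

# The linearization visits C before A, so C's override wins.
order = [klass.__name__ for klass in D.__mro__]
winner = D().who()

# Two bases that disagree about the order of X and Y cannot be merged.
class X: pass
class Y: pass
class P(X, Y): pass
class Q(Y, X): pass

try:
    class Z(P, Q):
        pass
    consistent = True
except TypeError:
    consistent = False

print(order, winner, consistent)
```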
1220 \item Python runs multithreaded programs by switching between threads
1221 after executing N bytecodes. The default value for N has been
1222 increased from 10 to 100 bytecodes, speeding up single-threaded
1223 applications by reducing the switching overhead. Some multithreaded
1224 applications may suffer slower response time, but that's easily fixed
1225 by setting the limit back to a lower number using
1226 \function{sys.setcheckinterval(\var{N})}.
1227 The limit can be retrieved with the new
1228 \function{sys.getcheckinterval()} function.
1230 \item One minor but far-reaching change is that the names of extension
1231 types defined by the modules included with Python now contain the
1232 module and a \character{.} in front of the type name. For example, in
1233 Python 2.2, if you created a socket and printed its
1234 \member{__class__}, you'd get this output:
1237 >>> s = socket.socket()
1242 In 2.3, you get this:
1245 <type '_socket.socket'>
1248 \item One of the noted incompatibilities between old- and new-style
1249 classes has been removed: you can now assign to the
1250 \member{__name__} and \member{__bases__} attributes of new-style
1251 classes. There are some restrictions on what can be assigned to
1252 \member{__bases__} along the lines of those relating to assigning to
1253 an instance's \member{__class__} attribute.
1258 %======================================================================
1259 \subsection{String Changes}
1263 \item The \keyword{in} operator now works differently for strings.
1264 Previously, when evaluating \code{\var{X} in \var{Y}} where \var{X}
1265 and \var{Y} are strings, \var{X} could only be a single character.
1266 That's now changed; \var{X} can be a string of any length, and
1267 \code{\var{X} in \var{Y}} will return \constant{True} if \var{X} is a
1268 substring of \var{Y}. If \var{X} is the empty string, the result is
1269 always \constant{True}.
1280 Note that this doesn't tell you where the substring starts; if you
1281 need that information, use the \method{find()} string method.
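A brief sketch of the difference between the two: \keyword{in} answers
yes or no, while \method{find()} reports a position. The sample
string is arbitrary, and the code assumes a current Python~3
interpreter.

```python
text = "abcdef"

has_cd = "cd" in text        # substring test
has_empty = "" in text       # the empty string is always found
has_xy = "xy" in text

where_cd = text.find("cd")   # index of the first match
where_xy = text.find("xy")   # -1 signals "not found"

print(has_cd, has_empty, has_xy, where_cd, where_xy)
```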
1283 \item The \method{strip()}, \method{lstrip()}, and \method{rstrip()}
1284 string methods now have an optional argument for specifying the
1285 characters to strip. The default is still to remove all whitespace
1291 >>> '><><abc<><><>'.strip('<>')
1293 >>> '><><abc<><><>\n'.strip('<>')
1295 >>> u'\u4000\u4001abc\u4000'.strip(u'\u4000')
1300 (Suggested by Simon Brunning and implemented by Walter D\"orwald.)
1302 \item The \method{startswith()} and \method{endswith()}
1303 string methods now accept negative numbers for the \var{start} and \var{end}
1306 \item Another new string method is \method{zfill()}, originally a
1307 function in the \module{string} module. \method{zfill()} pads a
1308 numeric string with zeros on the left until it's the specified width.
1309 Note that the \code{\%} operator is still more flexible and powerful
1310 than \method{zfill()}.
1315 >>> '12345'.zfill(4)
1317 >>> 'goofy'.zfill(6)
1321 (Contributed by Walter D\"orwald.)
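To make the comparison with the \code{\%} operator concrete, here is a
small sketch; the sample values are arbitrary and the code assumes a
current Python~3 interpreter.

```python
# zfill() pads on the left and keeps a leading sign in front.
padded = "42".zfill(5)
negative = "-42".zfill(5)
unchanged = "12345".zfill(4)   # already wide enough, so nothing changes

# The % operator does the same for integers ...
same_with_percent = "%05d" % 42
# ... and can also control precision, which zfill() cannot.
with_precision = "%08.3f" % 3.14159

print(padded, negative, unchanged, same_with_percent, with_precision)
```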
1323 \item A new type object, \class{basestring}, has been added.
1324 Both 8-bit strings and Unicode strings inherit from this type, so
1325 \code{isinstance(obj, basestring)} will return \constant{True} for
1326 either kind of string. It's a completely abstract type, so you
1327 can't create \class{basestring} instances.
1329 \item Interned strings are no longer immortal and will now be
1330 garbage-collected in the usual way when the only reference to them is
1331 from the internal dictionary of interned strings. (Implemented by
1337 %======================================================================
1338 \subsection{Optimizations}
1342 \item The creation of new-style class instances has been made much
1343 faster; they're now faster than classic classes!
1345 \item The \method{sort()} method of list objects has been extensively
1346 rewritten by Tim Peters, and the implementation is significantly
1349 \item Multiplication of large long integers is now much faster thanks
1350 to an implementation of Karatsuba multiplication, an algorithm that
1351 scales better than the O(n*n) required for the grade-school
1352 multiplication algorithm. (Original patch by Christopher A. Craig,
1353 and significantly reworked by Tim Peters.)
1355 \item The \code{SET_LINENO} opcode is now gone. This may provide a
1356 small speed increase, depending on your compiler's idiosyncrasies.
1357 See section~\ref{section-other} for a longer explanation.
1358 (Removed by Michael Hudson.)
1360 \item \function{xrange()} objects now have their own iterator, making
1361 \code{for i in xrange(n)} slightly faster than
1362 \code{for i in range(n)}. (Patch by Raymond Hettinger.)
1364 \item A number of small rearrangements have been made in various
1365 hotspots to improve performance, such as inlining a function or removing
1366 some code. (Implemented mostly by GvR, but lots of people have
1367 contributed single changes.)
1371 The net result of the 2.3 optimizations is that Python 2.3 runs the
1372 pystone benchmark around 25\% faster than Python 2.2.
1375 %======================================================================
1376 \section{New, Improved, and Deprecated Modules}
1378 As usual, Python's standard library received a number of enhancements and
1379 bug fixes. Here's a partial list of the most notable changes, sorted
1380 alphabetically by module name. Consult the
1381 \file{Misc/NEWS} file in the source tree for a more
1382 complete list of changes, or look through the CVS logs for all the
1387 \item The \module{array} module now supports arrays of Unicode
1388 characters using the \character{u} format character. Arrays also now
1389 support using the \code{+=} assignment operator to add another array's
1390 contents, and the \code{*=} assignment operator to repeat an array.
1391 (Contributed by Jason Orendorff.)
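A minimal sketch of the two assignment operators on arrays. It uses
the integer type code \character{i} rather than the Unicode one, and
assumes a current Python~3 interpreter.

```python
from array import array

numbers = array("i", [1, 2, 3])
numbers += array("i", [4, 5])   # append another array's contents in place
numbers *= 2                    # repeat the array in place

print(numbers.tolist())
```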
1393 \item The \module{bsddb} module has been replaced by version 4.1.6
1394 of the \ulink{PyBSDDB}{http://pybsddb.sourceforge.net} package,
1395 providing a more complete interface to the transactional features of
1396 the BerkeleyDB library.
1398 The old version of the module has been renamed to
1399 \module{bsddb185} and is no longer built automatically; you'll
1400 have to edit \file{Modules/Setup} to enable it. Note that the new
1401 \module{bsddb} package is intended to be compatible with the
1402 old module, so be sure to file bugs if you discover any
1403 incompatibilities. When upgrading to Python 2.3, if the new interpreter is compiled
1404 with a new version of
1405 the underlying BerkeleyDB library, you will almost certainly have to
1406 convert your database files to the new version. You can do this
1407 fairly easily with the new scripts \file{db2pickle.py} and
1408 \file{pickle2db.py} which you will find in the distribution's
1409 \file{Tools/scripts} directory. If you've already been using the PyBSDDB
1410 package and importing it as \module{bsddb3}, you will have to change your
1411 \code{import} statements to import it as \module{bsddb}.
1413 \item The new \module{bz2} module is an interface to the bz2 data
1414 compression library. bz2-compressed data is usually smaller than
1415 corresponding \module{zlib}-compressed data. (Contributed by Gustavo Niemeyer.)
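The one-shot functions make it easy to compare the two compressors on
your own data. The sketch below uses a deliberately repetitive,
made-up sample and assumes a current Python~3 interpreter; the
relative sizes vary with the input, so no particular ratio should be
read into the result.

```python
import bz2
import zlib

# Highly repetitive sample data, chosen only for illustration.
text = b"the quick brown fox jumps over the lazy dog\n" * 200

bz_data = bz2.compress(text)
zl_data = zlib.compress(text)

sizes = {"original": len(text), "bz2": len(bz_data), "zlib": len(zl_data)}
roundtrip = bz2.decompress(bz_data)

print(sizes)
```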
1417 \item A set of standard date/time types has been added in the new \module{datetime}
1418 module. See the following section for more details.
1420 \item The Distutils \class{Extension} class now supports
1421 an extra constructor argument named \var{depends} for listing
1422 additional source files that an extension depends on. This lets
1423 Distutils recompile the module if any of the dependency files are
1424 modified. For example, if \file{sampmodule.c} includes the header
1425 file \file{sample.h}, you would create the \class{Extension} object like
1429 ext = Extension("samp",
1430 sources=["sampmodule.c"],
1431 depends=["sample.h"])
1434 Modifying \file{sample.h} would then cause the module to be recompiled.
1435 (Contributed by Jeremy Hylton.)
1437 \item Other minor changes to Distutils:
1438 it now checks for the \envvar{CC}, \envvar{CFLAGS}, \envvar{CPP},
1439 \envvar{LDFLAGS}, and \envvar{CPPFLAGS} environment variables, using
1440 them to override the settings in Python's configuration (contributed
1443 \item Previously the \module{doctest} module would only search the
docstrings of public methods and functions for test cases, but it now
examines private ones as well.  The \function{DocTestSuite()}
1446 function creates a \class{unittest.TestSuite} object from a set of
1447 \module{doctest} tests.
1449 \item The new \function{gc.get_referents(\var{object})} function returns a
1450 list of all the objects referenced by \var{object}.
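As a quick sketch, the following asks which objects a dictionary
refers to directly. The variable names are illustrative, and the code
assumes a current Python~3 interpreter; exactly which objects a
container reports is an implementation detail, so the example only
checks for the contained list.

```python
import gc

inner = [1, 2, 3]
outer = {"payload": inner}

# get_referents() lists objects that outer refers to directly.
referents = gc.get_referents(outer)
holds_inner = any(obj is inner for obj in referents)

print(holds_inner)
```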
1452 \item The \module{getopt} module gained a new function,
1453 \function{gnu_getopt()}, that supports the same arguments as the existing
1454 \function{getopt()} function but uses GNU-style scanning mode.
1455 The existing \function{getopt()} stops processing options as soon as a
1456 non-option argument is encountered, but in GNU-style mode processing
1457 continues, meaning that options and arguments can be mixed. For
1461 >>> getopt.getopt(['-f', 'filename', 'output', '-v'], 'f:v')
1462 ([('-f', 'filename')], ['output', '-v'])
1463 >>> getopt.gnu_getopt(['-f', 'filename', 'output', '-v'], 'f:v')
1464 ([('-f', 'filename'), ('-v', '')], ['output'])
1467 (Contributed by Peter \AA{strand}.)
1469 \item The \module{grp}, \module{pwd}, and \module{resource} modules
1470 now return enhanced tuples:
1474 >>> g = grp.getgrnam('amk')
1475 >>> g.gr_name, g.gr_gid
\item The \module{gzip} module can now handle files exceeding 2~GB.
1481 \item The new \module{heapq} module contains an implementation of a
1482 heap queue algorithm. A heap is an array-like data structure that
1483 keeps items in a partially sorted order such that, for every index
1484 \var{k}, \code{heap[\var{k}] <= heap[2*\var{k}+1]} and
1485 \code{heap[\var{k}] <= heap[2*\var{k}+2]}. This makes it quick to
1486 remove the smallest item, and inserting a new item while maintaining
1487 the heap property is O(lg~n). (See
1488 \url{http://www.nist.gov/dads/HTML/priorityque.html} for more
1489 information about the priority queue data structure.)
1491 The \module{heapq} module provides \function{heappush()} and
1492 \function{heappop()} functions for adding and removing items while
1493 maintaining the heap property on top of some other mutable Python
1494 sequence type. Here's an example that uses a Python list:
1499 >>> for item in [3, 7, 5, 11, 1]:
1500 ... heapq.heappush(heap, item)
1504 >>> heapq.heappop(heap)
1506 >>> heapq.heappop(heap)
1512 (Contributed by Kevin O'Connor.)
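The heap property quoted above can be checked directly, and popping
repeatedly yields the items in sorted order. This is a small sketch
using the same sample values, written for a current Python~3
interpreter.

```python
import heapq

heap = []
for item in [3, 7, 5, 11, 1]:
    heapq.heappush(heap, item)

# Verify heap[k] <= heap[2*k+1] and heap[k] <= heap[2*k+2] where they exist.
invariant_holds = all(
    heap[k] <= heap[child]
    for k in range(len(heap))
    for child in (2 * k + 1, 2 * k + 2)
    if child < len(heap)
)

# Popping always removes the smallest remaining item.
in_order = [heapq.heappop(heap) for _ in range(len(heap))]

print(invariant_holds, in_order)
```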
1514 \item The IDLE integrated development environment has been updated
1515 using the code from the IDLEfork project
1516 (\url{http://idlefork.sf.net}). The most notable feature is that the
1517 code being developed is now executed in a subprocess, meaning that
1518 there's no longer any need for manual \code{reload()} operations.
1519 IDLE's core code has been incorporated into the standard library as the
1520 \module{idlelib} package.
1522 \item The \module{imaplib} module now supports IMAP over SSL.
1523 (Contributed by Piers Lauder and Tino Lange.)
\item The \module{itertools} module contains a number of useful functions for
1526 use with iterators, inspired by various functions provided by the ML
1527 and Haskell languages. For example,
1528 \code{itertools.ifilter(predicate, iterator)} returns all elements in
1529 the iterator for which the function \function{predicate()} returns
1530 \constant{True}, and \code{itertools.repeat(obj, \var{N})} returns
1531 \code{obj} \var{N} times. There are a number of other functions in
1532 the module; see the \ulink{package's reference
1533 documentation}{../lib/module-itertools.html} for details.
1534 (Contributed by Raymond Hettinger.)
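A short sketch of the style of code the module encourages. It assumes
a current Python~3 interpreter, where \function{ifilter()} no longer
exists under that name and the built-in \function{filter()} plays the
same lazy role; \function{repeat()}, \function{count()}, and
\function{islice()} are used as in 2.3. The \function{is_even()}
predicate is a made-up helper.

```python
import itertools


def is_even(n):
    # Hypothetical predicate used for filtering.
    return n % 2 == 0


# Lazily keep only the elements for which the predicate is true.
evens = list(filter(is_even, range(10)))

# repeat(obj, N) yields obj exactly N times.
greeting = list(itertools.repeat("hi", 3))

# Take the first five items from an endless stream of squares.
squares = (n * n for n in itertools.count())
first_squares = list(itertools.islice(squares, 5))

print(evens, greeting, first_squares)
```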
1536 \item Two new functions in the \module{math} module,
1537 \function{degrees(\var{rads})} and \function{radians(\var{degs})},
1538 convert between radians and degrees. Other functions in the
1539 \module{math} module such as \function{math.sin()} and
1540 \function{math.cos()} have always required input values measured in
1541 radians. Also, an optional \var{base} argument was added to
1542 \function{math.log()} to make it easier to compute logarithms for
1543 bases other than \code{e} and \code{10}. (Contributed by Raymond
1546 \item Several new POSIX functions (\function{getpgid()}, \function{killpg()},
\function{lchown()}, \function{getloadavg()}, \function{major()}, \function{makedev()},
1548 \function{minor()}, and \function{mknod()}) were added to the
1549 \module{posix} module that underlies the \module{os} module.
1550 (Contributed by Gustavo Niemeyer, Geert Jansen, and Denis S. Otkidach.)
1552 \item In the \module{os} module, the \function{*stat()} family of
1553 functions can now report fractions of a second in a timestamp. Such
1554 time stamps are represented as floats, similar to
1555 the value returned by \function{time.time()}.
1557 During testing, it was found that some applications will break if time
1558 stamps are floats. For compatibility, when using the tuple interface
of the \class{stat_result}, time stamps will be represented as integers.
When using named fields (a feature first introduced in Python 2.2),
time stamps are also represented as integers unless
1562 \function{os.stat_float_times()} is invoked to enable float return
1566 >>> os.stat("/tmp").st_mtime
1568 >>> os.stat_float_times(True)
1569 >>> os.stat("/tmp").st_mtime
1573 In Python 2.4, the default will change to always returning floats.
1575 Application developers should enable this feature only if all their
1576 libraries work properly when confronted with floating point time
1577 stamps, or if they use the tuple API. If used, the feature should be
1578 activated on an application level instead of trying to enable it on a
1581 \item The \module{optparse} module contains a new parser for command-line arguments
1582 that can convert option values to a particular Python type
1583 and will automatically generate a usage message. See the following section for
1586 \item The old and never-documented \module{linuxaudiodev} module has
1587 been deprecated, and a new version named \module{ossaudiodev} has been
1588 added. The module was renamed because the OSS sound drivers can be
1589 used on platforms other than Linux, and the interface has also been
1590 tidied and brought up to date in various ways. (Contributed by Greg
1591 Ward and Nicholas FitzRoy-Dale.)
1593 \item The new \module{platform} module contains a number of functions
1594 that try to determine various properties of the platform you're
1595 running on. There are functions for getting the architecture, CPU
1596 type, the Windows OS version, and even the Linux distribution version.
1597 (Contributed by Marc-Andr\'e Lemburg.)
1599 \item The parser objects provided by the \module{pyexpat} module
1600 can now optionally buffer character data, resulting in fewer calls to
1601 your character data handler and therefore faster performance. Setting
1602 the parser object's \member{buffer_text} attribute to \constant{True}
1603 will enable buffering.
1605 \item The \function{sample(\var{population}, \var{k})} function was
1606 added to the \module{random} module. \var{population} is a sequence or
1607 \class{xrange} object containing the elements of a population, and
1608 \function{sample()} chooses \var{k} elements from the population without
1609 replacing chosen elements. \var{k} can be any value up to
1610 \code{len(\var{population})}. For example:
1613 >>> days = ['Mo', 'Tu', 'We', 'Th', 'Fr', 'St', 'Sn']
1614 >>> random.sample(days, 3) # Choose 3 elements
1616 >>> random.sample(days, 7) # Choose 7 elements
1617 ['Tu', 'Th', 'Mo', 'We', 'St', 'Fr', 'Sn']
1618 >>> random.sample(days, 7) # Choose 7 again
1619 ['We', 'Mo', 'Sn', 'Fr', 'Tu', 'St', 'Th']
1620 >>> random.sample(days, 8) # Can't choose eight
1621 Traceback (most recent call last):
1622 File "<stdin>", line 1, in ?
1623 File "random.py", line 414, in sample
1624 raise ValueError, "sample larger than population"
1625 ValueError: sample larger than population
1626 >>> random.sample(xrange(1,10000,2), 10) # Choose ten odd nos. under 10000
1627 [3407, 3805, 1505, 7023, 2401, 2267, 9733, 3151, 8083, 9195]
1630 The \module{random} module now uses a new algorithm, the Mersenne
1631 Twister, implemented in C. It's faster and more extensively studied
1632 than the previous algorithm.
1634 (All changes contributed by Raymond Hettinger.)
1636 \item The \module{readline} module also gained a number of new
1637 functions: \function{get_history_item()},
1638 \function{get_current_history_length()}, and \function{redisplay()}.
1640 \item The \module{rexec} and \module{Bastion} modules have been
1641 declared dead, and attempts to import them will fail with a
1642 \exception{RuntimeError}. New-style classes provide new ways to break
1643 out of the restricted execution environment provided by
1644 \module{rexec}, and no one has interest in fixing them or time to do
1645 so. If you have applications using \module{rexec}, rewrite them to
1648 (Sticking with Python 2.2 or 2.1 will not make your applications any
1649 safer because there are known bugs in the \module{rexec} module in
1650 those versions. To repeat: if you're using \module{rexec}, stop using
1653 \item The \module{rotor} module has been deprecated because the
1654 algorithm it uses for encryption is not believed to be secure. If
1655 you need encryption, use one of the several AES Python modules
1656 that are available separately.
1658 \item The \module{shutil} module gained a \function{move(\var{src},
1659 \var{dest})} function that recursively moves a file or directory to a new
1662 \item Support for more advanced POSIX signal handling was added
to the \module{signal} module but then removed again as it proved impossible
1664 to make it work reliably across platforms.
1666 \item The \module{socket} module now supports timeouts. You
1667 can call the \method{settimeout(\var{t})} method on a socket object to
1668 set a timeout of \var{t} seconds. Subsequent socket operations that
1669 take longer than \var{t} seconds to complete will abort and raise a
1670 \exception{socket.timeout} exception.
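The behaviour can be observed without any network access by using a
connected pair of local sockets. This is a sketch under assumptions:
it relies on \function{socket.socketpair()} being available on the
platform and targets a current Python~3 interpreter; the 0.1-second
timeout is an arbitrary choice.

```python
import socket

left, right = socket.socketpair()
try:
    left.settimeout(0.1)
    try:
        left.recv(16)          # nothing was sent, so this cannot complete
        timed_out = False
    except socket.timeout:
        timed_out = True

    # Once data is available, the same call returns normally.
    right.sendall(b"ping")
    reply = left.recv(16)
finally:
    left.close()
    right.close()

print(timed_out, reply)
```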
1672 The original timeout implementation was by Tim O'Malley. Michael
1673 Gilfix integrated it into the Python \module{socket} module and
1674 shepherded it through a lengthy review. After the code was checked
1675 in, Guido van~Rossum rewrote parts of it. (This is a good example of
1676 a collaborative development process in action.)
1678 \item On Windows, the \module{socket} module now ships with Secure
1679 Sockets Layer (SSL) support.
1681 \item The value of the C \constant{PYTHON_API_VERSION} macro is now
1682 exposed at the Python level as \code{sys.api_version}. The current
1683 exception can be cleared by calling the new \function{sys.exc_clear()}
1686 \item The new \module{tarfile} module
1687 allows reading from and writing to \program{tar}-format archive files.
1688 (Contributed by Lars Gust\"abel.)
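A minimal sketch of writing and then reading an archive. To stay
self-contained it keeps the archive in an in-memory buffer instead of
a file on disk; the member name \file{greeting.txt} and its contents
are invented for the example, and the code assumes a current Python~3
interpreter.

```python
import io
import tarfile

payload = b"hello from tarfile\n"

# Write a one-member archive into an in-memory buffer.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as archive:
    info = tarfile.TarInfo(name="greeting.txt")
    info.size = len(payload)
    archive.addfile(info, io.BytesIO(payload))

# Rewind and read the archive back.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as archive:
    names = archive.getnames()
    extracted = archive.extractfile("greeting.txt").read()

print(names)
```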
1690 \item The new \module{textwrap} module contains functions for wrapping
1691 strings containing paragraphs of text. The \function{wrap(\var{text},
1692 \var{width})} function takes a string and returns a list containing
1693 the text split into lines of no more than the chosen width. The
1694 \function{fill(\var{text}, \var{width})} function returns a single
1695 string, reformatted to fit into lines no longer than the chosen width.
(As you can guess, \function{fill()} is built on top of
\function{wrap()}.)  For example:
1701 >>> paragraph = "Not a whit, we defy augury: ... more text ..."
1702 >>> textwrap.wrap(paragraph, 60)
1703 ["Not a whit, we defy augury: there's a special providence in",
1704 "the fall of a sparrow. If it be now, 'tis not to come; if it",
1706 >>> print textwrap.fill(paragraph, 35)
1707 Not a whit, we defy augury: there's
1708 a special providence in the fall of
1709 a sparrow. If it be now, 'tis not
1710 to come; if it be not to come, it
1711 will be now; if it be not now, yet
1712 it will come: the readiness is all.
1716 The module also contains a \class{TextWrapper} class that actually
1717 implements the text wrapping strategy. Both the
1718 \class{TextWrapper} class and the \function{wrap()} and
1719 \function{fill()} functions support a number of additional keyword
1720 arguments for fine-tuning the formatting; consult the \ulink{module's
1721 documentation}{../lib/module-textwrap.html} for details.
1722 (Contributed by Greg Ward.)
1724 \item The \module{thread} and \module{threading} modules now have
1725 companion modules, \module{dummy_thread} and \module{dummy_threading},
1726 that provide a do-nothing implementation of the \module{thread}
1727 module's interface for platforms where threads are not supported. The
1728 intention is to simplify thread-aware modules (ones that \emph{don't}
1729 rely on threads to run) by putting the following code at the top:
1733 import threading as _threading
1735 import dummy_threading as _threading
1738 In this example, \module{_threading} is used as the module name to make
1739 it clear that the module being used is not necessarily the actual
1740 \module{threading} module. Code can call functions and use classes in
1741 \module{_threading} whether or not threads are supported, avoiding an
1742 \keyword{if} statement and making the code slightly clearer. This
1743 module will not magically make multithreaded code run without threads;
1744 code that waits for another thread to return or to do something will
1745 simply hang forever.
1747 \item The \module{time} module's \function{strptime()} function has
1748 long been an annoyance because it uses the platform C library's
1749 \function{strptime()} implementation, and different platforms
1750 sometimes have odd bugs. Brett Cannon contributed a portable
1751 implementation that's written in pure Python and should behave
1752 identically on all platforms.
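A small sketch of parsing a timestamp with \function{strptime()}
follows. The input string and format are illustrative, and the
\code{\%b} directive assumes an English (or C) locale for the month
abbreviation.

```python
import time

# Parse a day/month/year timestamp into a time tuple.
parsed = time.strptime("30 Dec 2002 21:27:03", "%d %b %Y %H:%M:%S")

# The first six fields are year, month, day, hour, minute, second.
print(parsed[:6])

# The result can be passed straight back to strftime().
print(time.strftime("%Y-%m-%d", parsed))
```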
1754 \item The new \module{timeit} module helps measure how long snippets
1755 of Python code take to execute. The \file{timeit.py} file can be run
1756 directly from the command line, or the module's \class{Timer} class
1757 can be imported and used directly. Here's a short example that
1758 figures out whether it's faster to convert an 8-bit string to Unicode
1759 by appending an empty Unicode string to it or by using the
1760 \function{unicode()} function:
import timeit

timer1 = timeit.Timer('unicode("abc")')
timer2 = timeit.Timer('"abc" + u""')

print timer1.repeat(repeat=3, number=100000)
print timer2.repeat(repeat=3, number=100000)

# On my laptop this outputs:
# [0.36831796169281006, 0.37441694736480713, 0.35304892063140869]
# [0.17574405670166016, 0.18193507194519043, 0.17565798759460449]
1777 \item The \module{Tix} module has received various bug fixes and
1778 updates for the current version of the Tix package.
1780 \item The \module{Tkinter} module now works with a thread-enabled
1781 version of Tcl. Tcl's threading model requires that widgets only be
1782 accessed from the thread in which they're created; accesses from
1783 another thread can cause Tcl to panic. For certain Tcl interfaces,
1784 \module{Tkinter} will now automatically avoid this
1785 when a widget is accessed from a different thread by marshalling a
1786 command, passing it to the correct thread, and waiting for the
1787 results. Other interfaces can't be handled automatically but
1788 \module{Tkinter} will now raise an exception on such an access so that
1789 you can at least find out about the problem. See
1790 \url{http://mail.python.org/pipermail/python-dev/2002-December/031107.html} %
1791 for a more detailed explanation of this change. (Implemented by
1792 Martin von~L\"owis.)
1794 \item Calling Tcl methods through \module{_tkinter} no longer
1795 returns only strings. Instead, if Tcl returns other objects those
1796 objects are converted to their Python equivalent, if one exists, or
1797 wrapped with a \class{_tkinter.Tcl_Obj} object if no Python equivalent
1798 exists. This behavior can be controlled through the
1799 \method{wantobjects()} method of \class{tkapp} objects.
1801 When using \module{_tkinter} through the \module{Tkinter} module (as
1802 most Tkinter applications will), this feature is always activated. It
1803 should not cause compatibility problems, since Tkinter would always
1804 convert string results to Python types where possible.
1806 If any incompatibilities are found, the old behavior can be restored
1807 by setting the \member{wantobjects} variable in the \module{Tkinter}
1808 module to false before creating the first \class{tkapp} object.
import Tkinter
Tkinter.wantobjects = 0
1815 Any breakage caused by this change should be reported as a bug.
1817 \item The \module{UserDict} module has a new \class{DictMixin} class which
1818 defines all dictionary methods for classes that already have a minimum
1819 mapping interface. This greatly simplifies writing classes that need
1820 to be substitutable for dictionaries, such as the classes in
1821 the \module{shelve} module.
1823 Adding the mix-in as a superclass provides the full dictionary
1824 interface whenever the class defines \method{__getitem__},
1825 \method{__setitem__}, \method{__delitem__}, and \method{keys}.
>>> import UserDict
>>> class SeqDict(UserDict.DictMixin):
...     """Dictionary lookalike implemented with lists."""
...     def __init__(self):
...         self.keylist = []
...         self.valuelist = []
...     def __getitem__(self, key):
...         try:
...             i = self.keylist.index(key)
...         except ValueError:
...             raise KeyError
...         return self.valuelist[i]
...     def __setitem__(self, key, value):
...         try:
...             i = self.keylist.index(key)
...             self.valuelist[i] = value
...         except ValueError:
...             self.keylist.append(key)
...             self.valuelist.append(value)
...     def __delitem__(self, key):
...         try:
...             i = self.keylist.index(key)
...         except ValueError:
...             raise KeyError
...         self.keylist.pop(i)
...         self.valuelist.pop(i)
...     def keys(self):
...         return list(self.keylist)
...
>>> s = SeqDict()
>>> dir(s)      # See that other dictionary methods are implemented
1860 ['__cmp__', '__contains__', '__delitem__', '__doc__', '__getitem__',
1861 '__init__', '__iter__', '__len__', '__module__', '__repr__',
1862 '__setitem__', 'clear', 'get', 'has_key', 'items', 'iteritems',
1863 'iterkeys', 'itervalues', 'keylist', 'keys', 'pop', 'popitem',
1864 'setdefault', 'update', 'valuelist', 'values']
1867 (Contributed by Raymond Hettinger.)
1869 \item The DOM implementation
1870 in \module{xml.dom.minidom} can now generate XML output in a
1871 particular encoding by providing an optional encoding argument to
1872 the \method{toxml()} and \method{toprettyxml()} methods of DOM nodes.
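The encoding argument can be sketched as follows. The element name and
text are made up for the example; on current interpreters the result
is a bytes object, whereas under 2.3 it is an ordinary 8-bit string.

```python
from xml.dom import minidom

# Build a tiny document containing one non-ASCII character.
doc = minidom.Document()
root = doc.createElement("greeting")
root.appendChild(doc.createTextNode(u"caf\xe9"))
doc.appendChild(root)

# Ask for the serialized form in two different encodings.
as_utf8 = doc.toxml(encoding="utf-8")
as_latin1 = doc.toxml(encoding="iso-8859-1")

print(as_utf8)
print(as_latin1)
```

The chosen encoding is also recorded in the XML declaration at the
start of the output.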
1874 \item The \module{xmlrpclib} module now supports an XML-RPC extension
1875 for handling nil data values such as Python's \code{None}. Nil values
1876 are always supported on unmarshalling an XML-RPC response. To
1877 generate requests containing \code{None}, you must supply a true value
for the \var{allow_none} parameter when creating a \class{Marshaller}
instance.
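A brief sketch of the nil extension is shown below. It uses the
module-level \function{dumps()} and \function{loads()} helpers rather
than a \class{Marshaller} directly, on the assumption that
\function{dumps()} forwards \var{allow_none} to the marshaller. The
import fallback is only there so the example also runs where the module
is named \module{xmlrpc.client}.

```python
try:
    import xmlrpclib                    # name used in this article
except ImportError:
    import xmlrpc.client as xmlrpclib   # same module under its later name

# By default, marshalling None is refused.
try:
    xmlrpclib.dumps((None,))
    refused = False
except TypeError:
    refused = True
print(refused)

# With allow_none set to a true value, None is sent as a <nil/> element.
payload = xmlrpclib.dumps((None,), allow_none=True)
print("<nil/>" in payload)

# Unmarshalling accepts nil values without any extra flag.
params, method = xmlrpclib.loads(payload)
print(params)
```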
1881 \item The new \module{DocXMLRPCServer} module allows writing
1882 self-documenting XML-RPC servers. Run it in demo mode (as a program)
1883 to see it in action. Pointing the Web browser to the RPC server
1884 produces pydoc-style documentation; pointing xmlrpclib to the
1885 server allows invoking the actual methods.
1886 (Contributed by Brian Quinlan.)
1888 \item Support for internationalized domain names (RFCs 3454, 3490,
1889 3491, and 3492) has been added. The ``idna'' encoding can be used
1890 to convert between a Unicode domain name and the ASCII-compatible
1891 encoding (ACE) of that name.
1894 >{}>{}> u"www.Alliancefran\c caise.nu".encode("idna")
1895 'www.xn--alliancefranaise-npb.nu'
The \module{socket} module has also been extended to transparently
convert Unicode hostnames to the ACE version before passing them to
the C library. Modules that deal with hostnames (such as
\module{httplib} and \module{ftplib}) also support Unicode host names;
\module{httplib} also sends HTTP \samp{Host} headers using the ACE
version of the domain name. \module{urllib} supports Unicode URLs
with non-ASCII host names as long as the \code{path} part of the URL
is ASCII only.
1907 To implement this change, the \module{stringprep} module, the
1908 \code{mkstringprep} tool and the \code{punycode} encoding have been added.
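The conversion works in both directions. The sketch below reuses the
domain name from the example above and decodes the ACE form again. The
decoded name should come back in a normalized form, so the comparison
ignores case.

```python
name = u"www.Alliancefran\xe7aise.nu"

# Unicode domain name -> ASCII-compatible encoding (ACE)
ace = name.encode("idna")
print(ace)

# ACE -> Unicode domain name
back = ace.decode("idna")
print(back)

# Compare case-insensitively because the round trip normalizes the name.
print(back.lower() == name.lower())
```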
1913 %======================================================================
1914 \subsection{Date/Time Type}
Date and time types suitable for expressing timestamps were added as
the \module{datetime} module. The types don't support different
calendars or many fancy features, and just stick to the basics of
representing time.
1921 The three primary types are: \class{date}, representing a day, month,
1922 and year; \class{time}, consisting of hour, minute, and second; and
1923 \class{datetime}, which contains all the attributes of both
1924 \class{date} and \class{time}. There's also a
1925 \class{timedelta} class representing differences between two points
1926 in time, and time zone logic is implemented by classes inheriting from
1927 the abstract \class{tzinfo} class.
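As a minimal sketch of what inheriting from \class{tzinfo} involves,
the hypothetical \class{FixedOffset} class below models a zone with a
constant offset from UTC and no daylight saving time. The class name,
its constructor arguments, and the two sample zones are assumptions
made for the example; they are not part of the module.

```python
from datetime import datetime, timedelta, tzinfo


class FixedOffset(tzinfo):
    """Hypothetical zone: a fixed number of minutes east of UTC, no DST."""

    def __init__(self, minutes, name):
        self._offset = timedelta(minutes=minutes)
        self._name = name

    def utcoffset(self, dt):
        return self._offset

    def dst(self, dt):
        return timedelta(0)

    def tzname(self, dt):
        return self._name


utc = FixedOffset(0, "UTC")
cet = FixedOffset(60, "CET")

noon_utc = datetime(2002, 12, 30, 12, 0, tzinfo=utc)
noon_in_cet = noon_utc.astimezone(cet)
print(noon_in_cet.isoformat())
```

The two objects compare equal because they describe the same instant,
even though their wall-clock fields differ.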
1929 You can create instances of \class{date} and \class{time} by either
1930 supplying keyword arguments to the appropriate constructor,
1931 e.g. \code{datetime.date(year=1972, month=10, day=15)}, or by using
1932 one of a number of class methods. For example, the \method{date.today()}
1933 class method returns the current local date.
Once created, instances of the date/time classes are all immutable.
There are a number of methods for producing formatted strings from
objects:
>>> import datetime
>>> now = datetime.datetime.now()
>>> now.isoformat()
'2002-12-30T21:27:03.994956'
>>> now.ctime()  # Only available on date, datetime
'Mon Dec 30 21:27:03 2002'
>>> now.strftime('%Y %d %b')
'2002 30 Dec'
1950 The \method{replace()} method allows modifying one or more fields
1951 of a \class{date} or \class{datetime} instance, returning a new instance:
>>> d = datetime.datetime.now()
>>> d
datetime.datetime(2002, 12, 30, 22, 15, 38, 827738)
>>> d.replace(year=2001, hour=12)
datetime.datetime(2001, 12, 30, 12, 15, 38, 827738)
1962 Instances can be compared, hashed, and converted to strings (the
1963 result is the same as that of \method{isoformat()}). \class{date} and
1964 \class{datetime} instances can be subtracted from each other, and
1965 added to \class{timedelta} instances. The largest missing feature is
1966 that there's no standard library support for parsing strings and getting back a
1967 \class{date} or \class{datetime}.
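The arithmetic described above can be sketched with a few dates. The
release date is the one given at the start of this article; the other
values are arbitrary.

```python
import datetime

release = datetime.date(2003, 7, 29)              # Python 2.3 release date
start = datetime.date(year=2003, month=1, day=1)

# Subtracting two dates yields a timedelta.
delta = release - start
print(delta.days)            # 209

# Adding a timedelta to a date yields a new date.
week_later = release + datetime.timedelta(days=7)
print(week_later)            # 2003-08-05

# str() gives the same result as isoformat(); instances also compare.
print(str(week_later) == week_later.isoformat())
print(start < release)
```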
1969 For more information, refer to the \ulink{module's reference
1970 documentation}{../lib/module-datetime.html}.
1971 (Contributed by Tim Peters.)
1974 %======================================================================
1975 \subsection{The optparse Module}
1977 The \module{getopt} module provides simple parsing of command-line
1978 arguments. The new \module{optparse} module (originally named Optik)
1979 provides more elaborate command-line parsing that follows the Unix
1980 conventions, automatically creates the output for \longprogramopt{help},
1981 and can perform different actions for different options.
1983 You start by creating an instance of \class{OptionParser} and telling
1984 it what your program's options are.
import sys
from optparse import OptionParser

op = OptionParser()
op.add_option('-i', '--input',
              action='store', type='string', dest='input',
              help='set input filename')
op.add_option('-l', '--length',
              action='store', type='int', dest='length',
              help='set maximum length of output')
Parsing a command line is then done by calling the \method{parse_args()}
method.
options, args = op.parse_args(sys.argv[1:])
print options
2008 This returns an object containing all of the option values,
2009 and a list of strings containing the remaining arguments.
2011 Invoking the script with the various arguments now works as you'd
2012 expect it to. Note that the length argument is automatically
2013 converted to an integer.
2016 $ ./python opt.py -i data arg1
2017 <Values at 0x400cad4c: {'input': 'data', 'length': None}>
2019 $ ./python opt.py --input=data --length=4
2020 <Values at 0x400cad2c: {'input': 'data', 'length': 4}>
2025 The help message is automatically generated for you:
$ ./python opt.py --help
usage: opt.py [options]

options:
  -h, --help            show this help message and exit
  -iINPUT, --input=INPUT
                        set input filename
  -lLENGTH, --length=LENGTH
                        set maximum length of output
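To give an idea of the ``different actions for different options''
mentioned earlier, the following sketch combines the \code{store_true},
\code{append}, and \code{store} actions. The option names and defaults
are invented for the example, and an explicit argument list is parsed
instead of \code{sys.argv[1:]} so that the result is predictable.

```python
from optparse import OptionParser

op = OptionParser()
# A flag that takes no argument and simply records True when present.
op.add_option('-v', '--verbose', action='store_true', dest='verbose',
              default=False, help='print extra messages')
# An option that may be repeated; each value is appended to a list.
op.add_option('-I', '--include', action='append', dest='include',
              help='add a directory to the search path')
# A typed option with a default used when it is not supplied.
op.add_option('-l', '--length', action='store', type='int', dest='length',
              default=72, help='set maximum length of output')

options, args = op.parse_args(['-v', '-I', 'a', '--include=b', 'file.txt'])
print(options.verbose)
print(options.include)
print(options.length)
print(args)
```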
2039 % $ prevent Emacs tex-mode from getting confused
See the \ulink{module's documentation}{../lib/module-optparse.html}
for more details.
2044 Optik was written by Greg Ward, with suggestions from the readers of
2048 %======================================================================
2049 \section{Pymalloc: A Specialized Object Allocator\label{section-pymalloc}}
2051 Pymalloc, a specialized object allocator written by Vladimir
2052 Marangozov, was a feature added to Python 2.1. Pymalloc is intended
2053 to be faster than the system \cfunction{malloc()} and to have less
2054 memory overhead for allocation patterns typical of Python programs.
The allocator uses C's \cfunction{malloc()} function to get large
pools of memory and then fulfills smaller memory requests from these
pools.
2059 In 2.1 and 2.2, pymalloc was an experimental feature and wasn't
2060 enabled by default; you had to explicitly enable it when compiling
2061 Python by providing the
2062 \longprogramopt{with-pymalloc} option to the \program{configure}
2063 script. In 2.3, pymalloc has had further enhancements and is now
2064 enabled by default; you'll have to supply
2065 \longprogramopt{without-pymalloc} to disable it.
2067 This change is transparent to code written in Python; however,
2068 pymalloc may expose bugs in C extensions. Authors of C extension
2069 modules should test their code with pymalloc enabled,
2070 because some incorrect code may cause core dumps at runtime.
2072 There's one particularly common error that causes problems. There are
2073 a number of memory allocation functions in Python's C API that have
2074 previously just been aliases for the C library's \cfunction{malloc()}
2075 and \cfunction{free()}, meaning that if you accidentally called
2076 mismatched functions the error wouldn't be noticeable. When the
2077 object allocator is enabled, these functions aren't aliases of
2078 \cfunction{malloc()} and \cfunction{free()} any more, and calling the
2079 wrong function to free memory may get you a core dump. For example,
2080 if memory was allocated using \cfunction{PyObject_Malloc()}, it has to
2081 be freed using \cfunction{PyObject_Free()}, not \cfunction{free()}. A
2082 few modules included with Python fell afoul of this and had to be
fixed; doubtless there are more third-party modules that will have the
same problem.
2086 As part of this change, the confusing multiple interfaces for
2087 allocating memory have been consolidated down into two API families.
2088 Memory allocated with one family must not be manipulated with
2089 functions from the other family. There is one family for allocating
2090 chunks of memory and another family of functions specifically for
2091 allocating Python objects.
2094 \item To allocate and free an undistinguished chunk of memory use
2095 the ``raw memory'' family: \cfunction{PyMem_Malloc()},
2096 \cfunction{PyMem_Realloc()}, and \cfunction{PyMem_Free()}.
2098 \item The ``object memory'' family is the interface to the pymalloc
2099 facility described above and is biased towards a large number of
``small'' allocations: \cfunction{PyObject_Malloc()},
\cfunction{PyObject_Realloc()}, and \cfunction{PyObject_Free()}.
\item To allocate and free Python objects, use the ``object'' family:
\cfunction{PyObject_New()}, \cfunction{PyObject_NewVar()}, and
\cfunction{PyObject_Del()}.
2108 Thanks to lots of work by Tim Peters, pymalloc in 2.3 also provides
2109 debugging features to catch memory overwrites and doubled frees in
2110 both extension modules and in the interpreter itself. To enable this
2111 support, compile a debugging version of the Python interpreter by
2112 running \program{configure} with \longprogramopt{with-pydebug}.
2114 To aid extension writers, a header file \file{Misc/pymemcompat.h} is
2115 distributed with the source to Python 2.3 that allows Python
2116 extensions to use the 2.3 interfaces to memory allocation while
2117 compiling against any version of Python since 1.5.2. You would copy
2118 the file from Python's source distribution and bundle it with the
2119 source of your extension.
2123 \seeurl{http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Objects/obmalloc.c}
2124 {For the full details of the pymalloc implementation, see
2125 the comments at the top of the file \file{Objects/obmalloc.c} in the
2126 Python source code. The above link points to the file within the
2127 SourceForge CVS browser.}
2132 % ======================================================================
2133 \section{Build and C API Changes}
2135 Changes to Python's build process and to the C API include:
2139 \item The C-level interface to the garbage collector has been changed
2140 to make it easier to write extension types that support garbage
2141 collection and to debug misuses of the functions.
2142 Various functions have slightly different semantics, so a bunch of
2143 functions had to be renamed. Extensions that use the old API will
2144 still compile but will \emph{not} participate in garbage collection,
2145 so updating them for 2.3 should be considered fairly high priority.
To upgrade an extension module to the new API, perform the following
steps:
\item Rename \cfunction{Py_TPFLAGS_GC} to \cfunction{Py_TPFLAGS_HAVE_GC}.
2154 \item Use \cfunction{PyObject_GC_New} or \cfunction{PyObject_GC_NewVar} to
2155 allocate objects, and \cfunction{PyObject_GC_Del} to deallocate them.
2157 \item Rename \cfunction{PyObject_GC_Init} to \cfunction{PyObject_GC_Track} and
2158 \cfunction{PyObject_GC_Fini} to \cfunction{PyObject_GC_UnTrack}.
2160 \item Remove \cfunction{PyGC_HEAD_SIZE} from object size calculations.
2162 \item Remove calls to \cfunction{PyObject_AS_GC} and \cfunction{PyObject_FROM_GC}.
\item The cycle detection implementation used by the garbage collector
2167 has proven to be stable, so it's now been made mandatory. You can no
2168 longer compile Python without it, and the
2169 \longprogramopt{with-cycle-gc} switch to \program{configure} has been removed.
2171 \item Python can now optionally be built as a shared library
2172 (\file{libpython2.3.so}) by supplying \longprogramopt{enable-shared}
2173 when running Python's \program{configure} script. (Contributed by Ondrej
2176 \item The \csimplemacro{DL_EXPORT} and \csimplemacro{DL_IMPORT} macros
2177 are now deprecated. Initialization functions for Python extension
2178 modules should now be declared using the new macro
2179 \csimplemacro{PyMODINIT_FUNC}, while the Python core will generally
use the \csimplemacro{PyAPI_FUNC} and \csimplemacro{PyAPI_DATA}
macros.
2183 \item The interpreter can be compiled without any docstrings for
2184 the built-in functions and modules by supplying
2185 \longprogramopt{without-doc-strings} to the \program{configure} script.
2186 This makes the Python executable about 10\% smaller, but will also
2187 mean that you can't get help for Python's built-ins. (Contributed by
2190 \item The \cfunction{PyArg_NoArgs()} macro is now deprecated, and code
2191 that uses it should be changed. For Python 2.2 and later, the method
2192 definition table can specify the
2193 \constant{METH_NOARGS} flag, signalling that there are no arguments, and
2194 the argument checking can then be removed. If compatibility with
2195 pre-2.2 versions of Python is important, the code could use
2196 \code{PyArg_ParseTuple(\var{args}, "")} instead, but this will be slower
2197 than using \constant{METH_NOARGS}.
2199 \item A new function, \cfunction{PyObject_DelItemString(\var{mapping},
2200 char *\var{key})} was added as shorthand for
\code{PyObject_DelItem(\var{mapping}, PyString_FromString(\var{key}))}.
2203 \item File objects now manage their internal string buffer
2204 differently, increasing it exponentially when needed. This results in
2205 the benchmark tests in \file{Lib/test/test_bufio.py} speeding up
considerably (from 57 seconds to 1.7 seconds, according to one
measurement).
2209 \item It's now possible to define class and static methods for a C
2210 extension type by setting either the \constant{METH_CLASS} or
2211 \constant{METH_STATIC} flags in a method's \ctype{PyMethodDef}
2214 \item Python now includes a copy of the Expat XML parser's source code,
removing any dependence on a system version or local installation of
Expat.
2218 \item If you dynamically allocate type objects in your extension, you
2219 should be aware of a change in the rules relating to the
2220 \member{__module__} and \member{__name__} attributes. In summary,
2221 you will want to ensure the type's dictionary contains a
2222 \code{'__module__'} key; making the module name the part of the type
2223 name leading up to the final period will no longer have the desired
effect. For more detail, read the API reference documentation or the
source.
2230 %======================================================================
2231 \subsection{Port-Specific Changes}
2233 Support for a port to IBM's OS/2 using the EMX runtime environment was
2234 merged into the main Python source tree. EMX is a POSIX emulation
2235 layer over the OS/2 system APIs. The Python port for EMX tries to
2236 support all the POSIX-like capability exposed by the EMX runtime, and
2237 mostly succeeds; \function{fork()} and \function{fcntl()} are
2238 restricted by the limitations of the underlying emulation layer. The
2239 standard OS/2 port, which uses IBM's Visual Age compiler, also gained
2240 support for case-sensitive import semantics as part of the integration
2241 of the EMX port into CVS. (Contributed by Andrew MacIntyre.)
2243 On MacOS, most toolbox modules have been weaklinked to improve
2244 backward compatibility. This means that modules will no longer fail
to load if a single routine is missing on the current OS version.
Instead, calling the missing routine will raise an exception.
2247 (Contributed by Jack Jansen.)
2249 The RPM spec files, found in the \file{Misc/RPM/} directory in the
2250 Python source distribution, were updated for 2.3. (Contributed by
2251 Sean Reifschneider.)
2253 Other new platforms now supported by Python include AtheOS
2254 (\url{http://www.atheos.cx/}), GNU/Hurd, and OpenVMS.
2257 %======================================================================
2258 \section{Other Changes and Fixes \label{section-other}}
2260 As usual, there were a bunch of other improvements and bugfixes
2261 scattered throughout the source tree. A search through the CVS change
2262 logs finds there were 523 patches applied and 514 bugs fixed between
2263 Python 2.2 and 2.3. Both figures are likely to be underestimates.
2265 Some of the more notable changes are:
2269 \item If the \envvar{PYTHONINSPECT} environment variable is set, the
2270 Python interpreter will enter the interactive prompt after running a
2271 Python program, as if Python had been invoked with the \programopt{-i}
2272 option. The environment variable can be set before running the Python
interpreter, or it can be set by the Python program as part of its
execution.
2276 \item The \file{regrtest.py} script now provides a way to allow ``all
2277 resources except \var{foo}.'' A resource name passed to the
2278 \programopt{-u} option can now be prefixed with a hyphen
2279 (\character{-}) to mean ``remove this resource.'' For example, the
2280 option `\code{\programopt{-u}all,-bsddb}' could be used to enable the
2281 use of all resources except \code{bsddb}.
2283 \item The tools used to build the documentation now work under Cygwin
2286 \item The \code{SET_LINENO} opcode has been removed. Back in the
2287 mists of time, this opcode was needed to produce line numbers in
2288 tracebacks and support trace functions (for, e.g., \module{pdb}).
2289 Since Python 1.5, the line numbers in tracebacks have been computed
2290 using a different mechanism that works with ``python -O''. For Python
2291 2.3 Michael Hudson implemented a similar scheme to determine when to
call the trace function, removing the need for \code{SET_LINENO}.
2295 It would be difficult to detect any resulting difference from Python
code, apart from a slight speed up when Python is run without
\programopt{-O}.
2299 C extensions that access the \member{f_lineno} field of frame objects
2300 should instead call \code{PyCode_Addr2Line(f->f_code, f->f_lasti)}.
2301 This will have the added effect of making the code work as desired
2302 under ``python -O'' in earlier versions of Python.
2304 A nifty new feature is that trace functions can now assign to the
2305 \member{f_lineno} attribute of frame objects, changing the line that
2306 will be executed next. A \samp{jump} command has been added to the
2307 \module{pdb} debugger taking advantage of this new feature.
2308 (Implemented by Richie Hindle.)
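The sketch below shows a trace function reading \member{f_lineno};
it only reads the attribute and does not attempt the assignment
described above. The helper name \function{traced_lines()} and the
\function{sample()} function are hypothetical, and
\code{func.__code__} is the current spelling of what 2.3 calls
\code{func.func_code}.

```python
import sys


def traced_lines(func):
    """Call func() and return the executed line numbers,
    relative to the line holding its 'def' statement."""
    seen = []
    code = func.__code__

    def tracer(frame, event, arg):
        # Record 'line' events that belong to func's own frame.
        if frame.f_code is code and event == "line":
            seen.append(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func()
    finally:
        sys.settrace(previous)
    return seen


def sample():
    x = 1
    y = x + 1
    return y


print(traced_lines(sample))
```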
2313 %======================================================================
2314 \section{Porting to Python 2.3}
2316 This section lists previously described changes that may require
2317 changes to your code:
2321 \item \keyword{yield} is now always a keyword; if it's used as a
2322 variable name in your code, a different name must be chosen.
2324 \item For strings \var{X} and \var{Y}, \code{\var{X} in \var{Y}} now works
2325 if \var{X} is more than one character long.
2327 \item The \function{int()} type constructor will now return a long
2328 integer instead of raising an \exception{OverflowError} when a string
2329 or floating-point number is too large to fit into an integer.
2331 \item If you have Unicode strings that contain 8-bit characters, you
2332 must declare the file's encoding (UTF-8, Latin-1, or whatever) by
2333 adding a comment to the top of the file. See
2334 section~\ref{section-encodings} for more information.
2336 \item Calling Tcl methods through \module{_tkinter} no longer
2337 returns only strings. Instead, if Tcl returns other objects those
2338 objects are converted to their Python equivalent, if one exists, or
wrapped with a \class{_tkinter.Tcl_Obj} object if no Python equivalent
exists.
2342 \item Large octal and hex literals such as
2343 \code{0xffffffff} now trigger a \exception{FutureWarning}. Currently
2344 they're stored as 32-bit numbers and result in a negative value, but
2345 in Python 2.4 they'll become positive long integers.
2347 There are a few ways to fix this warning. If you really need a
2348 positive number, just add an \samp{L} to the end of the literal. If
2349 you're trying to get a 32-bit integer with low bits set and have
previously used an expression such as \code{{\textasciitilde}(1 << 31)}, it's probably
2351 clearest to start with all bits set and clear the desired upper bits.
2352 For example, to clear just the top bit (bit 31), you could write
2353 \code{0xffffffffL {\&}{\textasciitilde}(1L<<31)}.
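The ``start with all bits set and clear the upper bits'' approach can
be sketched as follows. The example is written for a current
interpreter, where integers have arbitrary precision and no \samp{L}
suffix is needed; under 2.3 the literals would be written
\code{0xffffffffL} and \code{1L}. The names \code{ALL_BITS},
\code{low_31}, and \code{low_16} are illustrative.

```python
ALL_BITS = 0xffffffff                  # all 32 bits set

# Clear only the top bit (bit 31).
low_31 = ALL_BITS & ~(1 << 31)
print(hex(low_31))                     # 0x7fffffff

# Clear the upper 16 bits, keeping the low 16.
low_16 = ALL_BITS & ~(0xffff << 16)
print(hex(low_16))                     # 0xffff
```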
2355 \item You can no longer disable assertions by assigning to \code{__debug__}.
2357 \item The Distutils \function{setup()} function has gained various new
2358 keyword arguments such as \var{depends}. Old versions of the
2359 Distutils will abort if passed unknown keywords. A solution is to check
2360 for the presence of the new \function{get_distutil_options()} function
in your \file{setup.py} and only use the new keywords
2362 with a version of the Distutils that supports them:
from distutils import core

kw = {'sources': 'foo.c', ...}
if hasattr(core, 'get_distutil_options'):
    kw['depends'] = ['foo.h']
ext = core.Extension(**kw)
\item Using \code{None} as a variable name will now result in a
\exception{SyntaxWarning}.
2376 \item Names of extension types defined by the modules included with
Python now contain the module and a \character{.} in front of the type
name.
2383 %======================================================================
2384 \section{Acknowledgements \label{acks}}
2386 The author would like to thank the following people for offering
2387 suggestions, corrections and assistance with various drafts of this
2388 article: Jeff Bauer, Simon Brunning, Brett Cannon, Michael Chermside,
2389 Andrew Dalke, Scott David Daniels, Fred~L. Drake, Jr., David Fraser,
2391 Raymond Hettinger, Michael Hudson, Chris Lambert, Detlef Lannert,
2392 Martin von~L\"owis, Andrew MacIntyre, Lalo Martins, Chad Netzer,
2393 Gustavo Niemeyer, Neal Norwitz, Hans Nowak, Chris Reedy, Francesco
2394 Ricciardi, Vinay Sajip, Neil Schemenauer, Roman Suzi, Jason Tishler,