\title{What's New in Python 2.0}
\author{A.M. Kuchling and Moshe Zadka}
\authoraddress{\email{amk1@bigfoot.com}, \email{moshez@math.huji.ac.il} }
\maketitle\tableofcontents

\section{Introduction}

{\large This is a draft document; please report inaccuracies and
omissions to the authors.  This document should not be treated as
definitive; features described here might be removed or changed during
the beta cycle before the final release of Python 2.0.
}

Python 2.0 will be released some time this
autumn.  Beta versions are already available from
\url{http://www.pythonlabs.com/products/python2.0/}.  This article
covers the exciting new features in 2.0, highlights some other useful
changes, and points out a few incompatible changes that may require
rewriting old code.

Python's development never completely stops between releases, and a
steady flow of bug fixes and improvements is always being submitted.
A host of minor fixes, a few optimizations, additional docstrings, and
better error messages went into 2.0; to list them all would be
impossible, but they're certainly significant.  Consult the
publicly-available CVS logs if you want to see the full list.

% ======================================================================
\section{What About Python 1.6?}

Python 1.6 can be thought of as the Contractual Obligations Python
release.  After the core development team left CNRI in May 2000, CNRI
requested that a 1.6 release be created, containing all the work on
Python that had been performed at CNRI.  Python 1.6 therefore
represents the state of the CVS tree as of May 2000, with the most
significant new feature being Unicode support.  Development continued
after May, of course, so the 1.6 tree received a few fixes to ensure
that it's forward-compatible with Python 2.0.  1.6 is therefore part
of Python's evolution, and not a side branch.

So, should you take much interest in Python 1.6?  Probably not.  The
1.6final and 2.0beta1 releases were made on the same day (September 5,
2000), the plan being to finalize Python 2.0 within a month or so.  If
you have applications to maintain, there seems little point in
breaking things by moving to 1.6, fixing them, and then having another
round of breakage within a month by moving to 2.0; you're better off
just going straight to 2.0.  Most of the really interesting features
described in this document are only in 2.0, because a lot of work was
done between May and September.

% ======================================================================
\section{New Development Process}

The most important change in Python 2.0 may not be to the code at all,
but to how Python is developed.

In May of 2000, the Python CVS tree was moved to SourceForge.
Previously, there were roughly 7 people who had write access to
the CVS tree, and all patches had to be inspected and checked in by
one of the people on this short list.  Obviously, this wasn't very
scalable.  By moving the CVS tree to SourceForge, it became possible
to grant write access to more people; as of September 2000 there were
27 people able to check in changes, a fourfold increase.  This makes
possible large-scale changes that wouldn't be attempted if they had
to be filtered through the small group of core developers.  For
example, one day Peter Schneider-Kamp took it into his head to drop
K\&R C compatibility and convert the C source for Python to ANSI
C.  After getting approval on the python-dev mailing list, he launched
into a flurry of checkins that lasted about a week, other developers
joined in to help, and the job was done.  If there were only 5 people
with write access, that task would probably have been viewed as
``nice, but not worth the time and effort needed'' and it would
never have gotten done.

SourceForge also provides tools for tracking bug and patch
submissions, and in combination with the public CVS tree, they've
resulted in a remarkable increase in the speed of development.
Patches now get submitted, commented on, revised by people other than
the original submitter, and bounced back and forth between people
until the patch is deemed worth checking in.  This didn't come without
a cost: developers now have more e-mail to deal with, more mailing
lists to follow, and special tools had to be written for the new
environment.  For example, SourceForge sends default patch and bug
notification e-mail messages that are completely unhelpful, so Ka-Ping
Yee wrote an HTML screen-scraper that sends more useful messages.

The ease of adding code caused a few initial growing pains, such as
code being checked in before it was ready or without getting clear
agreement from the developer group.  The approval process that has
emerged is somewhat similar to that used by the Apache group.
Developers can vote +1, +0, -0, or -1 on a patch; +1 and -1 denote
acceptance or rejection, while +0 and -0 mean the developer is mostly
indifferent to the change, though with a slight positive or negative
slant.  The most significant change from the Apache model is that
Guido van Rossum, who has Benevolent Dictator For Life status, can
ignore the votes of the other developers and approve or reject a
change, effectively giving him a +Infinity / -Infinity vote.

Producing an actual patch is the last step in adding a new feature,
and is usually easy compared to the earlier task of coming up with a
good design.  Discussions of new features can often explode into
lengthy mailing list threads, making the discussion hard to follow,
and no one can read every posting to python-dev.  Therefore, a
relatively formal process has been set up to write Python Enhancement
Proposals (PEPs), modelled on the Internet RFC process.  PEPs are
draft documents that describe a proposed new feature, and are
continually revised until the community reaches a consensus, either
accepting or rejecting the proposal.  Quoting from the introduction to
PEP 1, ``PEP Purpose and Guidelines'':

\begin{quotation}
  PEP stands for Python Enhancement Proposal.  A PEP is a design
  document providing information to the Python community, or
  describing a new feature for Python.  The PEP should provide a
  concise technical specification of the feature and a rationale for
  the feature.

  We intend PEPs to be the primary mechanisms for proposing new
  features, for collecting community input on an issue, and for
  documenting the design decisions that have gone into Python.  The
  PEP author is responsible for building consensus within the
  community and documenting dissenting opinions.
\end{quotation}

Read the rest of PEP 1 for the details of the PEP editorial process,
style, and format.  PEPs are kept in the Python CVS tree on
SourceForge, though they're not part of the Python 2.0 distribution,
and are also available in HTML form from
\url{http://python.sourceforge.net/peps/}.  As of September 2000,
there are 25 PEPs, ranging from PEP 201, ``Lockstep Iteration'', to
PEP 225, ``Elementwise/Objectwise Operators''.

To report bugs or submit patches for Python 2.0, use the bug tracking
and patch manager tools available from the SourceForge project page,
at \url{http://sourceforge.net/projects/python/}.

% ======================================================================
\section{Unicode}

The largest new feature in Python 2.0 is a new fundamental data type:
Unicode strings.  Unicode uses 16-bit numbers to represent characters
instead of the 8-bit number used by ASCII, meaning that 65,536
distinct characters can be supported.

The final interface for Unicode support was arrived at through
countless often-stormy discussions on the python-dev mailing list, and
mostly implemented by Marc-Andr\'e Lemburg, based on a Unicode string
type implementation by Fredrik Lundh.  A detailed explanation of the
interface is in the file \file{Misc/unicode.txt} in the Python source
distribution; it's also available on the Web at
\url{http://starship.python.net/crew/lemburg/unicode-proposal.txt}.
This article will simply cover the most significant points from the
interface.

In Python source code, Unicode strings are written as
\code{u"string"}.  Arbitrary Unicode characters can be written using a
new escape sequence, \code{\e u\var{HHHH}}, where \var{HHHH} is a
4-digit hexadecimal number from 0000 to FFFF.  The existing
\code{\e x\var{HHHH}} escape sequence can also be used, and octal
escapes can be used for characters up to U+01FF, which is represented
by \code{\e 777}.

Unicode strings, just like regular strings, are an immutable sequence
type.  They can be indexed and sliced, but not modified in place.
Unicode strings have an \method{encode( \optional{encoding} )} method
that returns an 8-bit string in the desired encoding.  Encodings are
named by strings, such as \code{'ascii'}, \code{'utf-8'},
\code{'iso-8859-1'}, and so on.  A codec API is defined for
implementing and registering new encodings that are then available
throughout a Python program.  If an encoding isn't specified, the
default encoding is usually 7-bit ASCII, though it can be changed for
your Python installation by calling the
\function{sys.setdefaultencoding(\var{encoding})} function in a
customised version of \file{site.py}.

Combining 8-bit and Unicode strings always coerces to Unicode, using
the default ASCII encoding; the result of \code{'a' + u'bc'} is
\code{u'abc'}.

New built-in functions have been added, and existing built-ins
modified to support Unicode:

\begin{itemize}
\item \code{unichr(\var{ch})} returns a Unicode string 1 character
long, containing the character \var{ch}.

\item \code{ord(\var{u})}, where \var{u} is a 1-character regular or
Unicode string, returns the number of the character as an integer.

\item \code{unicode(\var{string} \optional{, \var{encoding}}
\optional{, \var{errors}} )} creates a Unicode string from an 8-bit
string.  \code{encoding} is a string naming the encoding to use.
The \code{errors} parameter specifies the treatment of characters that
are invalid for the current encoding; passing \code{'strict'} as the
value causes an exception to be raised on any encoding error, while
\code{'ignore'} causes errors to be silently ignored and
\code{'replace'} uses U+FFFD, the official replacement character, in
case of any problems.

\item The \keyword{exec} statement, and various built-ins such as
\code{eval()}, \code{getattr()}, and \code{setattr()} will also
accept Unicode strings as well as regular strings.  (It's possible
that the process of fixing this missed some built-ins; if you find a
built-in function that accepts strings but doesn't accept Unicode
strings at all, please report it as a bug.)

\end{itemize}

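The three \code{errors} settings described for \function{unicode()} can be compared on a byte string that is not valid ASCII.  This is only a sketch: it assumes a current Python 3 interpreter, where \code{bytes.decode()} stands in for the 2.0 call \code{unicode(string, encoding, errors)}, and the sample data is invented.

```python
# Sketch for Python 3: bytes.decode() stands in for 2.0's
# unicode(string, encoding, errors).  The sample bytes are made up.
raw = b'abc\xff'          # 0xFF is not a valid ASCII byte

ignored = raw.decode('ascii', 'ignore')     # the invalid byte is silently dropped
replaced = raw.decode('ascii', 'replace')   # the invalid byte becomes U+FFFD

try:
    raw.decode('ascii', 'strict')           # 'strict' raises on the bad byte
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```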
A new module, \module{unicodedata}, provides an interface to Unicode
character properties.  For example, \code{unicodedata.category(u'A')}
returns the 2-character string 'Lu', the 'L' denoting it's a letter,
and 'u' meaning that it's uppercase.
\code{unicodedata.bidirectional(u'\e x0660')} returns 'AN', meaning that U+0660 is
an Arabic number.

The \module{codecs} module contains functions to look up existing encodings
and register new ones.  Unless you want to implement a
new encoding, you'll most often use the
\function{codecs.lookup(\var{encoding})} function, which returns a
4-element tuple: \code{(\var{encode_func},
\var{decode_func}, \var{stream_reader}, \var{stream_writer})}.

\begin{itemize}
\item \var{encode_func} is a function that takes a Unicode string, and
returns a 2-tuple \code{(\var{string}, \var{length})}.  \var{string}
is an 8-bit string containing a portion (perhaps all) of the Unicode
string converted into the given encoding, and \var{length} tells you
how much of the Unicode string was converted.

\item \var{decode_func} is the opposite of \var{encode_func}, taking
an 8-bit string and returning a 2-tuple \code{(\var{ustring},
\var{length})}, consisting of the resulting Unicode string
\var{ustring} and the integer \var{length} telling how much of the
8-bit string was consumed.

\item \var{stream_reader} is a class that supports decoding input from
a stream.  \var{stream_reader(\var{file_obj})} returns an object that
supports the \method{read()}, \method{readline()}, and
\method{readlines()} methods.  These methods will all translate from
the given encoding and return Unicode strings.

\item \var{stream_writer}, similarly, is a class that supports
encoding output to a stream.  \var{stream_writer(\var{file_obj})}
returns an object that supports the \method{write()} and
\method{writelines()} methods.  These methods expect Unicode strings,
translating them to the given encoding on output.
\end{itemize}

For example, the following code writes a Unicode string into a file,
encoding it as UTF-8:

\begin{verbatim}
import codecs

unistr = u'\u0660\u2000ab ...'

(UTF8_encode, UTF8_decode,
 UTF8_streamreader, UTF8_streamwriter) = codecs.lookup('UTF-8')

output = UTF8_streamwriter( open( '/tmp/output', 'wb') )
output.write( unistr )
output.close()
\end{verbatim}

The following code would then read UTF-8 input from the file:

\begin{verbatim}
input = UTF8_streamreader( open( '/tmp/output', 'rb') )
print repr(input.read())
input.close()
\end{verbatim}

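The encode and decode functions returned by \function{codecs.lookup()} can also be called directly, which makes the 2-tuples described above visible.  The following is a rough sketch that assumes a current Python 3 interpreter, where the lookup result is a \code{CodecInfo} object that still unpacks like the 4-tuple; the sample text is invented.

```python
import codecs

# Python 3 sketch: lookup() returns a CodecInfo object, which still
# unpacks like the 4-element tuple described in the text.
encode_func, decode_func, stream_reader, stream_writer = codecs.lookup('utf-8')

text = '\u0660ab'                            # three characters, one non-ASCII
encoded, chars_used = encode_func(text)      # (byte string, characters converted)
decoded, bytes_used = decode_func(encoded)   # (Unicode string, bytes consumed)
```

U+0660 needs two bytes in UTF-8, so the two length values differ even though the round trip returns the original text.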
Unicode-aware regular expressions are available through the
\module{re} module, which has a new underlying implementation called
SRE written by Fredrik Lundh of Secret Labs AB.

A \code{-U} command line option was added which causes the Python
compiler to interpret all string literals as Unicode string literals.
This is intended to be used in testing and future-proofing your Python
code, since some future version of Python may drop support for 8-bit
strings and provide only Unicode strings.

% ======================================================================
\section{List Comprehensions}

Lists are a workhorse data type in Python, and many programs
manipulate a list at some point.  Two common operations on lists are
to loop over them, and either pick out the elements that meet a
certain criterion, or apply some function to each element.  For
example, given a list of strings, you might want to pull out all the
strings containing a given substring, or strip off trailing whitespace
from each line.

The existing \function{map()} and \function{filter()} functions can be
used for this purpose, but they require a function as one of their
arguments.  This is fine if there's an existing built-in function that
can be passed directly, but if there isn't, you have to create a
little function to do the required work, and Python's scoping rules
make the result ugly if the little function needs additional
information.  Take the first example in the previous paragraph,
finding all the strings in the list containing a given substring.  You
could write the following to do it:

\begin{verbatim}
# Given the list L, make a list of all strings
# containing the substring S.
sublist = filter( lambda s, substring=S:
                     string.find(s, substring) != -1,
                  L)
\end{verbatim}

Because of Python's scoping rules, a default argument is used so that
the anonymous function created by the \keyword{lambda} expression knows
what substring is being searched for.  List comprehensions make this
cleaner:

\begin{verbatim}
sublist = [ s for s in L if string.find(s, S) != -1 ]
\end{verbatim}

List comprehensions have the form:

\begin{verbatim}
[ expression for expr1 in sequence1
             for expr2 in sequence2 ...
             for exprN in sequenceN
             if condition ]
\end{verbatim}

The \keyword{for}...\keyword{in} clauses contain the sequences to be
iterated over.  The sequences do not have to be the same length,
because they are \emph{not} iterated over in parallel, but
from left to right; this is explained more clearly in the following
paragraphs.  The elements of the generated list will be the successive
values of \var{expression}.  The final \keyword{if} clause is
optional; if present, \var{expression} is only evaluated and added to
the result if \var{condition} is true.

To make the semantics very clear, a list comprehension is equivalent
to the following Python code:

\begin{verbatim}
for expr1 in sequence1:
    for expr2 in sequence2:
    ...
        for exprN in sequenceN:
             if (condition):
                  # Append the value of
                  # the expression to the
                  # resulting list.
\end{verbatim}

This means that when there are multiple \keyword{for}...\keyword{in} clauses
and no \keyword{if} clause, the length of the resulting list will be equal
to the product of the lengths of all
the sequences.  If you have two lists of length 3, the output list is
9 elements long:

\begin{verbatim}
>>> seq1 = 'abc'; seq2 = (1,2,3)
>>> [ (x,y) for x in seq1 for y in seq2]
[('a', 1), ('a', 2), ('a', 3), ('b', 1), ('b', 2), ('b', 3), ('c', 1),
('c', 2), ('c', 3)]
\end{verbatim}

To avoid introducing an ambiguity into Python's grammar, if
\var{expression} is creating a tuple, it must be surrounded with
parentheses.  The first list comprehension below is a syntax error,
while the second one is correct:

\begin{verbatim}
# Syntax error
[ x,y for x in seq1 for y in seq2]
# Correct
[ (x,y) for x in seq1 for y in seq2]
\end{verbatim}

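The equivalence between a list comprehension and nested loops can be checked directly.  The sketch below uses made-up sample sequences and an arbitrary filter condition; it is meant only as an illustration and should run unchanged on current Python versions.

```python
# Sample data, invented for illustration.
seq1 = 'abc'
seq2 = (1, 2, 3)

# The comprehension form, with an optional trailing "if" clause...
pairs = [(x, y) for x in seq1 for y in seq2 if y != 2]

# ...and the nested loops it is described as being equivalent to.
expected = []
for x in seq1:
    for y in seq2:
        if y != 2:
            expected.append((x, y))
```

Because the \keyword{if} clause filters out one value of \code{y}, the result has 6 elements rather than the full product of 9.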
The idea of list comprehensions originally comes from the functional
programming language Haskell (\url{http://www.haskell.org}).  Greg
Ewing argued most effectively for adding them to Python and wrote the
initial list comprehension patch, which was then discussed for a
seemingly endless time on the python-dev mailing list and kept
up-to-date by Skip Montanaro.

% ======================================================================
\section{Augmented Assignment}

Augmented assignment operators, another long-requested feature, have
been added to Python 2.0.  Augmented assignment operators include
\code{+=}, \code{-=}, \code{*=}, and so forth.  For example, the
statement \code{a += 2} increments the value of the variable
\code{a} by 2, equivalent to the slightly lengthier \code{a = a + 2}.

The full list of supported assignment operators is \code{+=},
\code{-=}, \code{*=}, \code{/=}, \code{\%=}, \code{**=}, \code{\&=},
\code{|=}, \verb|^=|, \code{>>=}, and \code{<<=}.  Python classes can
override the augmented assignment operators by defining methods named
\method{__iadd__}, \method{__isub__}, etc.  For example, the following
\class{Number} class stores a number and supports using += to create a
new instance with an incremented value.

\begin{verbatim}
class Number:
    def __init__(self, value):
        self.value = value
    def __iadd__(self, increment):
        return Number( self.value + increment)
\end{verbatim}

The \method{__iadd__} special method is called with the value of the
increment, and should return a new instance with an appropriately
modified value; this return value is bound as the new value of the
variable on the left-hand side.

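The rebinding behaviour can be observed by keeping a second reference to the original object.  This is a minimal sketch built around the \class{Number} class; the names \code{n} and \code{before} are chosen here purely for illustration.

```python
class Number:
    def __init__(self, value):
        self.value = value

    def __iadd__(self, increment):
        # Return a new instance instead of modifying self.
        return Number(self.value + increment)

n = Number(5)
before = n          # keep a second reference to the original object
n += 3              # calls n.__iadd__(3) and rebinds n to the result
```

After the \code{+=}, \code{n} refers to the new instance while \code{before} still refers to the untouched original.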
Augmented assignment operators were first introduced in the C
programming language, and most C-derived languages, such as
\program{awk}, C++, Java, Perl, and PHP also support them.  The augmented
assignment patch was implemented by Thomas Wouters.

% ======================================================================
\section{String Methods}

Until now string-manipulation functionality was in the \module{string}
module, which was usually a front-end for the \module{strop}
module written in C.  The addition of Unicode posed a difficulty for
the \module{strop} module, because the functions would all need to be
rewritten in order to accept either 8-bit or Unicode strings.  For
functions such as \function{string.replace()}, which takes 3 string
arguments, that means eight possible permutations, and correspondingly
complicated code.

Instead, Python 2.0 pushes the problem onto the string type, making
string manipulation functionality available through methods on both
8-bit strings and Unicode strings.

\begin{verbatim}
>>> 'andrew'.capitalize()
'Andrew'
>>> 'hostname'.replace('os', 'linux')
'hlinuxtname'
>>> 'moshe'.find('sh')
2
\end{verbatim}

One thing that hasn't changed, a noteworthy April Fools' joke
notwithstanding, is that Python strings are immutable.  Thus, the
string methods return new strings, and do not modify the string on
which they operate.

The old \module{string} module is still around for backwards
compatibility, but it mostly acts as a front-end to the new string
methods.

Two methods which have no parallel in pre-2.0 versions, although they
did exist in JPython for quite some time, are \method{startswith()}
and \method{endswith()}.  \code{s.startswith(t)} is equivalent to \code{s[:len(t)]
== t}, while \code{s.endswith(t)} is equivalent to \code{s[-len(t):] == t}
(for a non-empty \code{t}).

One other method which deserves special mention is \method{join()}.  The
\method{join()} method of a string receives one parameter, a sequence of
strings, and is equivalent to the \function{string.join()} function from
the old \module{string} module, with the arguments reversed.  In other
words, \code{s.join(seq)} is equivalent to the old
\code{string.join(seq, s)}.

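The slicing equivalences and the argument order of \method{join()} can be checked with a few lines.  This is only an illustrative sketch; the sample strings are arbitrary.

```python
s = 'hostname'
prefix = 'host'
suffix = 'name'

# startswith()/endswith() agree with the equivalent slice comparisons.
starts_equiv = s.startswith(prefix) == (s[:len(prefix)] == prefix)
ends_equiv = s.endswith(suffix) == (s[-len(suffix):] == suffix)

# The separator is the string whose method is called; the sequence is the argument.
joined = ', '.join(['a', 'b', 'c'])
```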
% ======================================================================
\section{Optional Collection of Cycles}

The C implementation of Python uses reference counting to implement
garbage collection.  Every Python object maintains a count of the
number of references pointing to itself, and adjusts the count as
references are created or destroyed.  Once the reference count reaches
zero, the object is no longer accessible, since you need to have a
reference to an object to access it, and if the count is zero, no
references exist any longer.

Reference counting has some pleasant properties: it's easy to
understand and implement, and the resulting implementation is
portable, fairly fast, and interacts well with other libraries that
implement their own memory handling schemes.  The major problem with
reference counting is that it sometimes doesn't realise that objects
are no longer accessible, resulting in a memory leak.  This happens
when there are cycles of references.

Consider the simplest possible cycle,
a class instance which has a reference to itself:

\begin{verbatim}
instance = SomeClass()
instance.myself = instance
\end{verbatim}

After the above two lines of code have been executed, the reference
count of \code{instance} is 2; one reference is from the variable
named \samp{'instance'}, and the other is from the \samp{myself}
attribute of the instance.

If the next line of code is \code{del instance}, what happens?  The
reference count of \code{instance} is decreased by 1, so it has a
reference count of 1; the reference in the \samp{myself} attribute
still exists.  Yet the instance is no longer accessible through Python
code, and it could be deleted.  Several objects can participate in a
cycle if they have references to each other, causing all of the
objects to be leaked.

An experimental step has been made toward fixing this problem.  When
compiling Python, the \verb|--with-cycle-gc| option can be specified.
This causes a cycle detection algorithm to be periodically executed,
which looks for inaccessible cycles and deletes the objects involved.
A new \module{gc} module provides functions to perform a garbage
collection, obtain debugging statistics, and tune the collector's parameters.

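The self-referencing instance discussed earlier can be used to watch the cycle collector at work.  The following is a sketch under stated assumptions: it targets a current CPython 3 interpreter, where the \module{gc} module is always available, and it uses the \module{weakref} module (not part of 2.0) purely as a way to observe whether the object still exists.

```python
import gc
import weakref

class SomeClass:
    pass

gc.disable()                       # keep automatic collection from interfering
instance = SomeClass()
instance.myself = instance         # the self-referencing cycle from the text
probe = weakref.ref(instance)      # observer only; does not keep the object alive

del instance
alive_after_del = probe() is not None      # reference counting alone cannot free it

gc.collect()                       # run the cycle detector explicitly
alive_after_collect = probe() is not None
gc.enable()
```

The object survives the \code{del} because of the cycle, and disappears only once the collector has run.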
Why isn't cycle detection enabled by default?  Running the cycle detection
algorithm takes some time, and some tuning will be required to
minimize the overhead cost.  It's not yet obvious how much performance
is lost, because benchmarking this is tricky and depends crucially
on how often the program creates and destroys objects.

Several people tackled this problem and contributed to a solution.  An
early implementation of the cycle detection approach was written by
Toby Kelsey.  The current algorithm was suggested by Eric Tiedemann
during a visit to CNRI, and Guido van Rossum and Neil Schemenauer
wrote two different implementations, which were later integrated by
Neil.  Lots of other people offered suggestions along the way; the
March 2000 archives of the python-dev mailing list contain most of the
relevant discussion, especially in the threads titled ``Reference
cycle collection for Python'' and ``Finalization again''.

% ======================================================================
\section{Other Core Changes}

Various minor changes have been made to Python's syntax and built-in
functions.  None of the changes are very far-reaching, but they're
handy conveniences.

}
547 A new syntax makes it more convenient to call a given function
548 with a tuple of arguments and/or a dictionary of keyword arguments.
549 In Python
1.5 and earlier, you'd use the
\function{apply()
}
550 built-in function:
\code{apply(f,
\var{args
},
\var{kw
})
} calls the
551 function
\function{f()
} with the argument tuple
\var{args
} and the
552 keyword arguments in the dictionary
\var{kw
}.
\function{apply()
}
553 is the same in
2.0, but thanks to a patch from
554 Greg Ewing,
\code{f
(*\var{args}, **\var{kw})} as a shorter
555 and clearer way to achieve the same effect. This syntax is
556 symmetrical with the syntax for defining functions:
560 # args is a tuple of positional args,
561 # kw is a dictionary of keyword args
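The symmetry between the definition and the call can be seen by passing a tuple and a dictionary straight through.  This is a small sketch; the function body and the sample arguments are invented for illustration.

```python
def f(*args, **kw):
    # args is a tuple of positional args,
    # kw is a dictionary of keyword args
    return args, kw

args = (1, 2)
kw = {'sep': '-'}

# Unpack the tuple and dictionary at the call site.
result = f(*args, **kw)
```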
The \keyword{print} statement can now have its output directed to a
file-like object by following the \keyword{print} with
\verb|>> file|, similar to the redirection operator in Unix shells.
Previously you'd either have to use the \method{write()} method of the
file-like object, which lacks the convenience and simplicity of
\keyword{print}, or you could assign a new value to
\code{sys.stdout} and then restore the old value.  For sending output to standard error,
it's much easier to write this:

\begin{verbatim}
print >> sys.stderr, "Warning: action field not supplied"
\end{verbatim}

Modules can now be renamed on importing them, using the syntax
\code{import \var{module} as \var{name}} or \code{from \var{module}
import \var{name} as \var{othername}}.  The patch was submitted by
Thomas Wouters.

A new format style is available when using the \code{\%} operator;
'\%r' will insert the \function{repr()} of its argument.  This was
also added from symmetry considerations, this time for symmetry with
the existing '\%s' format style, which inserts the \function{str()} of
its argument.  For example, \code{'\%r \%s' \% ('abc', 'abc')} returns a
string containing \verb|'abc' abc|.

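The difference between the two format styles is easy to confirm in a short sketch, here using the same sample values as the text:

```python
# '%r' inserts repr() of the argument (with quotes for a string),
# '%s' inserts str() of the argument (without quotes).
formatted = '%r %s' % ('abc', 'abc')
```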
Previously there was no way to implement a class that overrode
Python's built-in \keyword{in} operator and implemented a custom
version.  \code{\var{obj} in \var{seq}} returns true if \var{obj} is
present in the sequence \var{seq}; Python computes this by simply
trying every index of the sequence until either \var{obj} is found or
an \exception{IndexError} is encountered.  Moshe Zadka contributed a
patch which adds a \method{__contains__} magic method for providing a
custom implementation for \keyword{in}.  Additionally, new built-in
objects written in C can define what \keyword{in} means for them via a
new slot in the sequence protocol.

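A class can use \method{__contains__} to answer membership tests without storing or indexing any elements.  The class below is a hypothetical example invented for this sketch, not something from the Python distribution.

```python
class EvenNumbers:
    """Hypothetical container that 'contains' every even integer."""

    def __contains__(self, obj):
        # Called for "obj in instance"; no indexing is attempted.
        return isinstance(obj, int) and obj % 2 == 0

evens = EvenNumbers()
```

With this definition, \code{4 in evens} is true and \code{3 in evens} is false.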
Earlier versions of Python used a recursive algorithm for deleting
objects.  Deeply nested data structures could cause the interpreter to
fill up the C stack and crash; Christian Tismer rewrote the deletion
logic to fix this problem.  On a related note, comparing recursive
objects recursed infinitely and crashed; Jeremy Hylton rewrote the
code to no longer crash, producing a useful result instead.  For
example, after this code:

The comparison \code{a==b} returns true, because the two recursive
data structures are isomorphic.  \footnote{See the thread ``trashcan
and PR\#7'' in the April 2000 archives of the python-dev mailing list
for the discussion leading up to this implementation, and some useful
relevant links.
%http://www.python.org/pipermail/python-dev/2000-April/004834.html
}

Work has been done on porting Python to 64-bit Windows on the Itanium
processor, mostly by Trent Mick of ActiveState.  (Confusingly,
\code{sys.platform} is still \code{'win32'} on Win64 because it seems
that for ease of porting, MS Visual C++ treats code as 32 bit on Itanium.)
PythonWin also supports Windows CE; see the Python CE page at
\url{http://starship.python.net/crew/mhammond/ce/} for more
information.

An attempt has been made to alleviate one of Python's warts, the
often-confusing \exception{NameError} exception when code refers to a
local variable before the variable has been assigned a value.  For
example, the following code raises an exception on the \keyword{print}
statement in both 1.5.2 and 2.0; in 1.5.2 a \exception{NameError}
exception is raised, while 2.0 raises a new
\exception{UnboundLocalError} exception.
\exception{UnboundLocalError} is a subclass of \exception{NameError},
so any existing code that expects \exception{NameError} to be raised
should still work.

Two new exceptions, \exception{TabError} and
\exception{IndentationError}, have been introduced.  They're both
subclasses of \exception{SyntaxError}, and are raised when Python code
is found to be improperly indented.

\subsection{Changes to Built-in Functions}

A new built-in, \function{zip(\var{seq1}, \var{seq2}, ...)}, has been
added.  \function{zip()} returns a list of tuples where each tuple
contains the i-th element from each of the argument sequences.  The
difference between \function{zip()} and \code{map(None, \var{seq1},
\var{seq2})} is that \function{map()} pads the sequences with
\code{None} if the sequences aren't all of the same length, while
\function{zip()} truncates the returned list to the length of the
shortest argument sequence.

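The truncating and padding behaviours can be compared side by side.  This sketch assumes a current Python 3 interpreter, where \function{zip()} returns an iterator (hence the \code{list()} calls) and \code{map(None, ...)} is no longer accepted; \code{itertools.zip_longest} is used here as a stand-in for the padding behaviour, which is a substitution rather than the 2.0 API.

```python
from itertools import zip_longest   # Python 3 stand-in for map(None, seq1, seq2)

seq1 = [1, 2, 3]
seq2 = ['a', 'b']

truncated = list(zip(seq1, seq2))        # stops at the shortest sequence
padded = list(zip_longest(seq1, seq2))   # pads the shorter sequence with None
```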
The \function{int()} and \function{long()} functions now accept an
optional ``base'' parameter when the first argument is a string.
\code{int('123', 10)} returns 123, while \code{int('123', 16)} returns
291.  \code{int(123, 16)} raises a \exception{TypeError} exception
with the message ``can't convert non-string with explicit base''.

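The three cases can be checked with a short sketch.  Only the exception type is tested here, since the exact wording of the message may differ between Python versions.

```python
decimal_value = int('123', 10)   # the string interpreted in base 10
hex_value = int('123', 16)       # the same digits interpreted in base 16

try:
    int(123, 16)                 # a non-string with an explicit base is rejected
    raised = False
except TypeError:
    raised = True
```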
A new variable holding more detailed version information has been
added to the \module{sys} module.  \code{sys.version_info} is a tuple
\code{(\var{major}, \var{minor}, \var{micro}, \var{level},
\var{serial})}.  For example, in a hypothetical 2.0.1beta1,
\code{sys.version_info} would be \code{(2, 0, 1, 'beta', 1)}.
\var{level} is a string such as \code{"alpha"}, \code{"beta"}, or
\code{"final"} for a final release.

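Because \code{sys.version_info} is a tuple, it can be unpacked or compared against other tuples.  A minimal sketch follows; the variable names are chosen for illustration.

```python
import sys

# Unpack the five fields described above.
major, minor, micro, level, serial = sys.version_info

# Tuple comparison gives a simple "at least this version" check.
at_least_2_0 = sys.version_info[:2] >= (2, 0)
is_final = (level == 'final')
```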
Dictionaries have an odd new method, \method{setdefault(\var{key},
\var{default})}, which behaves similarly to the existing
\method{get()} method.  However, if the key is missing,
\method{setdefault()} both returns the value of \var{default} as
\method{get()} would do, and also inserts it into the dictionary as
the value for \var{key}.  Thus, the following lines of code:

\begin{verbatim}
if dict.has_key( key ): return dict[key]
else: dict[key] = []; return dict[key]
\end{verbatim}

can be reduced to a single \code{return dict.setdefault(key, [])} statement.

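A common use of \method{setdefault()} is grouping items under keys without checking for the key first.  The helper name \code{add_entry} and the sample data below are hypothetical, introduced only for this sketch.

```python
def add_entry(index, key, item):
    # Insert an empty list for a missing key, then append to whichever
    # list is now stored under that key.
    index.setdefault(key, []).append(item)
    return index

index = {}
add_entry(index, 'a', 1)
add_entry(index, 'a', 2)
add_entry(index, 'b', 3)
```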
The interpreter sets a maximum recursion depth in order to catch
runaway recursion before filling the C stack and causing a core dump
or GPF.  Previously this limit was fixed when you compiled Python,
but in 2.0 the maximum recursion depth can be read and modified using
\function{sys.getrecursionlimit} and \function{sys.setrecursionlimit}.
The default value is 1000, and a rough maximum value for a given
platform can be found by running a new script,
\file{Misc/find_recursionlimit.py}.

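Reading and changing the limit takes only a couple of calls.  The sketch below raises the limit by an arbitrary amount and then restores the previous value, so it leaves the interpreter as it found it.

```python
import sys

old_limit = sys.getrecursionlimit()       # read the current limit
sys.setrecursionlimit(old_limit + 100)    # raise it by an arbitrary amount
raised_limit = sys.getrecursionlimit()

sys.setrecursionlimit(old_limit)          # restore the original setting
```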
% ======================================================================
\section{Porting to 2.0}

New Python releases try hard to be compatible with previous releases,
and the record has been pretty good.  However, some changes are
considered useful enough, usually because they fix initial design decisions that
turned out to be actively mistaken, that breaking backward compatibility
can't always be avoided.  This section lists the changes in Python 2.0
that may cause old Python code to break.

715 The change which will probably break the most code is tightening up
716 the arguments accepted by some methods. Some methods would take
717 multiple arguments and treat them as a tuple, particularly various
718 list methods such as \method{.append()} and \method{.insert()}.
719 In earlier versions of Python, if \code{L} is a list, \code{L.append(
720 1,2 )} appends the tuple \code{(1,2)} to the list. In Python 2.0 this
721 causes a \exception{TypeError} exception to be raised, with the
722 message: 'append requires exactly 1 argument; 2 given'. The fix is to
723 simply add an extra set of parentheses to pass both values as a tuple:
724 \code{L.append( (1,2) )}.
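The difference between the two call forms can be sketched as follows; only the exception type is checked, since the message text may vary:

```python
L = []
L.append((1, 2))          # one argument: the tuple (1, 2)

try:
    L.append(1, 2)        # two arguments: rejected by the stricter check
    accepted = True
except TypeError:
    accepted = False

# To add several separate items, call append() once per item.
for item in (3, 4):
    L.append(item)
```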
726 The earlier versions of these methods were more forgiving because they
727 used an old function in Python's C interface to parse their arguments;
728 2.0 modernizes them to use \function{PyArg_ParseTuple}, the current
729 argument parsing function, which provides more helpful error messages
730 and treats multi-argument calls as errors. If you absolutely must use
731 2.0 but can't fix your code, you can edit \file{Objects/listobject.c}
732 and define the preprocessor symbol \code{NO_STRICT_LIST_APPEND} to
733 preserve the old behaviour; this isn't recommended.
735 Some of the functions in the \module{socket} module are still
736 forgiving in this way. For example, \function{socket.connect(
737 ('hostname', 25) )} is the correct form, passing a tuple representing
738 an IP address, but \function{socket.connect( 'hostname', 25 )} also
739 works. \function{socket.connect_ex()} and \function{socket.bind()} are
740 similarly easy-going. 2.0alpha1 tightened these functions up, but
741 because the documentation actually used the erroneous multiple
742 argument form, many people wrote code which would break with the
743 stricter checking. GvR backed out the changes in the face of public
744 reaction, so for the \module{socket} module, the documentation was
745 fixed and the multiple argument form is simply marked as deprecated;
746 it \emph{will} be tightened up again in a future Python version.
748 The \code{\e x} escape in string literals now takes exactly 2 hex
749 digits. Previously it would consume all the hex digits following the
750 'x' and take the lowest 8 bits of the result, so \code{\e x123456} was
751 equivalent to \code{\e x56}.
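Under the new rule, the same literal is read as a two-digit escape followed by ordinary characters. A short sketch:

```python
s = "\x123456"   # the character \x12 followed by the text "3456"

first = s[0]
rest = s[1:]
```

Code that relied on the old behaviour needs to spell out the intended character, for example \code{"\e x56"}.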
753 The \exception{AttributeError} exception has a more friendly error message,
754 whose text will be something like \code{'Spam' instance has no attribute 'eggs'}.
755 Previously the error message was just the missing attribute name \code{eggs}, and
756 code written to take advantage of this fact will break in 2.0.
Some work has been done to make integers and long integers a bit more
interchangeable. In 1.5.2, large-file support was added for Solaris,
to allow reading files larger than 2GB; this made the \method{tell()}
method of file objects return a long integer instead of a regular
integer. Some code would subtract two file offsets and attempt to use
the result to multiply a sequence or slice a string, but this raised a
\exception{TypeError}. In 2.0, long integers can be used to multiply
or slice a sequence, and they behave as you'd intuitively expect;
\code{3L * 'abc'} produces \code{'abcabcabc'}, and
\code{(0,1,2,3)[2L:4L]} produces \code{(2,3)}. Long integers can also be used in
various contexts where previously only integers were accepted, such
as in the \method{seek()} method of file objects, and in the formats
supported by the \verb|%| operator (\verb|%d|, \verb|%i|, \verb|%x|,
etc.). For example, \code{"\%d" \% 2L**64} will produce the string
\samp{18446744073709551616}.
The subtlest long integer change of all is that the \function{str()}
of a long integer no longer has a trailing 'L' character, though
\function{repr()} still includes it. The 'L' annoyed many people who
wanted to print long integers that looked just like regular integers,
since they had to go out of their way to chop off the character. This
is no longer a problem in 2.0, but code which does
\code{str(longval)[:-1]} and assumes the 'L' is there will now lose
the final digit.
Taking the \function{repr()} of a float now uses a different
formatting precision than \function{str()}. \function{repr()} uses the
\code{\%.17g} format string for C's \function{sprintf()}, while
\function{str()} uses \code{\%.12g} as before. The effect is that
\function{repr()} may occasionally show more decimal places than
\function{str()}, for certain numbers.
For example, the number 8.1 can't be represented exactly in binary, so
\code{repr(8.1)} is \code{'8.0999999999999996'}, while \code{str(8.1)} is
\code{'8.1'}.
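The two precisions can be compared directly with the \verb|%| operator. This sketch applies the format strings itself instead of calling \function{repr()} and \function{str()}, on the assumption that other interpreter versions may format floats differently:

```python
value = 8.1

full = '%.17g' % value     # the precision described above for repr()
short = '%.12g' % value    # the precision described above for str()

# Seventeen significant digits are enough to recover the same float.
round_trip = float(full)
```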
792 The \code{-X} command-line option, which turned all standard
793 exceptions into strings instead of classes, has been removed; the
794 standard exceptions will now always be classes. The
795 \module{exceptions} module containing the standard exceptions was
796 translated from Python to a built-in C module, written by Barry Warsaw
799 % Commented out for now -- I don't think anyone will care.
800 %The pattern and match objects provided by SRE are C types, not Python
801 %class instances as in 1.5. This means you can no longer inherit from
802 %\class{RegexObject} or \class{MatchObject}, but that shouldn't be much
803 %of a problem since no one should have been doing that in the first
806 % ======================================================================
807 \section{Extending/Embedding Changes}
809 Some of the changes are under the covers, and will only be apparent to
810 people writing C extension modules or embedding a Python interpreter
811 in a larger application. If you aren't dealing with Python's C API,
812 you can safely skip this section.
814 The version number of the Python C API was incremented, so C
815 extensions compiled for 1.5.2 must be recompiled in order to work with
816 2.0. On Windows, attempting to import a third party extension built
817 for Python 1.5.x usually results in an immediate crash; there's not
818 much we can do about this. (Here's Mark Hammond's explanation of the
819 reasons for the crash. The 1.5 module is linked against
\file{Python15.dll}. When \file{Python.exe}, linked against
821 \file{Python16.dll}, starts up, it initializes the Python data
822 structures in \file{Python16.dll}. When Python then imports the
823 module \file{foo.pyd} linked against \file{Python15.dll}, it
824 immediately tries to call the functions in that DLL. As Python has
825 not been initialized in that DLL, the program immediately crashes.)
827 Users of Jim Fulton's ExtensionClass module will be pleased to find
828 out that hooks have been added so that ExtensionClasses are now
829 supported by \function{isinstance()} and \function{issubclass()}.
830 This means you no longer have to remember to write code such as
831 \code{if type(obj) == myExtensionClass}, but can use the more natural
832 \code{if isinstance(obj, myExtensionClass)}.
834 The \file{Python/importdl.c} file, which was a mass of \#ifdefs to
835 support dynamic loading on many different platforms, was cleaned up
836 and reorganised by Greg Stein. \file{importdl.c} is now quite small,
837 and platform-specific code has been moved into a bunch of
\file{Python/dynload_*.c} files. Another cleanup: there were also a
number of \file{my*.h} files in the \file{Include/} directory that held
various portability hacks; they've been merged into a single file,
841 \file{Include/pyport.h}.
843 Vladimir Marangozov's long-awaited malloc restructuring was completed,
844 to make it easy to have the Python interpreter use a custom allocator
845 instead of C's standard \function{malloc()}. For documentation, read
846 the comments in \file{Include/pymem.h} and
847 \file{Include/objimpl.h}. For the lengthy discussions during which
848 the interface was hammered out, see the Web archives of the 'patches'
849 and 'python-dev' lists at python.org.
851 Recent versions of the GUSI development environment for MacOS support
852 POSIX threads. Therefore, Python's POSIX threading support now works
853 on the Macintosh. Threading support using the user-space GNU \texttt{pth}
854 library was also contributed.
856 Threading support on Windows was enhanced, too. Windows supports
857 thread locks that use kernel objects only in case of contention; in
858 the common case when there's no contention, they use simpler functions
859 which are an order of magnitude faster. A threaded version of Python
860 1.5.2 on NT is twice as slow as an unthreaded version; with the 2.0
861 changes, the difference is only 10\%. These improvements were
862 contributed by Yakov Markovitch.
864 Python 2.0's source now uses only ANSI C prototypes, so compiling Python now
865 requires an ANSI C compiler, and can no longer be done using a compiler that
866 only supports K\&R C.
868 Previously the Python virtual machine used 16-bit numbers in its
869 bytecode, limiting the size of source files. In particular, this
870 affected the maximum size of literal lists and dictionaries in Python
871 source; occasionally people who are generating Python code would run
into this limit. A patch by Charles G. Waldman raises the limit from
$2^{16}$ to $2^{32}$.
875 Three new convenience functions intended for adding constants to a
876 module's dictionary at module initialization time were added:
877 \function{PyModule_AddObject()}, \function{PyModule_AddIntConstant()},
878 and \function{PyModule_AddStringConstant()}. Each of these functions
879 takes a module object, a null-terminated C string containing the name
880 to be added, and a third argument for the value to be assigned to the
name. This third argument is, respectively, a Python object, a C
long, and a C string.
884 A wrapper API was added for Unix-style signal handlers.
885 \function{PyOS_getsig()} gets a signal handler and
886 \function{PyOS_setsig()} will set a new handler.
888 % ======================================================================
889 \section{Distutils: Making Modules Easy to Install}
891 Before Python 2.0, installing modules was a tedious affair -- there
892 was no way to figure out automatically where Python is installed, or
893 what compiler options to use for extension modules. Software authors
894 had to go through an arduous ritual of editing Makefiles and
895 configuration files, which only really work on Unix and leave Windows
896 and MacOS unsupported. Python users faced wildly differing
897 installation instructions which varied between different extension
packages, which made administering a Python installation something of a
The SIG for distribution utilities, shepherded by Greg Ward, has
created the Distutils, a system to make package installation much
easier. They form the \module{distutils} package, a new part of
Python's standard library. In the best case, installing a Python
module from source will require the same steps: you simply
unpack the tarball or zip archive, and then run ``\code{python setup.py
install}''. The platform will be automatically detected, the compiler
will be recognized, C extension modules will be compiled, and the
distribution installed into the proper directory. Optional
command-line arguments provide more control over the installation
process; the \module{distutils} package offers many places to override defaults
-- separating the build from the install, building or installing in
non-default directories, and more.
915 In order to use the Distutils, you need to write a \file{setup.py}
916 script. For the simple case, when the software contains only .py
917 files, a minimal \file{setup.py} can be just a few lines long:
\begin{verbatim}
from distutils.core import setup
setup (name = "foo", version = "1.0",
       py_modules = ["module1", "module2"])
\end{verbatim}
925 The \file{setup.py} file isn't much more complicated if the software
926 consists of a few packages:
\begin{verbatim}
from distutils.core import setup
setup (name = "foo", version = "1.0",
       packages = ["package", "package.subpackage"])
\end{verbatim}
934 A C extension can be the most complicated case; here's an example taken from
\begin{verbatim}
from distutils.core import setup, Extension

expat_extension = Extension('xml.parsers.pyexpat',
     define_macros = [('XML_NS', None)],
     include_dirs = [ 'extensions/expat/xmltok',
                      'extensions/expat/xmlparse' ],
     sources = [ 'extensions/pyexpat.c',
                 'extensions/expat/xmltok/xmltok.c',
                 'extensions/expat/xmltok/xmlrole.c',
               ]
       )
setup (name = "PyXML", version = "0.5.4",
       ext_modules =[ expat_extension ] )
\end{verbatim}
The Distutils can also take care of creating source and binary
distributions. The ``sdist'' command, run by ``\code{python setup.py
sdist}'', builds a source distribution such as \file{foo-1.0.tar.gz}.
Adding new commands isn't difficult; ``bdist_rpm'' and
``bdist_wininst'' commands have already been contributed to create an
RPM distribution and a Windows installer for the software,
respectively. Commands to create other distribution formats such as
Debian packages and Solaris \file{.pkg} files are in various stages of
965 All this is documented in a new manual, \textit{Distributing Python
966 Modules}, that joins the basic set of Python documentation.
968 % ======================================================================
969 %\section{New XML Code}
971 %XXX write this section...
973 % ======================================================================
974 \section{Module changes}
Lots of improvements and bugfixes were made to Python's extensive
standard library; some of the affected modules include
\module{readline}, \module{ConfigParser}, \module{cgi},
\module{calendar}, \module{posix}, \module{xmllib},
\module{aifc}, \module{chunk}, \module{wave}, \module{random}, \module{shelve},
and \module{nntplib}. Consult the CVS logs for the exact
patch-by-patch details.
984 Brian Gallew contributed OpenSSL support for the \module{socket}
985 module. OpenSSL is an implementation of the Secure Socket Layer,
986 which encrypts the data being sent over a socket. When compiling
987 Python, you can edit \file{Modules/Setup} to include SSL support,
988 which adds an additional function to the \module{socket} module:
989 \function{socket.ssl(\var{socket}, \var{keyfile}, \var{certfile})},
990 which takes a socket object and returns an SSL socket. The
991 \module{httplib} and \module{urllib} modules were also changed to
992 support ``https://'' URLs, though no one has implemented FTP or SMTP
995 The \module{httplib} module has been rewritten by Greg Stein to
996 support HTTP/1.1. Backward compatibility with the 1.5 version of
997 \module{httplib} is provided, though using HTTP/1.1 features such as
998 pipelining will require rewriting code to use a different set of
The \module{Tkinter} module now supports Tcl/Tk versions 8.1, 8.2, and
8.3, and support for the older 7.x versions has been dropped. The
\module{Tkinter} module also supports displaying Unicode strings in Tk widgets.
In addition, Fredrik Lundh contributed an optimization which makes operations
like \code{create_line} and \code{create_polygon} much faster,
especially when using lots of coordinates.
1008 The \module{curses} module has been greatly extended, starting from
1009 Oliver Andrich's enhanced version, to provide many additional
1010 functions from ncurses and SYSV curses, such as colour, alternative
1011 character set support, pads, and mouse support. This means the module
1012 is no longer compatible with operating systems that only have BSD
1013 curses, but there don't seem to be any currently maintained OSes that
1014 fall into this category.
1016 As mentioned in the earlier discussion of 2.0's Unicode support, the
1017 underlying implementation of the regular expressions provided by the
1018 \module{re} module has been changed. SRE, a new regular expression
1019 engine written by Fredrik Lundh and partially funded by Hewlett
1020 Packard, supports matching against both 8-bit strings and Unicode
1023 % ======================================================================
1024 \section{New modules}
1026 A number of new modules were added. We'll simply list them with brief
1027 descriptions; consult the 2.0 documentation for the details of a
\item{\module{atexit}:}
1033 For registering functions to be called before the Python interpreter exits.
1034 Code that currently sets
1035 \code{sys.exitfunc} directly should be changed to
1036 use the \module{atexit} module instead, importing \module{atexit}
1037 and calling \function{atexit.register()} with
1038 the function to be called on exit.
1039 (Contributed by Skip Montanaro.)
1041 \item{\module{codecs}, \module{encodings}, \module{unicodedata}:} Added as part of the new Unicode support.
1043 \item{\module{filecmp}:} Supersedes the old \module{cmp}, \module{cmpcache} and
1044 \module{dircmp} modules, which have now become deprecated.
(Contributed by Gordon McMillan and Moshe Zadka.)
1047 \item{\module{linuxaudiodev}:} Support for the \file{/dev/audio}
1048 device on Linux, a twin to the existing \module{sunaudiodev} module.
1049 (Contributed by Peter Bosch.)
1051 \item{\module{mmap}:} An interface to memory-mapped files on both
1052 Windows and Unix. A file's contents can be mapped directly into
1053 memory, at which point it behaves like a mutable string, so its
1054 contents can be read and modified. They can even be passed to
1055 functions that expect ordinary strings, such as the \module{re}
1056 module. (Contributed by Sam Rushing, with some extensions by
1059 \item{\module{pyexpat}:} An interface to the Expat XML parser.
1060 (Contributed by Paul Prescod.)
\item{\module{robotparser}:} Parses a \file{robots.txt} file; this is
useful for writing Web spiders that politely avoid certain areas of a
Web site. The parser accepts the contents of a \file{robots.txt} file,
builds a set of rules from it, and can then answer questions about
whether a given URL may be fetched. (Contributed by Skip Montanaro.)
1068 \item{\module{tabnanny}:} A module/script to
1069 check Python source code for ambiguous indentation.
1070 (Contributed by Tim Peters.)
1072 \item{\module{UserString}:} A base class useful for deriving objects that behave like strings.
1074 \item{\module{webbrowser}:} A module that provides a platform independent
1075 way to launch a web browser on a specific URL. For each platform, various
1076 browsers are tried in a specific order. The user can alter which browser
1077 is launched by setting the \var{BROWSER} environment variable.
1078 (Originally inspired by Eric S. Raymond's patch to \module{urllib}
1079 which added similar functionality, but
1080 the final module comes from code originally
1081 implemented by Fred Drake as \file{Tools/idle/BrowserControl.py},
1082 and adapted for the standard library by Fred.)
1084 \item{\module{_winreg}:} An interface to the
1085 Windows registry. \module{_winreg} is an adaptation of functions that
1086 have been part of PythonWin since 1995, but has now been added to the core
1087 distribution, and enhanced to support Unicode.
1088 \module{_winreg} was written by Bill Tutt and Mark Hammond.
1090 \item{\module{zipfile}:} A module for reading and writing ZIP-format
1091 archives. These are archives produced by \program{PKZIP} on
1092 DOS/Windows or \program{zip} on Unix, not to be confused with
1093 \program{gzip}-format files (which are supported by the \module{gzip}
1095 (Contributed by James C. Ahlstrom.)
1097 \item{\module{imputil}:} A module that provides a simpler way for
1098 writing customised import hooks, in comparison to the existing
1099 \module{ihooks} module. (Implemented by Greg Stein, with much
1100 discussion on python-dev along the way.)
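As one illustration of the new modules listed above, the following sketch writes a small archive with \module{zipfile} and reads it back. The file names are invented for the example, and the code assumes a current Python interpreter, where \method{read()} returns a byte string:

```python
import os
import tempfile
import zipfile

# Hypothetical file names, chosen only for this illustration.
workdir = tempfile.mkdtemp()
archive_path = os.path.join(workdir, 'example.zip')

# Write one member into a new archive.
archive = zipfile.ZipFile(archive_path, 'w')
archive.writestr('hello.txt', 'Hello from zipfile')
archive.close()

# Reopen the archive and read the member back.
archive = zipfile.ZipFile(archive_path, 'r')
names = archive.namelist()
data = archive.read('hello.txt')
archive.close()

# Clean up the temporary files.
os.remove(archive_path)
os.rmdir(workdir)
```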
1104 % ======================================================================
1105 \section{IDLE Improvements}
1107 IDLE is the official Python cross-platform IDE, written using Tkinter.
1108 Python 2.0 includes IDLE 0.6, which adds a number of new features and
1109 improvements. A partial list:
1112 \item UI improvements and optimizations,
1113 especially in the area of syntax highlighting and auto-indentation.
1115 \item The class browser now shows more information, such as the top
1116 level functions in a module.
1118 \item Tab width is now a user settable option. When opening an existing Python
1119 file, IDLE automatically detects the indentation conventions, and adapts.
1121 \item There is now support for calling browsers on various platforms,
1122 used to open the Python documentation in a browser.
1124 \item IDLE now has a command line, which is largely similar to
1125 the vanilla Python interpreter.
1127 \item Call tips were added in many places.
1129 \item IDLE can now be installed as a package.
1131 \item In the editor window, there is now a line/column bar at the bottom.
1133 \item Three new keystroke commands: Check module (Alt-F5), Import
1134 module (F5) and Run script (Ctrl-F5).
1138 % ======================================================================
1139 \section{Deleted and Deprecated Modules}
1141 A few modules have been dropped because they're obsolete, or because
1142 there are now better ways to do the same thing. The \module{stdwin}
1143 module is gone; it was for a platform-independent windowing toolkit
1144 that's no longer developed.
1146 A number of modules have been moved to the
1147 \file{lib-old} subdirectory:
1148 \module{cmp}, \module{cmpcache}, \module{dircmp}, \module{dump},
1149 \module{find}, \module{grep}, \module{packmail},
1150 \module{poly}, \module{util}, \module{whatsound}, \module{zmod}.
1151 If you have code which relies on a module that's been moved to
1152 \file{lib-old}, you can simply add that directory to \code{sys.path}
1153 to get them back, but you're encouraged to update any code that uses
1156 \section{Acknowledgements}
1158 The authors would like to thank the following people for offering
1159 suggestions on drafts of this article: Mark Hammond, Gregg Hauser,
1160 Fredrik Lundh, Detlef Lannert, Skip Montanaro, Vladimir Marangozov,
1161 Guido van Rossum, and Neil Schemenauer.