This commit was manufactured by cvs2svn to create tag 'r241c1'.
[python/dscho.git] / Doc / lib / libpickle.tex
blobf5c23f068d64c564e1b2210a2064ede1266771a2
1 \section{\module{pickle} --- Python object serialization}
3 \declaremodule{standard}{pickle}
4 \modulesynopsis{Convert Python objects to streams of bytes and back.}
5 % Substantial improvements by Jim Kerr <jbkerr@sr.hp.com>.
6 % Rewritten by Barry Warsaw <barry@zope.com>
8 \index{persistence}
9 \indexii{persistent}{objects}
10 \indexii{serializing}{objects}
11 \indexii{marshalling}{objects}
12 \indexii{flattening}{objects}
13 \indexii{pickling}{objects}
15 The \module{pickle} module implements a fundamental, but powerful
16 algorithm for serializing and de-serializing a Python object
17 structure. ``Pickling'' is the process whereby a Python object
18 hierarchy is converted into a byte stream, and ``unpickling'' is the
19 inverse operation, whereby a byte stream is converted back into an
20 object hierarchy. Pickling (and unpickling) is alternatively known as
21 ``serialization'', ``marshalling,''\footnote{Don't confuse this with
22 the \refmodule{marshal} module} or ``flattening'',
23 however, to avoid confusion, the terms used here are ``pickling'' and
24 ``unpickling''.
26 This documentation describes both the \module{pickle} module and the
27 \refmodule{cPickle} module.
29 \subsection{Relationship to other Python modules}
31 The \module{pickle} module has an optimized cousin called the
32 \module{cPickle} module. As its name implies, \module{cPickle} is
33 written in C, so it can be up to 1000 times faster than
34 \module{pickle}. However it does not support subclassing of the
35 \function{Pickler()} and \function{Unpickler()} classes, because in
36 \module{cPickle} these are functions, not classes. Most applications
37 have no need for this functionality, and can benefit from the improved
38 performance of \module{cPickle}. Other than that, the interfaces of
39 the two modules are nearly identical; the common interface is
40 described in this manual and differences are pointed out where
41 necessary. In the following discussions, we use the term ``pickle''
42 to collectively describe the \module{pickle} and
43 \module{cPickle} modules.
45 The data streams the two modules produce are guaranteed to be
46 interchangeable.
48 Python has a more primitive serialization module called
49 \refmodule{marshal}, but in general
50 \module{pickle} should always be the preferred way to serialize Python
51 objects. \module{marshal} exists primarily to support Python's
52 \file{.pyc} files.
54 The \module{pickle} module differs from \refmodule{marshal} several
55 significant ways:
57 \begin{itemize}
59 \item The \module{pickle} module keeps track of the objects it has
60 already serialized, so that later references to the same object
61 won't be serialized again. \module{marshal} doesn't do this.
63 This has implications both for recursive objects and object
64 sharing. Recursive objects are objects that contain references
65 to themselves. These are not handled by marshal, and in fact,
66 attempting to marshal recursive objects will crash your Python
67 interpreter. Object sharing happens when there are multiple
68 references to the same object in different places in the object
69 hierarchy being serialized. \module{pickle} stores such objects
70 only once, and ensures that all other references point to the
71 master copy. Shared objects remain shared, which can be very
72 important for mutable objects.
74 \item \module{marshal} cannot be used to serialize user-defined
75 classes and their instances. \module{pickle} can save and
76 restore class instances transparently, however the class
77 definition must be importable and live in the same module as
78 when the object was stored.
80 \item The \module{marshal} serialization format is not guaranteed to
81 be portable across Python versions. Because its primary job in
82 life is to support \file{.pyc} files, the Python implementers
83 reserve the right to change the serialization format in
84 non-backwards compatible ways should the need arise. The
85 \module{pickle} serialization format is guaranteed to be
86 backwards compatible across Python releases.
88 \end{itemize}
90 \begin{notice}[warning]
91 The \module{pickle} module is not intended to be secure against
92 erroneous or maliciously constructed data. Never unpickle data
93 received from an untrusted or unauthenticated source.
94 \end{notice}
96 Note that serialization is a more primitive notion than persistence;
97 although
98 \module{pickle} reads and writes file objects, it does not handle the
99 issue of naming persistent objects, nor the (even more complicated)
100 issue of concurrent access to persistent objects. The \module{pickle}
101 module can transform a complex object into a byte stream and it can
102 transform the byte stream into an object with the same internal
103 structure. Perhaps the most obvious thing to do with these byte
104 streams is to write them onto a file, but it is also conceivable to
105 send them across a network or store them in a database. The module
106 \refmodule{shelve} provides a simple interface
107 to pickle and unpickle objects on DBM-style database files.
109 \subsection{Data stream format}
111 The data format used by \module{pickle} is Python-specific. This has
112 the advantage that there are no restrictions imposed by external
113 standards such as XDR\index{XDR}\index{External Data Representation}
114 (which can't represent pointer sharing); however it means that
115 non-Python programs may not be able to reconstruct pickled Python
116 objects.
118 By default, the \module{pickle} data format uses a printable \ASCII{}
119 representation. This is slightly more voluminous than a binary
120 representation. The big advantage of using printable \ASCII{} (and of
121 some other characteristics of \module{pickle}'s representation) is that
122 for debugging or recovery purposes it is possible for a human to read
123 the pickled file with a standard text editor.
125 There are currently 3 different protocols which can be used for pickling.
127 \begin{itemize}
129 \item Protocol version 0 is the original ASCII protocol and is backwards
130 compatible with earlier versions of Python.
132 \item Protocol version 1 is the old binary format which is also compatible
133 with earlier versions of Python.
135 \item Protocol version 2 was introduced in Python 2.3. It provides
136 much more efficient pickling of new-style classes.
138 \end{itemize}
140 Refer to PEP 307 for more information.
142 If a \var{protocol} is not specified, protocol 0 is used.
143 If \var{protocol} is specified as a negative value
144 or \constant{HIGHEST_PROTOCOL},
145 the highest protocol version available will be used.
147 \versionchanged[The \var{bin} parameter is deprecated and only provided
148 for backwards compatibility. You should use the \var{protocol}
149 parameter instead]{2.3}
151 A binary format, which is slightly more efficient, can be chosen by
152 specifying a true value for the \var{bin} argument to the
153 \class{Pickler} constructor or the \function{dump()} and \function{dumps()}
154 functions. A \var{protocol} version >= 1 implies use of a binary format.
156 \subsection{Usage}
158 To serialize an object hierarchy, you first create a pickler, then you
159 call the pickler's \method{dump()} method. To de-serialize a data
160 stream, you first create an unpickler, then you call the unpickler's
161 \method{load()} method. The \module{pickle} module provides the
162 following constant:
164 \begin{datadesc}{HIGHEST_PROTOCOL}
165 The highest protocol version available. This value can be passed
166 as a \var{protocol} value.
167 \versionadded{2.3}
168 \end{datadesc}
170 The \module{pickle} module provides the
171 following functions to make this process more convenient:
173 \begin{funcdesc}{dump}{obj, file\optional{, protocol\optional{, bin}}}
174 Write a pickled representation of \var{obj} to the open file object
175 \var{file}. This is equivalent to
176 \code{Pickler(\var{file}, \var{protocol}, \var{bin}).dump(\var{obj})}.
178 If the \var{protocol} parameter is omitted, protocol 0 is used.
179 If \var{protocol} is specified as a negative value
180 or \constant{HIGHEST_PROTOCOL},
181 the highest protocol version will be used.
183 \versionchanged[The \var{protocol} parameter was added.
184 The \var{bin} parameter is deprecated and only provided
185 for backwards compatibility. You should use the \var{protocol}
186 parameter instead]{2.3}
188 If the optional \var{bin} argument is true, the binary pickle format
189 is used; otherwise the (less efficient) text pickle format is used
190 (for backwards compatibility, this is the default).
192 \var{file} must have a \method{write()} method that accepts a single
193 string argument. It can thus be a file object opened for writing, a
194 \refmodule{StringIO} object, or any other custom
195 object that meets this interface.
196 \end{funcdesc}
198 \begin{funcdesc}{load}{file}
199 Read a string from the open file object \var{file} and interpret it as
200 a pickle data stream, reconstructing and returning the original object
201 hierarchy. This is equivalent to \code{Unpickler(\var{file}).load()}.
203 \var{file} must have two methods, a \method{read()} method that takes
204 an integer argument, and a \method{readline()} method that requires no
205 arguments. Both methods should return a string. Thus \var{file} can
206 be a file object opened for reading, a
207 \module{StringIO} object, or any other custom
208 object that meets this interface.
210 This function automatically determines whether the data stream was
211 written in binary mode or not.
212 \end{funcdesc}
214 \begin{funcdesc}{dumps}{obj\optional{, protocol\optional{, bin}}}
215 Return the pickled representation of the object as a string, instead
216 of writing it to a file.
218 If the \var{protocol} parameter is omitted, protocol 0 is used.
219 If \var{protocol} is specified as a negative value
220 or \constant{HIGHEST_PROTOCOL},
221 the highest protocol version will be used.
223 \versionchanged[The \var{protocol} parameter was added.
224 The \var{bin} parameter is deprecated and only provided
225 for backwards compatibility. You should use the \var{protocol}
226 parameter instead]{2.3}
228 If the optional \var{bin} argument is
229 true, the binary pickle format is used; otherwise the (less efficient)
230 text pickle format is used (this is the default).
231 \end{funcdesc}
233 \begin{funcdesc}{loads}{string}
234 Read a pickled object hierarchy from a string. Characters in the
235 string past the pickled object's representation are ignored.
236 \end{funcdesc}
238 The \module{pickle} module also defines three exceptions:
240 \begin{excdesc}{PickleError}
241 A common base class for the other exceptions defined below. This
242 inherits from \exception{Exception}.
243 \end{excdesc}
245 \begin{excdesc}{PicklingError}
246 This exception is raised when an unpicklable object is passed to
247 the \method{dump()} method.
248 \end{excdesc}
250 \begin{excdesc}{UnpicklingError}
251 This exception is raised when there is a problem unpickling an object.
252 Note that other exceptions may also be raised during unpickling,
253 including (but not necessarily limited to) \exception{AttributeError},
254 \exception{EOFError}, \exception{ImportError}, and \exception{IndexError}.
255 \end{excdesc}
257 The \module{pickle} module also exports two callables\footnote{In the
258 \module{pickle} module these callables are classes, which you could
259 subclass to customize the behavior. However, in the \refmodule{cPickle}
260 module these callables are factory functions and so cannot be
261 subclassed. One common reason to subclass is to control what
262 objects can actually be unpickled. See section~\ref{pickle-sub} for
263 more details.}, \class{Pickler} and \class{Unpickler}:
265 \begin{classdesc}{Pickler}{file\optional{, protocol\optional{, bin}}}
266 This takes a file-like object to which it will write a pickle data
267 stream.
269 If the \var{protocol} parameter is omitted, protocol 0 is used.
270 If \var{protocol} is specified as a negative value,
271 the highest protocol version will be used.
273 \versionchanged[The \var{bin} parameter is deprecated and only provided
274 for backwards compatibility. You should use the \var{protocol}
275 parameter instead]{2.3}
277 Optional \var{bin} if true, tells the pickler to use the more
278 efficient binary pickle format, otherwise the \ASCII{} format is used
279 (this is the default).
281 \var{file} must have a \method{write()} method that accepts a single
282 string argument. It can thus be an open file object, a
283 \module{StringIO} object, or any other custom
284 object that meets this interface.
285 \end{classdesc}
287 \class{Pickler} objects define one (or two) public methods:
289 \begin{methoddesc}[Pickler]{dump}{obj}
290 Write a pickled representation of \var{obj} to the open file object
291 given in the constructor. Either the binary or \ASCII{} format will
292 be used, depending on the value of the \var{bin} flag passed to the
293 constructor.
294 \end{methoddesc}
296 \begin{methoddesc}[Pickler]{clear_memo}{}
297 Clears the pickler's ``memo''. The memo is the data structure that
298 remembers which objects the pickler has already seen, so that shared
299 or recursive objects pickled by reference and not by value. This
300 method is useful when re-using picklers.
302 \begin{notice}
303 Prior to Python 2.3, \method{clear_memo()} was only available on the
304 picklers created by \refmodule{cPickle}. In the \module{pickle} module,
305 picklers have an instance variable called \member{memo} which is a
306 Python dictionary. So to clear the memo for a \module{pickle} module
307 pickler, you could do the following:
309 \begin{verbatim}
310 mypickler.memo.clear()
311 \end{verbatim}
313 Code that does not need to support older versions of Python should
314 simply use \method{clear_memo()}.
315 \end{notice}
316 \end{methoddesc}
318 It is possible to make multiple calls to the \method{dump()} method of
319 the same \class{Pickler} instance. These must then be matched to the
320 same number of calls to the \method{load()} method of the
321 corresponding \class{Unpickler} instance. If the same object is
322 pickled by multiple \method{dump()} calls, the \method{load()} will
323 all yield references to the same object.\footnote{\emph{Warning}: this
324 is intended for pickling multiple objects without intervening
325 modifications to the objects or their parts. If you modify an object
326 and then pickle it again using the same \class{Pickler} instance, the
327 object is not pickled again --- a reference to it is pickled and the
328 \class{Unpickler} will return the old value, not the modified one.
329 There are two problems here: (1) detecting changes, and (2)
330 marshalling a minimal set of changes. Garbage Collection may also
331 become a problem here.}
333 \class{Unpickler} objects are defined as:
335 \begin{classdesc}{Unpickler}{file}
336 This takes a file-like object from which it will read a pickle data
337 stream. This class automatically determines whether the data stream
338 was written in binary mode or not, so it does not need a flag as in
339 the \class{Pickler} factory.
341 \var{file} must have two methods, a \method{read()} method that takes
342 an integer argument, and a \method{readline()} method that requires no
343 arguments. Both methods should return a string. Thus \var{file} can
344 be a file object opened for reading, a
345 \module{StringIO} object, or any other custom
346 object that meets this interface.
347 \end{classdesc}
349 \class{Unpickler} objects have one (or two) public methods:
351 \begin{methoddesc}[Unpickler]{load}{}
352 Read a pickled object representation from the open file object given
353 in the constructor, and return the reconstituted object hierarchy
354 specified therein.
355 \end{methoddesc}
357 \begin{methoddesc}[Unpickler]{noload}{}
358 This is just like \method{load()} except that it doesn't actually
359 create any objects. This is useful primarily for finding what's
360 called ``persistent ids'' that may be referenced in a pickle data
361 stream. See section~\ref{pickle-protocol} below for more details.
363 \strong{Note:} the \method{noload()} method is currently only
364 available on \class{Unpickler} objects created with the
365 \module{cPickle} module. \module{pickle} module \class{Unpickler}s do
366 not have the \method{noload()} method.
367 \end{methoddesc}
369 \subsection{What can be pickled and unpickled?}
371 The following types can be pickled:
373 \begin{itemize}
375 \item \code{None}, \code{True}, and \code{False}
377 \item integers, long integers, floating point numbers, complex numbers
379 \item normal and Unicode strings
381 \item tuples, lists, sets, and dictionaries containing only picklable objects
383 \item functions defined at the top level of a module
385 \item built-in functions defined at the top level of a module
387 \item classes that are defined at the top level of a module
389 \item instances of such classes whose \member{__dict__} or
390 \method{__setstate__()} is picklable (see
391 section~\ref{pickle-protocol} for details)
393 \end{itemize}
395 Attempts to pickle unpicklable objects will raise the
396 \exception{PicklingError} exception; when this happens, an unspecified
397 number of bytes may have already been written to the underlying file.
399 Note that functions (built-in and user-defined) are pickled by ``fully
400 qualified'' name reference, not by value. This means that only the
401 function name is pickled, along with the name of module the function
402 is defined in. Neither the function's code, nor any of its function
403 attributes are pickled. Thus the defining module must be importable
404 in the unpickling environment, and the module must contain the named
405 object, otherwise an exception will be raised.\footnote{The exception
406 raised will likely be an \exception{ImportError} or an
407 \exception{AttributeError} but it could be something else.}
409 Similarly, classes are pickled by named reference, so the same
410 restrictions in the unpickling environment apply. Note that none of
411 the class's code or data is pickled, so in the following example the
412 class attribute \code{attr} is not restored in the unpickling
413 environment:
415 \begin{verbatim}
416 class Foo:
417 attr = 'a class attr'
419 picklestring = pickle.dumps(Foo)
420 \end{verbatim}
422 These restrictions are why picklable functions and classes must be
423 defined in the top level of a module.
425 Similarly, when class instances are pickled, their class's code and
426 data are not pickled along with them. Only the instance data are
427 pickled. This is done on purpose, so you can fix bugs in a class or
428 add methods to the class and still load objects that were created with
429 an earlier version of the class. If you plan to have long-lived
430 objects that will see many versions of a class, it may be worthwhile
431 to put a version number in the objects so that suitable conversions
432 can be made by the class's \method{__setstate__()} method.
434 \subsection{The pickle protocol
435 \label{pickle-protocol}}\setindexsubitem{(pickle protocol)}
437 This section describes the ``pickling protocol'' that defines the
438 interface between the pickler/unpickler and the objects that are being
439 serialized. This protocol provides a standard way for you to define,
440 customize, and control how your objects are serialized and
441 de-serialized. The description in this section doesn't cover specific
442 customizations that you can employ to make the unpickling environment
443 slightly safer from untrusted pickle data streams; see section~\ref{pickle-sub}
444 for more details.
446 \subsubsection{Pickling and unpickling normal class
447 instances\label{pickle-inst}}
449 When a pickled class instance is unpickled, its \method{__init__()}
450 method is normally \emph{not} invoked. If it is desirable that the
451 \method{__init__()} method be called on unpickling, an old-style class
452 can define a method \method{__getinitargs__()}, which should return a
453 \emph{tuple} containing the arguments to be passed to the class
454 constructor (i.e. \method{__init__()}). The
455 \method{__getinitargs__()} method is called at
456 pickle time; the tuple it returns is incorporated in the pickle for
457 the instance.
458 \withsubitem{(copy protocol)}{\ttindex{__getinitargs__()}}
459 \withsubitem{(instance constructor)}{\ttindex{__init__()}}
461 \withsubitem{(copy protocol)}{\ttindex{__getnewargs__()}}
463 New-style types can provide a \method{__getnewargs__()} method that is
464 used for protocol 2. Implementing this method is needed if the type
465 establishes some internal invariants when the instance is created, or
466 if the memory allocation is affected by the values passed to the
467 \method{__new__()} method for the type (as it is for tuples and
468 strings). Instances of a new-style type \class{C} are created using
470 \begin{alltt}
471 obj = C.__new__(C, *\var{args})
472 \end{alltt}
474 where \var{args} is the result of calling \method{__getnewargs__()} on
475 the original object; if there is no \method{__getnewargs__()}, an
476 empty tuple is assumed.
478 \withsubitem{(copy protocol)}{
479 \ttindex{__getstate__()}\ttindex{__setstate__()}}
480 \withsubitem{(instance attribute)}{
481 \ttindex{__dict__}}
483 Classes can further influence how their instances are pickled; if the
484 class defines the method \method{__getstate__()}, it is called and the
485 return state is pickled as the contents for the instance, instead of
486 the contents of the instance's dictionary. If there is no
487 \method{__getstate__()} method, the instance's \member{__dict__} is
488 pickled.
490 Upon unpickling, if the class also defines the method
491 \method{__setstate__()}, it is called with the unpickled
492 state.\footnote{These methods can also be used to implement copying
493 class instances.} If there is no \method{__setstate__()} method, the
494 pickled state must be a dictionary and its items are assigned to the
495 new instance's dictionary. If a class defines both
496 \method{__getstate__()} and \method{__setstate__()}, the state object
497 needn't be a dictionary and these methods can do what they
498 want.\footnote{This protocol is also used by the shallow and deep
499 copying operations defined in the
500 \refmodule{copy} module.}
502 \begin{notice}[warning]
503 For new-style classes, if \method{__getstate__()} returns a false
504 value, the \method{__setstate__()} method will not be called.
505 \end{notice}
508 \subsubsection{Pickling and unpickling extension types}
510 When the \class{Pickler} encounters an object of a type it knows
511 nothing about --- such as an extension type --- it looks in two places
512 for a hint of how to pickle it. One alternative is for the object to
513 implement a \method{__reduce__()} method. If provided, at pickling
514 time \method{__reduce__()} will be called with no arguments, and it
515 must return either a string or a tuple.
517 If a string is returned, it names a global variable whose contents are
518 pickled as normal. The string returned by \method{__reduce__} should
519 be the object's local name relative to its module; the pickle module
520 searches the module namespace to determine the object's module.
522 When a tuple is returned, it must be between two and five elements
523 long. Optional elements can either be omitted, or \code{None} can be provided
524 as their value. The semantics of each element are:
526 \begin{itemize}
528 \item A callable object that will be called to create the initial
529 version of the object. The next element of the tuple will provide
530 arguments for this callable, and later elements provide additional
531 state information that will subsequently be used to fully reconstruct
532 the pickled date.
534 In the unpickling environment this object must be either a class, a
535 callable registered as a ``safe constructor'' (see below), or it must
536 have an attribute \member{__safe_for_unpickling__} with a true value.
537 Otherwise, an \exception{UnpicklingError} will be raised in the
538 unpickling environment. Note that as usual, the callable itself is
539 pickled by name.
541 \item A tuple of arguments for the callable object, or \code{None}.
542 \deprecated{2.3}{If this item is \code{None}, then instead of calling
543 the callable directly, its \method{__basicnew__()} method is called
544 without arguments; this method should also return the unpickled
545 object. Providing \code{None} is deprecated, however; return a
546 tuple of arguments instead.}
548 \item Optionally, the object's state, which will be passed to
549 the object's \method{__setstate__()} method as described in
550 section~\ref{pickle-inst}. If the object has no
551 \method{__setstate__()} method, then, as above, the value must
552 be a dictionary and it will be added to the object's
553 \member{__dict__}.
555 \item Optionally, an iterator (and not a sequence) yielding successive
556 list items. These list items will be pickled, and appended to the
557 object using either \code{obj.append(\var{item})} or
558 \code{obj.extend(\var{list_of_items})}. This is primarily used for
559 list subclasses, but may be used by other classes as long as they have
560 \method{append()} and \method{extend()} methods with the appropriate
561 signature. (Whether \method{append()} or \method{extend()} is used
562 depends on which pickle protocol version is used as well as the number
563 of items to append, so both must be supported.)
565 \item Optionally, an iterator (not a sequence)
566 yielding successive dictionary items, which should be tuples of the
567 form \code{(\var{key}, \var{value})}. These items will be pickled
568 and stored to the object using \code{obj[\var{key}] = \var{value}}.
569 This is primarily used for dictionary subclasses, but may be used by
570 other classes as long as they implement \method{__setitem__}.
572 \end{itemize}
574 It is sometimes useful to know the protocol version when implementing
575 \method{__reduce__}. This can be done by implementing a method named
576 \method{__reduce_ex__} instead of \method{__reduce__}.
577 \method{__reduce_ex__}, when it exists, is called in preference over
578 \method{__reduce__} (you may still provide \method{__reduce__} for
579 backwards compatibility). The \method{__reduce_ex__} method will be
580 called with a single integer argument, the protocol version.
582 The \class{object} class implements both \method{__reduce__} and
583 \method{__reduce_ex__}; however, if a subclass overrides
584 \method{__reduce__} but not \method{__reduce_ex__}, the
585 \method{__reduce_ex__} implementation detects this and calls
586 \method{__reduce__}.
588 An alternative to implementing a \method{__reduce__()} method on the
589 object to be pickled, is to register the callable with the
590 \refmodule[copyreg]{copy_reg} module. This module provides a way
591 for programs to register ``reduction functions'' and constructors for
592 user-defined types. Reduction functions have the same semantics and
593 interface as the \method{__reduce__()} method described above, except
594 that they are called with a single argument, the object to be pickled.
596 The registered constructor is deemed a ``safe constructor'' for purposes
597 of unpickling as described above.
600 \subsubsection{Pickling and unpickling external objects}
602 For the benefit of object persistence, the \module{pickle} module
603 supports the notion of a reference to an object outside the pickled
604 data stream. Such objects are referenced by a ``persistent id'',
605 which is just an arbitrary string of printable \ASCII{} characters.
606 The resolution of such names is not defined by the \module{pickle}
607 module; it will delegate this resolution to user defined functions on
608 the pickler and unpickler.\footnote{The actual mechanism for
609 associating these user defined functions is slightly different for
610 \module{pickle} and \module{cPickle}. The description given here
611 works the same for both implementations. Users of the \module{pickle}
612 module could also use subclassing to effect the same results,
613 overriding the \method{persistent_id()} and \method{persistent_load()}
614 methods in the derived classes.}
616 To define external persistent id resolution, you need to set the
617 \member{persistent_id} attribute of the pickler object and the
618 \member{persistent_load} attribute of the unpickler object.
620 To pickle objects that have an external persistent id, the pickler
621 must have a custom \function{persistent_id()} method that takes an
622 object as an argument and returns either \code{None} or the persistent
623 id for that object. When \code{None} is returned, the pickler simply
624 pickles the object as normal. When a persistent id string is
625 returned, the pickler will pickle that string, along with a marker
626 so that the unpickler will recognize the string as a persistent id.
628 To unpickle external objects, the unpickler must have a custom
629 \function{persistent_load()} function that takes a persistent id
630 string and returns the referenced object.
632 Here's a silly example that \emph{might} shed more light:
634 \begin{verbatim}
635 import pickle
636 from cStringIO import StringIO
638 src = StringIO()
639 p = pickle.Pickler(src)
641 def persistent_id(obj):
642 if hasattr(obj, 'x'):
643 return 'the value %d' % obj.x
644 else:
645 return None
647 p.persistent_id = persistent_id
649 class Integer:
650 def __init__(self, x):
651 self.x = x
652 def __str__(self):
653 return 'My name is integer %d' % self.x
655 i = Integer(7)
656 print i
657 p.dump(i)
659 datastream = src.getvalue()
660 print repr(datastream)
661 dst = StringIO(datastream)
663 up = pickle.Unpickler(dst)
665 class FancyInteger(Integer):
666 def __str__(self):
667 return 'I am the integer %d' % self.x
669 def persistent_load(persid):
670 if persid.startswith('the value '):
671 value = int(persid.split()[2])
672 return FancyInteger(value)
673 else:
674 raise pickle.UnpicklingError, 'Invalid persistent id'
676 up.persistent_load = persistent_load
678 j = up.load()
679 print j
680 \end{verbatim}
682 In the \module{cPickle} module, the unpickler's
683 \member{persistent_load} attribute can also be set to a Python
684 list, in which case, when the unpickler reaches a persistent id, the
685 persistent id string will simply be appended to this list. This
686 functionality exists so that a pickle data stream can be ``sniffed''
687 for object references without actually instantiating all the objects
688 in a pickle.\footnote{We'll leave you with the image of Guido and Jim
689 sitting around sniffing pickles in their living rooms.} Setting
690 \member{persistent_load} to a list is usually used in conjunction with
691 the \method{noload()} method on the Unpickler.
693 % BAW: Both pickle and cPickle support something called
694 % inst_persistent_id() which appears to give unknown types a second
695 % shot at producing a persistent id. Since Jim Fulton can't remember
696 % why it was added or what it's for, I'm leaving it undocumented.
698 \subsection{Subclassing Unpicklers \label{pickle-sub}}
700 By default, unpickling will import any class that it finds in the
701 pickle data. You can control exactly what gets unpickled and what
702 gets called by customizing your unpickler. Unfortunately, exactly how
703 you do this is different depending on whether you're using
704 \module{pickle} or \module{cPickle}.\footnote{A word of caution: the
705 mechanisms described here use internal attributes and methods, which
706 are subject to change in future versions of Python. We intend to
707 someday provide a common interface for controlling this behavior,
708 which will work in either \module{pickle} or \module{cPickle}.}
710 In the \module{pickle} module, you need to derive a subclass from
711 \class{Unpickler}, overriding the \method{load_global()}
712 method. \method{load_global()} should read two lines from the pickle
713 data stream where the first line will the name of the module
714 containing the class and the second line will be the name of the
715 instance's class. It then looks up the class, possibly importing the
716 module and digging out the attribute, then it appends what it finds to
717 the unpickler's stack. Later on, this class will be assigned to the
718 \member{__class__} attribute of an empty class, as a way of magically
719 creating an instance without calling its class's \method{__init__()}.
720 Your job (should you choose to accept it), would be to have
721 \method{load_global()} push onto the unpickler's stack, a known safe
722 version of any class you deem safe to unpickle. It is up to you to
723 produce such a class. Or you could raise an error if you want to
724 disallow all unpickling of instances. If this sounds like a hack,
725 you're right. Refer to the source code to make this work.
727 Things are a little cleaner with \module{cPickle}, but not by much.
728 To control what gets unpickled, you can set the unpickler's
729 \member{find_global} attribute to a function or \code{None}. If it is
730 \code{None} then any attempts to unpickle instances will raise an
731 \exception{UnpicklingError}. If it is a function,
732 then it should accept a module name and a class name, and return the
733 corresponding class object. It is responsible for looking up the
734 class and performing any necessary imports, and it may raise an
735 error to prevent instances of the class from being unpickled.
737 The moral of the story is that you should be really careful about the
738 source of the strings your application unpickles.
740 \subsection{Example \label{pickle-example}}
742 Here's a simple example of how to modify pickling behavior for a
743 class. The \class{TextReader} class opens a text file, and returns
744 the line number and line contents each time its \method{readline()}
745 method is called. If a \class{TextReader} instance is pickled, all
746 attributes \emph{except} the file object member are saved. When the
747 instance is unpickled, the file is reopened, and reading resumes from
748 the last location. The \method{__setstate__()} and
749 \method{__getstate__()} methods are used to implement this behavior.
751 \begin{verbatim}
752 class TextReader:
753 """Print and number lines in a text file."""
754 def __init__(self, file):
755 self.file = file
756 self.fh = open(file)
757 self.lineno = 0
759 def readline(self):
760 self.lineno = self.lineno + 1
761 line = self.fh.readline()
762 if not line:
763 return None
764 if line.endswith("\n"):
765 line = line[:-1]
766 return "%d: %s" % (self.lineno, line)
768 def __getstate__(self):
769 odict = self.__dict__.copy() # copy the dict since we change it
770 del odict['fh'] # remove filehandle entry
771 return odict
773 def __setstate__(self,dict):
774 fh = open(dict['file']) # reopen file
775 count = dict['lineno'] # read from file...
776 while count: # until line count is restored
777 fh.readline()
778 count = count - 1
779 self.__dict__.update(dict) # update attributes
780 self.fh = fh # save the file object
781 \end{verbatim}
783 A sample usage might be something like this:
785 \begin{verbatim}
786 >>> import TextReader
787 >>> obj = TextReader.TextReader("TextReader.py")
788 >>> obj.readline()
789 '1: #!/usr/local/bin/python'
790 >>> # (more invocations of obj.readline() here)
791 ... obj.readline()
792 '7: class TextReader:'
793 >>> import pickle
794 >>> pickle.dump(obj,open('save.p','w'))
795 \end{verbatim}
797 If you want to see that \refmodule{pickle} works across Python
798 processes, start another Python session, before continuing. What
799 follows can happen from either the same process or a new process.
801 \begin{verbatim}
802 >>> import pickle
803 >>> reader = pickle.load(open('save.p'))
804 >>> reader.readline()
805 '8: "Print and number lines in a text file."'
806 \end{verbatim}
809 \begin{seealso}
810 \seemodule[copyreg]{copy_reg}{Pickle interface constructor
811 registration for extension types.}
813 \seemodule{shelve}{Indexed databases of objects; uses \module{pickle}.}
815 \seemodule{copy}{Shallow and deep object copying.}
817 \seemodule{marshal}{High-performance serialization of built-in types.}
818 \end{seealso}
821 \section{\module{cPickle} --- A faster \module{pickle}}
823 \declaremodule{builtin}{cPickle}
824 \modulesynopsis{Faster version of \refmodule{pickle}, but not subclassable.}
825 \moduleauthor{Jim Fulton}{jim@zope.com}
826 \sectionauthor{Fred L. Drake, Jr.}{fdrake@acm.org}
828 The \module{cPickle} module supports serialization and
829 de-serialization of Python objects, providing an interface and
830 functionality nearly identical to the
831 \refmodule{pickle}\refstmodindex{pickle} module. There are several
832 differences, the most important being performance and subclassability.
834 First, \module{cPickle} can be up to 1000 times faster than
835 \module{pickle} because the former is implemented in C. Second, in
836 the \module{cPickle} module the callables \function{Pickler()} and
837 \function{Unpickler()} are functions, not classes. This means that
838 you cannot use them to derive custom pickling and unpickling
839 subclasses. Most applications have no need for this functionality and
840 should benefit from the greatly improved performance of the
841 \module{cPickle} module.
843 The pickle data stream produced by \module{pickle} and
844 \module{cPickle} are identical, so it is possible to use
845 \module{pickle} and \module{cPickle} interchangeably with existing
846 pickles.\footnote{Since the pickle data format is actually a tiny
847 stack-oriented programming language, and some freedom is taken in the
848 encodings of certain objects, it is possible that the two modules
849 produce different data streams for the same input objects. However it
850 is guaranteed that they will always be able to read each other's
851 data streams.}
853 There are additional minor differences in API between \module{cPickle}
854 and \module{pickle}, however for most applications, they are
855 interchangeable. More documentation is provided in the
856 \module{pickle} module documentation, which
857 includes a list of the documented differences.