This commit was manufactured by cvs2svn to create tag 'r234c1'.
[python/dscho.git] / Doc / lib / libpickle.tex
blob50bc66f18e7d29a6cb1b13301fb9d438d4bcd8c6
1 \section{\module{pickle} --- Python object serialization}
3 \declaremodule{standard}{pickle}
4 \modulesynopsis{Convert Python objects to streams of bytes and back.}
5 % Substantial improvements by Jim Kerr <jbkerr@sr.hp.com>.
6 % Rewritten by Barry Warsaw <barry@zope.com>
8 \index{persistence}
9 \indexii{persistent}{objects}
10 \indexii{serializing}{objects}
11 \indexii{marshalling}{objects}
12 \indexii{flattening}{objects}
13 \indexii{pickling}{objects}
15 The \module{pickle} module implements a fundamental, but powerful
16 algorithm for serializing and de-serializing a Python object
17 structure. ``Pickling'' is the process whereby a Python object
18 hierarchy is converted into a byte stream, and ``unpickling'' is the
19 inverse operation, whereby a byte stream is converted back into an
20 object hierarchy. Pickling (and unpickling) is alternatively known as
21 ``serialization'', ``marshalling,''\footnote{Don't confuse this with
22 the \refmodule{marshal} module} or ``flattening'',
23 however, to avoid confusion, the terms used here are ``pickling'' and
24 ``unpickling''.
26 This documentation describes both the \module{pickle} module and the
27 \refmodule{cPickle} module.
29 \subsection{Relationship to other Python modules}
31 The \module{pickle} module has an optimized cousin called the
32 \module{cPickle} module. As its name implies, \module{cPickle} is
33 written in C, so it can be up to 1000 times faster than
34 \module{pickle}. However it does not support subclassing of the
35 \function{Pickler()} and \function{Unpickler()} classes, because in
36 \module{cPickle} these are functions, not classes. Most applications
37 have no need for this functionality, and can benefit from the improved
38 performance of \module{cPickle}. Other than that, the interfaces of
39 the two modules are nearly identical; the common interface is
40 described in this manual and differences are pointed out where
41 necessary. In the following discussions, we use the term ``pickle''
42 to collectively describe the \module{pickle} and
43 \module{cPickle} modules.
45 The data streams the two modules produce are guaranteed to be
46 interchangeable.
48 Python has a more primitive serialization module called
49 \refmodule{marshal}, but in general
50 \module{pickle} should always be the preferred way to serialize Python
51 objects. \module{marshal} exists primarily to support Python's
52 \file{.pyc} files.
54 The \module{pickle} module differs from \refmodule{marshal} several
55 significant ways:
57 \begin{itemize}
59 \item The \module{pickle} module keeps track of the objects it has
60 already serialized, so that later references to the same object
61 won't be serialized again. \module{marshal} doesn't do this.
63 This has implications both for recursive objects and object
64 sharing. Recursive objects are objects that contain references
65 to themselves. These are not handled by marshal, and in fact,
66 attempting to marshal recursive objects will crash your Python
67 interpreter. Object sharing happens when there are multiple
68 references to the same object in different places in the object
69 hierarchy being serialized. \module{pickle} stores such objects
70 only once, and ensures that all other references point to the
71 master copy. Shared objects remain shared, which can be very
72 important for mutable objects.
74 \item \module{marshal} cannot be used to serialize user-defined
75 classes and their instances. \module{pickle} can save and
76 restore class instances transparently, however the class
77 definition must be importable and live in the same module as
78 when the object was stored.
80 \item The \module{marshal} serialization format is not guaranteed to
81 be portable across Python versions. Because its primary job in
82 life is to support \file{.pyc} files, the Python implementers
83 reserve the right to change the serialization format in
84 non-backwards compatible ways should the need arise. The
85 \module{pickle} serialization format is guaranteed to be
86 backwards compatible across Python releases.
88 \end{itemize}
90 \begin{notice}[warning]
91 The \module{pickle} module is not intended to be secure against
92 erroneous or maliciously constructed data. Never unpickle data
93 received from an untrusted or unauthenticated source.
94 \end{notice}
96 Note that serialization is a more primitive notion than persistence;
97 although
98 \module{pickle} reads and writes file objects, it does not handle the
99 issue of naming persistent objects, nor the (even more complicated)
100 issue of concurrent access to persistent objects. The \module{pickle}
101 module can transform a complex object into a byte stream and it can
102 transform the byte stream into an object with the same internal
103 structure. Perhaps the most obvious thing to do with these byte
104 streams is to write them onto a file, but it is also conceivable to
105 send them across a network or store them in a database. The module
106 \refmodule{shelve} provides a simple interface
107 to pickle and unpickle objects on DBM-style database files.
109 \subsection{Data stream format}
111 The data format used by \module{pickle} is Python-specific. This has
112 the advantage that there are no restrictions imposed by external
113 standards such as XDR\index{XDR}\index{External Data Representation}
114 (which can't represent pointer sharing); however it means that
115 non-Python programs may not be able to reconstruct pickled Python
116 objects.
118 By default, the \module{pickle} data format uses a printable \ASCII{}
119 representation. This is slightly more voluminous than a binary
120 representation. The big advantage of using printable \ASCII{} (and of
121 some other characteristics of \module{pickle}'s representation) is that
122 for debugging or recovery purposes it is possible for a human to read
123 the pickled file with a standard text editor.
125 There are currently 3 different protocols which can be used for pickling.
127 \begin{itemize}
129 \item Protocol version 0 is the original ASCII protocol and is backwards
130 compatible with earlier versions of Python.
132 \item Protocol version 1 is the old binary format which is also compatible
133 with earlier versions of Python.
135 \item Protocol version 2 was introduced in Python 2.3. It provides
136 much more efficient pickling of new-style classes.
138 \end{itemize}
140 Refer to PEP 307 for more information.
142 If a \var{protocol} is not specified, protocol 0 is used.
143 If \var{protocol} is specified as a negative value
144 or \constant{HIGHEST_PROTOCOL},
145 the highest protocol version available will be used.
147 \versionchanged[The \var{bin} parameter is deprecated and only provided
148 for backwards compatibility. You should use the \var{protocol}
149 parameter instead]{2.3}
151 A binary format, which is slightly more efficient, can be chosen by
152 specifying a true value for the \var{bin} argument to the
153 \class{Pickler} constructor or the \function{dump()} and \function{dumps()}
154 functions. A \var{protocol} version >= 1 implies use of a binary format.
156 \subsection{Usage}
158 To serialize an object hierarchy, you first create a pickler, then you
159 call the pickler's \method{dump()} method. To de-serialize a data
160 stream, you first create an unpickler, then you call the unpickler's
161 \method{load()} method. The \module{pickle} module provides the
162 following constant:
164 \begin{datadesc}{HIGHEST_PROTOCOL}
165 The highest protocol version available. This value can be passed
166 as a \var{protocol} value.
167 \versionadded{2.3}
168 \end{datadesc}
170 The \module{pickle} module provides the
171 following functions to make this process more convenient:
173 \begin{funcdesc}{dump}{object, file\optional{, protocol\optional{, bin}}}
174 Write a pickled representation of \var{object} to the open file object
175 \var{file}. This is equivalent to
176 \code{Pickler(\var{file}, \var{protocol}, \var{bin}).dump(\var{object})}.
178 If the \var{protocol} parameter is ommitted, protocol 0 is used.
179 If \var{protocol} is specified as a negative value
180 or \constant{HIGHEST_PROTOCOL},
181 the highest protocol version will be used.
183 \versionchanged[The \var{protocol} parameter was added.
184 The \var{bin} parameter is deprecated and only provided
185 for backwards compatibility. You should use the \var{protocol}
186 parameter instead]{2.3}
188 If the optional \var{bin} argument is true, the binary pickle format
189 is used; otherwise the (less efficient) text pickle format is used
190 (for backwards compatibility, this is the default).
192 \var{file} must have a \method{write()} method that accepts a single
193 string argument. It can thus be a file object opened for writing, a
194 \refmodule{StringIO} object, or any other custom
195 object that meets this interface.
196 \end{funcdesc}
198 \begin{funcdesc}{load}{file}
199 Read a string from the open file object \var{file} and interpret it as
200 a pickle data stream, reconstructing and returning the original object
201 hierarchy. This is equivalent to \code{Unpickler(\var{file}).load()}.
203 \var{file} must have two methods, a \method{read()} method that takes
204 an integer argument, and a \method{readline()} method that requires no
205 arguments. Both methods should return a string. Thus \var{file} can
206 be a file object opened for reading, a
207 \module{StringIO} object, or any other custom
208 object that meets this interface.
210 This function automatically determines whether the data stream was
211 written in binary mode or not.
212 \end{funcdesc}
214 \begin{funcdesc}{dumps}{object\optional{, protocol\optional{, bin}}}
215 Return the pickled representation of the object as a string, instead
216 of writing it to a file.
218 If the \var{protocol} parameter is ommitted, protocol 0 is used.
219 If \var{protocol} is specified as a negative value
220 or \constant{HIGHEST_PROTOCOL},
221 the highest protocol version will be used.
223 \versionchanged[The \var{protocol} parameter was added.
224 The \var{bin} parameter is deprecated and only provided
225 for backwards compatibility. You should use the \var{protocol}
226 parameter instead]{2.3}
228 If the optional \var{bin} argument is
229 true, the binary pickle format is used; otherwise the (less efficient)
230 text pickle format is used (this is the default).
231 \end{funcdesc}
233 \begin{funcdesc}{loads}{string}
234 Read a pickled object hierarchy from a string. Characters in the
235 string past the pickled object's representation are ignored.
236 \end{funcdesc}
238 The \module{pickle} module also defines three exceptions:
240 \begin{excdesc}{PickleError}
241 A common base class for the other exceptions defined below. This
242 inherits from \exception{Exception}.
243 \end{excdesc}
245 \begin{excdesc}{PicklingError}
246 This exception is raised when an unpicklable object is passed to
247 the \method{dump()} method.
248 \end{excdesc}
250 \begin{excdesc}{UnpicklingError}
251 This exception is raised when there is a problem unpickling an object.
252 Note that other exceptions may also be raised during unpickling,
253 including (but not necessarily limited to) \exception{AttributeError},
254 \exception{EOFError}, \exception{ImportError}, and \exception{IndexError}.
255 \end{excdesc}
257 The \module{pickle} module also exports two callables,\footnote{In the
258 \module{pickle} module these callables are classes, which you could
259 subclass to customize the behavior. However, in the
260 \refmodule{cPickle} module these callables are factory functions and
261 so cannot be subclassed. One common reason to subclass is to control
262 what objects can actually be unpickled. See section~\ref{pickle-sub}
263 for more details.} \class{Pickler} and \class{Unpickler}:
265 \begin{classdesc}{Pickler}{file\optional{, protocol\optional{, bin}}}
266 This takes a file-like object to which it will write a pickle data
267 stream.
269 If the \var{protocol} parameter is ommitted, protocol 0 is used.
270 If \var{protocol} is specified as a negative value,
271 the highest protocol version will be used.
273 \versionchanged[The \var{bin} parameter is deprecated and only provided
274 for backwards compatibility. You should use the \var{protocol}
275 parameter instead]{2.3}
277 Optional \var{bin} if true, tells the pickler to use the more
278 efficient binary pickle format, otherwise the \ASCII{} format is used
279 (this is the default).
281 \var{file} must have a \method{write()} method that accepts a single
282 string argument. It can thus be an open file object, a
283 \module{StringIO} object, or any other custom
284 object that meets this interface.
285 \end{classdesc}
287 \class{Pickler} objects define one (or two) public methods:
289 \begin{methoddesc}[Pickler]{dump}{object}
290 Write a pickled representation of \var{object} to the open file object
291 given in the constructor. Either the binary or \ASCII{} format will
292 be used, depending on the value of the \var{bin} flag passed to the
293 constructor.
294 \end{methoddesc}
296 \begin{methoddesc}[Pickler]{clear_memo}{}
297 Clears the pickler's ``memo''. The memo is the data structure that
298 remembers which objects the pickler has already seen, so that shared
299 or recursive objects pickled by reference and not by value. This
300 method is useful when re-using picklers.
302 \begin{notice}
303 Prior to Python 2.3, \method{clear_memo()} was only available on the
304 picklers created by \refmodule{cPickle}. In the \module{pickle} module,
305 picklers have an instance variable called \member{memo} which is a
306 Python dictionary. So to clear the memo for a \module{pickle} module
307 pickler, you could do the following:
309 \begin{verbatim}
310 mypickler.memo.clear()
311 \end{verbatim}
313 Code that does not need to support older versions of Python should
314 simply use \method{clear_memo()}.
315 \end{notice}
316 \end{methoddesc}
318 It is possible to make multiple calls to the \method{dump()} method of
319 the same \class{Pickler} instance. These must then be matched to the
320 same number of calls to the \method{load()} method of the
321 corresponding \class{Unpickler} instance. If the same object is
322 pickled by multiple \method{dump()} calls, the \method{load()} will
323 all yield references to the same object\footnote{\emph{Warning}: this
324 is intended for pickling multiple objects without intervening
325 modifications to the objects or their parts. If you modify an object
326 and then pickle it again using the same \class{Pickler} instance, the
327 object is not pickled again --- a reference to it is pickled and the
328 \class{Unpickler} will return the old value, not the modified one.
329 There are two problems here: (1) detecting changes, and (2)
330 marshalling a minimal set of changes. Garbage Collection may also
331 become a problem here.}.
333 \class{Unpickler} objects are defined as:
335 \begin{classdesc}{Unpickler}{file}
336 This takes a file-like object from which it will read a pickle data
337 stream. This class automatically determines whether the data stream
338 was written in binary mode or not, so it does not need a flag as in
339 the \class{Pickler} factory.
341 \var{file} must have two methods, a \method{read()} method that takes
342 an integer argument, and a \method{readline()} method that requires no
343 arguments. Both methods should return a string. Thus \var{file} can
344 be a file object opened for reading, a
345 \module{StringIO} object, or any other custom
346 object that meets this interface.
347 \end{classdesc}
349 \class{Unpickler} objects have one (or two) public methods:
351 \begin{methoddesc}[Unpickler]{load}{}
352 Read a pickled object representation from the open file object given
353 in the constructor, and return the reconstituted object hierarchy
354 specified therein.
355 \end{methoddesc}
357 \begin{methoddesc}[Unpickler]{noload}{}
358 This is just like \method{load()} except that it doesn't actually
359 create any objects. This is useful primarily for finding what's
360 called ``persistent ids'' that may be referenced in a pickle data
361 stream. See section~\ref{pickle-protocol} below for more details.
363 \strong{Note:} the \method{noload()} method is currently only
364 available on \class{Unpickler} objects created with the
365 \module{cPickle} module. \module{pickle} module \class{Unpickler}s do
366 not have the \method{noload()} method.
367 \end{methoddesc}
369 \subsection{What can be pickled and unpickled?}
371 The following types can be pickled:
373 \begin{itemize}
375 \item \code{None}, \code{True}, and \code{False}
377 \item integers, long integers, floating point numbers, complex numbers
379 \item normal and Unicode strings
381 \item tuples, lists, and dictionaries containing only picklable objects
383 \item functions defined at the top level of a module
385 \item built-in functions defined at the top level of a module
387 \item classes that are defined at the top level of a module
389 \item instances of such classes whose \member{__dict__} or
390 \method{__setstate__()} is picklable (see
391 section~\ref{pickle-protocol} for details)
393 \end{itemize}
395 Attempts to pickle unpicklable objects will raise the
396 \exception{PicklingError} exception; when this happens, an unspecified
397 number of bytes may have already been written to the underlying file.
399 Note that functions (built-in and user-defined) are pickled by ``fully
400 qualified'' name reference, not by value. This means that only the
401 function name is pickled, along with the name of module the function
402 is defined in. Neither the function's code, nor any of its function
403 attributes are pickled. Thus the defining module must be importable
404 in the unpickling environment, and the module must contain the named
405 object, otherwise an exception will be raised\footnote{The exception
406 raised will likely be an \exception{ImportError} or an
407 \exception{AttributeError} but it could be something else.}.
409 Similarly, classes are pickled by named reference, so the same
410 restrictions in the unpickling environment apply. Note that none of
411 the class's code or data is pickled, so in the following example the
412 class attribute \code{attr} is not restored in the unpickling
413 environment:
415 \begin{verbatim}
416 class Foo:
417 attr = 'a class attr'
419 picklestring = pickle.dumps(Foo)
420 \end{verbatim}
422 These restrictions are why picklable functions and classes must be
423 defined in the top level of a module.
425 Similarly, when class instances are pickled, their class's code and
426 data are not pickled along with them. Only the instance data are
427 pickled. This is done on purpose, so you can fix bugs in a class or
428 add methods to the class and still load objects that were created with
429 an earlier version of the class. If you plan to have long-lived
430 objects that will see many versions of a class, it may be worthwhile
431 to put a version number in the objects so that suitable conversions
432 can be made by the class's \method{__setstate__()} method.
434 \subsection{The pickle protocol
435 \label{pickle-protocol}}\setindexsubitem{(pickle protocol)}
437 This section describes the ``pickling protocol'' that defines the
438 interface between the pickler/unpickler and the objects that are being
439 serialized. This protocol provides a standard way for you to define,
440 customize, and control how your objects are serialized and
441 de-serialized. The description in this section doesn't cover specific
442 customizations that you can employ to make the unpickling environment
443 slightly safer from untrusted pickle data streams; see section~\ref{pickle-sub}
444 for more details.
446 \subsubsection{Pickling and unpickling normal class
447 instances\label{pickle-inst}}
449 When a pickled class instance is unpickled, its \method{__init__()}
450 method is normally \emph{not} invoked. If it is desirable that the
451 \method{__init__()} method be called on unpickling, an old-style class
452 can define a method \method{__getinitargs__()}, which should return a
453 \emph{tuple} containing the arguments to be passed to the class
454 constructor (i.e. \method{__init__()}). The
455 \method{__getinitargs__()} method is called at
456 pickle time; the tuple it returns is incorporated in the pickle for
457 the instance.
458 \withsubitem{(copy protocol)}{\ttindex{__getinitargs__()}}
459 \withsubitem{(instance constructor)}{\ttindex{__init__()}}
461 \withsubitem{(copy protocol)}{\ttindex{__getnewargs__()}}
463 New-style types can provide a \method{__getnewargs__()} method that is
464 used for protocol 2. Implementing this method is needed if the type
465 establishes some internal invariants when the instance is created, or
466 if the memory allocation is affected by the values passed to the
467 \method{__new__()} method for the type (as it is for tuples and
468 strings). Instances of a new-style type \class{C} are created using
470 \begin{alltt}
471 obj = C.__new__(C, *\var{args})
472 \end{alltt}
474 where \var{args} is the result of calling \method{__getnewargs__()} on
475 the original object; if there is no \method{__getnewargs__()}, an
476 empty tuple is assumed.
478 \withsubitem{(copy protocol)}{
479 \ttindex{__getstate__()}\ttindex{__setstate__()}}
480 \withsubitem{(instance attribute)}{
481 \ttindex{__dict__}}
483 Classes can further influence how their instances are pickled; if the
484 class defines the method \method{__getstate__()}, it is called and the
485 return state is pickled as the contents for the instance, instead of
486 the contents of the instance's dictionary. If there is no
487 \method{__getstate__()} method, the instance's \member{__dict__} is
488 pickled.
490 Upon unpickling, if the class also defines the method
491 \method{__setstate__()}, it is called with the unpickled
492 state\footnote{These methods can also be used to implement copying
493 class instances.}. If there is no \method{__setstate__()} method, the
494 pickled state must be a dictionary and its items are assigned to the
495 new instance's dictionary. If a class defines both
496 \method{__getstate__()} and \method{__setstate__()}, the state object
497 needn't be a dictionary and these methods can do what they
498 want.\footnote{This protocol is also used by the shallow and deep
499 copying operations defined in the
500 \refmodule{copy} module.}
502 \begin{notice}[warning]
503 For new-style classes, if \method{__getstate__()} returns a false
504 value, the \method{__setstate__()} method will not be called.
505 \end{notice}
508 \subsubsection{Pickling and unpickling extension types}
510 When the \class{Pickler} encounters an object of a type it knows
511 nothing about --- such as an extension type --- it looks in two places
512 for a hint of how to pickle it. One alternative is for the object to
513 implement a \method{__reduce__()} method. If provided, at pickling
514 time \method{__reduce__()} will be called with no arguments, and it
515 must return either a string or a tuple.
517 If a string is returned, it names a global variable whose contents are
518 pickled as normal. When a tuple is returned, it must be of length two
519 or three, with the following semantics:
521 \begin{itemize}
523 \item A callable object, which in the unpickling environment must be
524 either a class, a callable registered as a ``safe constructor''
525 (see below), or it must have an attribute
526 \member{__safe_for_unpickling__} with a true value. Otherwise,
527 an \exception{UnpicklingError} will be raised in the unpickling
528 environment. Note that as usual, the callable itself is pickled
529 by name.
531 \item A tuple of arguments for the callable object, or \code{None}.
532 \deprecated{2.3}{Use the tuple of arguments instead}
534 \item Optionally, the object's state, which will be passed to
535 the object's \method{__setstate__()} method as described in
536 section~\ref{pickle-inst}. If the object has no
537 \method{__setstate__()} method, then, as above, the value must
538 be a dictionary and it will be added to the object's
539 \member{__dict__}.
541 \end{itemize}
543 Upon unpickling, the callable will be called (provided that it meets
544 the above criteria), passing in the tuple of arguments; it should
545 return the unpickled object.
547 If the second item was \code{None}, then instead of calling the
548 callable directly, its \method{__basicnew__()} method is called
549 without arguments. It should also return the unpickled object.
551 \deprecated{2.3}{Use the tuple of arguments instead}
553 An alternative to implementing a \method{__reduce__()} method on the
554 object to be pickled, is to register the callable with the
555 \refmodule[copyreg]{copy_reg} module. This module provides a way
556 for programs to register ``reduction functions'' and constructors for
557 user-defined types. Reduction functions have the same semantics and
558 interface as the \method{__reduce__()} method described above, except
559 that they are called with a single argument, the object to be pickled.
561 The registered constructor is deemed a ``safe constructor'' for purposes
562 of unpickling as described above.
564 \subsubsection{Pickling and unpickling external objects}
566 For the benefit of object persistence, the \module{pickle} module
567 supports the notion of a reference to an object outside the pickled
568 data stream. Such objects are referenced by a ``persistent id'',
569 which is just an arbitrary string of printable \ASCII{} characters.
570 The resolution of such names is not defined by the \module{pickle}
571 module; it will delegate this resolution to user defined functions on
572 the pickler and unpickler\footnote{The actual mechanism for
573 associating these user defined functions is slightly different for
574 \module{pickle} and \module{cPickle}. The description given here
575 works the same for both implementations. Users of the \module{pickle}
576 module could also use subclassing to effect the same results,
577 overriding the \method{persistent_id()} and \method{persistent_load()}
578 methods in the derived classes.}.
580 To define external persistent id resolution, you need to set the
581 \member{persistent_id} attribute of the pickler object and the
582 \member{persistent_load} attribute of the unpickler object.
584 To pickle objects that have an external persistent id, the pickler
585 must have a custom \function{persistent_id()} method that takes an
586 object as an argument and returns either \code{None} or the persistent
587 id for that object. When \code{None} is returned, the pickler simply
588 pickles the object as normal. When a persistent id string is
589 returned, the pickler will pickle that string, along with a marker
590 so that the unpickler will recognize the string as a persistent id.
592 To unpickle external objects, the unpickler must have a custom
593 \function{persistent_load()} function that takes a persistent id
594 string and returns the referenced object.
596 Here's a silly example that \emph{might} shed more light:
598 \begin{verbatim}
599 import pickle
600 from cStringIO import StringIO
602 src = StringIO()
603 p = pickle.Pickler(src)
605 def persistent_id(obj):
606 if hasattr(obj, 'x'):
607 return 'the value %d' % obj.x
608 else:
609 return None
611 p.persistent_id = persistent_id
613 class Integer:
614 def __init__(self, x):
615 self.x = x
616 def __str__(self):
617 return 'My name is integer %d' % self.x
619 i = Integer(7)
620 print i
621 p.dump(i)
623 datastream = src.getvalue()
624 print repr(datastream)
625 dst = StringIO(datastream)
627 up = pickle.Unpickler(dst)
629 class FancyInteger(Integer):
630 def __str__(self):
631 return 'I am the integer %d' % self.x
633 def persistent_load(persid):
634 if persid.startswith('the value '):
635 value = int(persid.split()[2])
636 return FancyInteger(value)
637 else:
638 raise pickle.UnpicklingError, 'Invalid persistent id'
640 up.persistent_load = persistent_load
642 j = up.load()
643 print j
644 \end{verbatim}
646 In the \module{cPickle} module, the unpickler's
647 \member{persistent_load} attribute can also be set to a Python
648 list, in which case, when the unpickler reaches a persistent id, the
649 persistent id string will simply be appended to this list. This
650 functionality exists so that a pickle data stream can be ``sniffed''
651 for object references without actually instantiating all the objects
652 in a pickle\footnote{We'll leave you with the image of Guido and Jim
653 sitting around sniffing pickles in their living rooms.}. Setting
654 \member{persistent_load} to a list is usually used in conjunction with
655 the \method{noload()} method on the Unpickler.
657 % BAW: Both pickle and cPickle support something called
658 % inst_persistent_id() which appears to give unknown types a second
659 % shot at producing a persistent id. Since Jim Fulton can't remember
660 % why it was added or what it's for, I'm leaving it undocumented.
662 \subsection{Subclassing Unpicklers \label{pickle-sub}}
664 By default, unpickling will import any class that it finds in the
665 pickle data. You can control exactly what gets unpickled and what
666 gets called by customizing your unpickler. Unfortunately, exactly how
667 you do this is different depending on whether you're using
668 \module{pickle} or \module{cPickle}.\footnote{A word of caution: the
669 mechanisms described here use internal attributes and methods, which
670 are subject to change in future versions of Python. We intend to
671 someday provide a common interface for controlling this behavior,
672 which will work in either \module{pickle} or \module{cPickle}.}.
674 In the \module{pickle} module, you need to derive a subclass from
675 \class{Unpickler}, overriding the \method{load_global()}
676 method. \method{load_global()} should read two lines from the pickle
677 data stream where the first line will the name of the module
678 containing the class and the second line will be the name of the
679 instance's class. It then looks up the class, possibly importing the
680 module and digging out the attribute, then it appends what it finds to
681 the unpickler's stack. Later on, this class will be assigned to the
682 \member{__class__} attribute of an empty class, as a way of magically
683 creating an instance without calling its class's \method{__init__()}.
684 Your job (should you choose to accept it), would be to have
685 \method{load_global()} push onto the unpickler's stack, a known safe
686 version of any class you deem safe to unpickle. It is up to you to
687 produce such a class. Or you could raise an error if you want to
688 disallow all unpickling of instances. If this sounds like a hack,
689 you're right. Refer to the source code to make this work.
691 Things are a little cleaner with \module{cPickle}, but not by much.
692 To control what gets unpickled, you can set the unpickler's
693 \member{find_global} attribute to a function or \code{None}. If it is
694 \code{None} then any attempts to unpickle instances will raise an
695 \exception{UnpicklingError}. If it is a function,
696 then it should accept a module name and a class name, and return the
697 corresponding class object. It is responsible for looking up the
698 class and performing any necessary imports, and it may raise an
699 error to prevent instances of the class from being unpickled.
701 The moral of the story is that you should be really careful about the
702 source of the strings your application unpickles.
704 \subsection{Example \label{pickle-example}}
706 Here's a simple example of how to modify pickling behavior for a
707 class. The \class{TextReader} class opens a text file, and returns
708 the line number and line contents each time its \method{readline()}
709 method is called. If a \class{TextReader} instance is pickled, all
710 attributes \emph{except} the file object member are saved. When the
711 instance is unpickled, the file is reopened, and reading resumes from
712 the last location. The \method{__setstate__()} and
713 \method{__getstate__()} methods are used to implement this behavior.
715 \begin{verbatim}
716 class TextReader:
717 """Print and number lines in a text file."""
718 def __init__(self, file):
719 self.file = file
720 self.fh = open(file)
721 self.lineno = 0
723 def readline(self):
724 self.lineno = self.lineno + 1
725 line = self.fh.readline()
726 if not line:
727 return None
728 if line.endswith("\n"):
729 line = line[:-1]
730 return "%d: %s" % (self.lineno, line)
732 def __getstate__(self):
733 odict = self.__dict__.copy() # copy the dict since we change it
734 del odict['fh'] # remove filehandle entry
735 return odict
737 def __setstate__(self,dict):
738 fh = open(dict['file']) # reopen file
739 count = dict['lineno'] # read from file...
740 while count: # until line count is restored
741 fh.readline()
742 count = count - 1
743 self.__dict__.update(dict) # update attributes
744 self.fh = fh # save the file object
745 \end{verbatim}
747 A sample usage might be something like this:
749 \begin{verbatim}
750 >>> import TextReader
751 >>> obj = TextReader.TextReader("TextReader.py")
752 >>> obj.readline()
753 '1: #!/usr/local/bin/python'
754 >>> # (more invocations of obj.readline() here)
755 ... obj.readline()
756 '7: class TextReader:'
757 >>> import pickle
758 >>> pickle.dump(obj,open('save.p','w'))
759 \end{verbatim}
761 If you want to see that \refmodule{pickle} works across Python
762 processes, start another Python session, before continuing. What
763 follows can happen from either the same process or a new process.
765 \begin{verbatim}
766 >>> import pickle
767 >>> reader = pickle.load(open('save.p'))
768 >>> reader.readline()
769 '8: "Print and number lines in a text file."'
770 \end{verbatim}
773 \begin{seealso}
774 \seemodule[copyreg]{copy_reg}{Pickle interface constructor
775 registration for extension types.}
777 \seemodule{shelve}{Indexed databases of objects; uses \module{pickle}.}
779 \seemodule{copy}{Shallow and deep object copying.}
781 \seemodule{marshal}{High-performance serialization of built-in types.}
782 \end{seealso}
785 \section{\module{cPickle} --- A faster \module{pickle}}
787 \declaremodule{builtin}{cPickle}
788 \modulesynopsis{Faster version of \refmodule{pickle}, but not subclassable.}
789 \moduleauthor{Jim Fulton}{jfulton@digicool.com}
790 \sectionauthor{Fred L. Drake, Jr.}{fdrake@acm.org}
792 The \module{cPickle} module supports serialization and
793 de-serialization of Python objects, providing an interface and
794 functionality nearly identical to the
795 \refmodule{pickle}\refstmodindex{pickle} module. There are several
796 differences, the most important being performance and subclassability.
798 First, \module{cPickle} can be up to 1000 times faster than
799 \module{pickle} because the former is implemented in C. Second, in
800 the \module{cPickle} module the callables \function{Pickler()} and
801 \function{Unpickler()} are functions, not classes. This means that
802 you cannot use them to derive custom pickling and unpickling
803 subclasses. Most applications have no need for this functionality and
804 should benefit from the greatly improved performance of the
805 \module{cPickle} module.
807 The pickle data stream produced by \module{pickle} and
808 \module{cPickle} are identical, so it is possible to use
809 \module{pickle} and \module{cPickle} interchangeably with existing
810 pickles\footnote{Since the pickle data format is actually a tiny
811 stack-oriented programming language, and some freedom is taken in the
812 encodings of certain objects, it is possible that the two modules
813 produce different data streams for the same input objects. However it
814 is guaranteed that they will always be able to read each other's
815 data streams.}.
817 There are additional minor differences in API between \module{cPickle}
818 and \module{pickle}, however for most applications, they are
819 interchangable. More documentation is provided in the
820 \module{pickle} module documentation, which
821 includes a list of the documented differences.