1 \section{\module{pickle} --- Python object serialization}
3 \declaremodule{standard}{pickle}
4 \modulesynopsis{Convert Python objects to streams of bytes and back.}
5 % Substantial improvements by Jim Kerr <jbkerr@sr.hp.com>.
6 % Rewritten by Barry Warsaw <barry@zope.com>
8 \index{persistence}
9 \indexii{persistent}{objects}
10 \indexii{serializing}{objects}
11 \indexii{marshalling}{objects}
12 \indexii{flattening}{objects}
13 \indexii{pickling}{objects}
The \module{pickle} module implements a fundamental, but powerful
algorithm for serializing and de-serializing a Python object
structure. ``Pickling'' is the process whereby a Python object
hierarchy is converted into a byte stream, and ``unpickling'' is the
inverse operation, whereby a byte stream is converted back into an
object hierarchy. Pickling (and unpickling) is alternatively known as
``serialization'', ``marshalling''\footnote{Don't confuse this with
the \refmodule{marshal} module.}, or ``flattening'';
however, the preferred terms used here are ``pickling'' and
``unpickling'' to avoid confusion.
26 This documentation describes both the \module{pickle} module and the
27 \refmodule{cPickle} module.
29 \subsection{Relationship to other Python modules}
31 The \module{pickle} module has an optimized cousin called the
32 \module{cPickle} module. As its name implies, \module{cPickle} is
33 written in C, so it can be up to 1000 times faster than
34 \module{pickle}. However it does not support subclassing of the
35 \function{Pickler()} and \function{Unpickler()} classes, because in
36 \module{cPickle} these are functions, not classes. Most applications
37 have no need for this functionality, and can benefit from the improved
38 performance of \module{cPickle}. Other than that, the interfaces of
39 the two modules are nearly identical; the common interface is
40 described in this manual and differences are pointed out where
41 necessary. In the following discussions, we use the term ``pickle''
42 to collectively describe the \module{pickle} and
43 \module{cPickle} modules.
45 The data streams the two modules produce are guaranteed to be
46 interchangeable.
48 Python has a more primitive serialization module called
49 \refmodule{marshal}, but in general
50 \module{pickle} should always be the preferred way to serialize Python
51 objects. \module{marshal} exists primarily to support Python's
52 \file{.pyc} files.
The \module{pickle} module differs from \refmodule{marshal} in several
significant ways:
57 \begin{itemize}
59 \item The \module{pickle} module keeps track of the objects it has
60 already serialized, so that later references to the same object
61 won't be serialized again. \module{marshal} doesn't do this.
63 This has implications both for recursive objects and object
64 sharing. Recursive objects are objects that contain references
65 to themselves. These are not handled by marshal, and in fact,
66 attempting to marshal recursive objects will crash your Python
67 interpreter. Object sharing happens when there are multiple
68 references to the same object in different places in the object
69 hierarchy being serialized. \module{pickle} stores such objects
70 only once, and ensures that all other references point to the
71 master copy. Shared objects remain shared, which can be very
72 important for mutable objects.
\item \module{marshal} cannot be used to serialize user-defined
      classes and their instances. \module{pickle} can save and
      restore class instances transparently; however, the class
      definition must be importable and live in the same module as
      when the object was stored.
80 \item The \module{marshal} serialization format is not guaranteed to
81 be portable across Python versions. Because its primary job in
82 life is to support \file{.pyc} files, the Python implementers
83 reserve the right to change the serialization format in
84 non-backwards compatible ways should the need arise. The
85 \module{pickle} serialization format is guaranteed to be
86 backwards compatible across Python releases.
88 \end{itemize}
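
The effect of the memo on shared and recursive objects can be checked
with a short round trip. The following is only a minimal sketch; the
variable names are illustrative and not part of the module.

```python
import pickle

# Two references to one mutable list inside the same hierarchy.
shared = [1, 2, 3]
hierarchy = {'first': shared, 'second': shared}

# A recursive object: a list that contains itself.
recursive = ['payload']
recursive.append(recursive)

restored = pickle.loads(pickle.dumps(hierarchy))
restored_recursive = pickle.loads(pickle.dumps(recursive))

# The sharing survives the round trip: a change made through one
# reference is visible through the other.
restored['first'].append(4)
same_object = restored['first'] is restored['second']

# The cycle is rebuilt rather than expanded.
cycle_intact = restored_recursive[1] is restored_recursive
```

Because the list is stored only once, the restored dictionary still
holds two references to a single list, which matters as soon as that
list is modified.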
90 \begin{notice}[warning]
91 The \module{pickle} module is not intended to be secure against
92 erroneous or maliciously constructed data. Never unpickle data
93 received from an untrusted or unauthenticated source.
94 \end{notice}
96 Note that serialization is a more primitive notion than persistence;
97 although
98 \module{pickle} reads and writes file objects, it does not handle the
99 issue of naming persistent objects, nor the (even more complicated)
100 issue of concurrent access to persistent objects. The \module{pickle}
101 module can transform a complex object into a byte stream and it can
102 transform the byte stream into an object with the same internal
103 structure. Perhaps the most obvious thing to do with these byte
104 streams is to write them onto a file, but it is also conceivable to
105 send them across a network or store them in a database. The module
106 \refmodule{shelve} provides a simple interface
107 to pickle and unpickle objects on DBM-style database files.
109 \subsection{Data stream format}
111 The data format used by \module{pickle} is Python-specific. This has
112 the advantage that there are no restrictions imposed by external
113 standards such as XDR\index{XDR}\index{External Data Representation}
114 (which can't represent pointer sharing); however it means that
115 non-Python programs may not be able to reconstruct pickled Python
116 objects.
118 By default, the \module{pickle} data format uses a printable \ASCII{}
119 representation. This is slightly more voluminous than a binary
120 representation. The big advantage of using printable \ASCII{} (and of
121 some other characteristics of \module{pickle}'s representation) is that
122 for debugging or recovery purposes it is possible for a human to read
123 the pickled file with a standard text editor.
125 There are currently 3 different protocols which can be used for pickling.
127 \begin{itemize}
129 \item Protocol version 0 is the original ASCII protocol and is backwards
130 compatible with earlier versions of Python.
132 \item Protocol version 1 is the old binary format which is also compatible
133 with earlier versions of Python.
135 \item Protocol version 2 was introduced in Python 2.3. It provides
136 much more efficient pickling of new-style classes.
138 \end{itemize}
140 Refer to PEP 307 for more information.
142 If a \var{protocol} is not specified, protocol 0 is used.
143 If \var{protocol} is specified as a negative value
144 or \constant{HIGHEST_PROTOCOL},
145 the highest protocol version available will be used.
147 \versionchanged[The \var{bin} parameter is deprecated and only provided
148 for backwards compatibility. You should use the \var{protocol}
149 parameter instead]{2.3}
A binary format, which is slightly more efficient, can be chosen by
specifying a true value for the \var{bin} argument to the
\class{Pickler} constructor or the \function{dump()} and \function{dumps()}
functions. A \var{protocol} version of 1 or higher implies use of a binary format.
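
As a small illustration of protocol selection, the sketch below pickles
one object with several \var{protocol} values. The \code{record} data is
made up for the example, and the protocol is always passed explicitly so
that the result does not depend on the default.

```python
import pickle

record = {'name': 'spam', 'values': [1, 2, 3]}

# Protocol 0 is the printable ASCII format; protocol 2 is binary.
as_text = pickle.dumps(record, 0)
as_binary = pickle.dumps(record, 2)

# A negative value selects the same protocol as HIGHEST_PROTOCOL.
newest = pickle.dumps(record, -1)
also_newest = pickle.dumps(record, pickle.HIGHEST_PROTOCOL)

# No protocol is named when unpickling; the format is detected.
results = [pickle.loads(data)
           for data in (as_text, as_binary, newest, also_newest)]
```

Every element of \code{results} compares equal to \code{record},
whichever protocol produced the data.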
156 \subsection{Usage}
158 To serialize an object hierarchy, you first create a pickler, then you
159 call the pickler's \method{dump()} method. To de-serialize a data
160 stream, you first create an unpickler, then you call the unpickler's
161 \method{load()} method. The \module{pickle} module provides the
162 following constant:
164 \begin{datadesc}{HIGHEST_PROTOCOL}
165 The highest protocol version available. This value can be passed
166 as a \var{protocol} value.
167 \end{datadesc}
169 The \module{pickle} module provides the
170 following functions to make this process more convenient:
172 \begin{funcdesc}{dump}{object, file\optional{, protocol\optional{, bin}}}
173 Write a pickled representation of \var{object} to the open file object
174 \var{file}. This is equivalent to
175 \code{Pickler(\var{file}, \var{protocol}, \var{bin}).dump(\var{object})}.
If the \var{protocol} parameter is omitted, protocol 0 is used.
If \var{protocol} is specified as a negative value
or \constant{HIGHEST_PROTOCOL},
the highest protocol version will be used.
182 \versionchanged[The \var{protocol} parameter was added.
183 The \var{bin} parameter is deprecated and only provided
184 for backwards compatibility. You should use the \var{protocol}
185 parameter instead]{2.3}
187 If the optional \var{bin} argument is true, the binary pickle format
188 is used; otherwise the (less efficient) text pickle format is used
189 (for backwards compatibility, this is the default).
191 \var{file} must have a \method{write()} method that accepts a single
192 string argument. It can thus be a file object opened for writing, a
193 \refmodule{StringIO} object, or any other custom
194 object that meets this interface.
195 \end{funcdesc}
197 \begin{funcdesc}{load}{file}
198 Read a string from the open file object \var{file} and interpret it as
199 a pickle data stream, reconstructing and returning the original object
200 hierarchy. This is equivalent to \code{Unpickler(\var{file}).load()}.
202 \var{file} must have two methods, a \method{read()} method that takes
203 an integer argument, and a \method{readline()} method that requires no
204 arguments. Both methods should return a string. Thus \var{file} can
205 be a file object opened for reading, a
206 \module{StringIO} object, or any other custom
207 object that meets this interface.
209 This function automatically determines whether the data stream was
210 written in binary mode or not.
211 \end{funcdesc}
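
A minimal sketch of \function{dump()} and \function{load()} working on a
file-like object follows. It assumes \class{io.BytesIO} as the in-memory
buffer, chosen so that the sketch also runs on newer interpreters; a
\module{StringIO} object fills the same role in the interface described
here. The \code{inventory} data is illustrative.

```python
import io
import pickle

inventory = {'apples': 3, 'pears': [1, 2]}

# Any object with a suitable write() method can receive the pickle;
# an in-memory buffer stands in for a file opened for writing.
buffer = io.BytesIO()
pickle.dump(inventory, buffer, 2)

# load() needs read() and readline(); rewind and read the object back.
buffer.seek(0)
restored = pickle.load(buffer)
```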
213 \begin{funcdesc}{dumps}{object\optional{, protocol\optional{, bin}}}
214 Return the pickled representation of the object as a string, instead
215 of writing it to a file.
If the \var{protocol} parameter is omitted, protocol 0 is used.
If \var{protocol} is specified as a negative value
or \constant{HIGHEST_PROTOCOL},
the highest protocol version will be used.
222 \versionchanged[The \var{protocol} parameter was added.
223 The \var{bin} parameter is deprecated and only provided
224 for backwards compatibility. You should use the \var{protocol}
225 parameter instead]{2.3}
227 If the optional \var{bin} argument is
228 true, the binary pickle format is used; otherwise the (less efficient)
229 text pickle format is used (this is the default).
230 \end{funcdesc}
232 \begin{funcdesc}{loads}{string}
233 Read a pickled object hierarchy from a string. Characters in the
234 string past the pickled object's representation are ignored.
235 \end{funcdesc}
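
The following sketch round-trips an object through a string and shows
that \function{loads()} stops at the end of the first pickled object.
The values used are illustrative only.

```python
import pickle

point = (3, 4)
data = pickle.dumps(point, 2)

# Anything that follows the pickled representation is ignored,
# even if it happens to be another complete pickle.
padded = data + pickle.dumps('trailing data', 2)
first_only = pickle.loads(padded)
```

To read several objects from one stream, use \function{load()} or an
\class{Unpickler} on a file-like object instead.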
237 The \module{pickle} module also defines three exceptions:
239 \begin{excdesc}{PickleError}
240 A common base class for the other exceptions defined below. This
241 inherits from \exception{Exception}.
242 \end{excdesc}
244 \begin{excdesc}{PicklingError}
245 This exception is raised when an unpicklable object is passed to
246 the \method{dump()} method.
247 \end{excdesc}
249 \begin{excdesc}{UnpicklingError}
250 This exception is raised when there is a problem unpickling an object.
251 Note that other exceptions may also be raised during unpickling,
252 including (but not necessarily limited to) \exception{AttributeError},
253 \exception{EOFError}, \exception{ImportError}, and \exception{IndexError}.
254 \end{excdesc}
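
One possible way to guard a call that unpickles data is sketched below.
The helper \function{try_loads()} is hypothetical and not part of the
module; it simply catches the exceptions listed above, which may not be
the complete set for every input.

```python
import pickle

def try_loads(data):
    """Return (object, None) on success or (None, exception) on failure."""
    try:
        return pickle.loads(data), None
    # UnpicklingError is not the only failure mode, so the other
    # exceptions named above are caught as well.
    except (pickle.UnpicklingError, AttributeError, EOFError,
            ImportError, IndexError) as exc:
        return None, exc

good = pickle.dumps([1, 2, 3], 2)
value, error = try_loads(good)

# An empty string is a stream that ends before any object is complete.
empty_value, empty_error = try_loads(good[:0])

# Both specific exceptions derive from the common base class.
share_base = (issubclass(pickle.PicklingError, pickle.PickleError) and
              issubclass(pickle.UnpicklingError, pickle.PickleError))
```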
The \module{pickle} module also exports two callables\footnote{In the
\module{pickle} module these callables are classes, which you could
subclass to customize the behavior. However, in the \module{cPickle}
module these callables are factory functions and so cannot be
subclassed. One of the common reasons to subclass is to control what
objects can actually be unpickled. See section~\ref{pickle-sub} for
more details.}, \class{Pickler} and
\class{Unpickler}:
265 \begin{classdesc}{Pickler}{file\optional{, protocol\optional{, bin}}}
266 This takes a file-like object to which it will write a pickle data
267 stream.
If the \var{protocol} parameter is omitted, protocol 0 is used.
If \var{protocol} is specified as a negative value,
the highest protocol version will be used.
273 \versionchanged[The \var{bin} parameter is deprecated and only provided
274 for backwards compatibility. You should use the \var{protocol}
275 parameter instead]{2.3}
277 Optional \var{bin} if true, tells the pickler to use the more
278 efficient binary pickle format, otherwise the \ASCII{} format is used
279 (this is the default).
281 \var{file} must have a \method{write()} method that accepts a single
282 string argument. It can thus be an open file object, a
283 \module{StringIO} object, or any other custom
284 object that meets this interface.
285 \end{classdesc}
287 \class{Pickler} objects define one (or two) public methods:
289 \begin{methoddesc}[Pickler]{dump}{object}
290 Write a pickled representation of \var{object} to the open file object
291 given in the constructor. Either the binary or \ASCII{} format will
292 be used, depending on the value of the \var{bin} flag passed to the
293 constructor.
294 \end{methoddesc}
296 \begin{methoddesc}[Pickler]{clear_memo}{}
Clears the pickler's ``memo''. The memo is the data structure that
remembers which objects the pickler has already seen, so that shared
or recursive objects are pickled by reference and not by value. This
method is useful when re-using picklers.
302 \begin{notice}
303 Prior to Python 2.3, \method{clear_memo()} was only available on the
304 picklers created by \refmodule{cPickle}. In the \module{pickle} module,
305 picklers have an instance variable called \member{memo} which is a
306 Python dictionary. So to clear the memo for a \module{pickle} module
307 pickler, you could do the following:
309 \begin{verbatim}
310 mypickler.memo.clear()
311 \end{verbatim}
313 Code that does not need to support older versions of Python should
314 simply use \method{clear_memo()}.
315 \end{notice}
316 \end{methoddesc}
It is possible to make multiple calls to the \method{dump()} method of
the same \class{Pickler} instance. These must then be matched to the
same number of calls to the \method{load()} method of the
corresponding \class{Unpickler} instance. If the same object is
pickled by multiple \method{dump()} calls, the \method{load()} calls will
all yield references to the same object\footnote{\emph{Warning}: this
324 is intended for pickling multiple objects without intervening
325 modifications to the objects or their parts. If you modify an object
326 and then pickle it again using the same \class{Pickler} instance, the
327 object is not pickled again --- a reference to it is pickled and the
328 \class{Unpickler} will return the old value, not the modified one.
329 There are two problems here: (1) detecting changes, and (2)
330 marshalling a minimal set of changes. Garbage Collection may also
331 become a problem here.}.
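
The interplay of repeated \method{dump()} calls, the memo, and
\method{clear_memo()} can be sketched as follows. \class{io.BytesIO} is
assumed as the file-like object, and the variable names are illustrative.

```python
import io
import pickle

stream = io.BytesIO()
pickler = pickle.Pickler(stream, 2)

numbers = [1, 2, 3]
pickler.dump(numbers)      # the full contents are written
numbers.append(4)
pickler.dump(numbers)      # only a reference to the first pickle is written

pickler.clear_memo()       # forget which objects have been seen
pickler.dump(numbers)      # the current contents are written in full

# Three dump() calls are matched by three load() calls.
stream.seek(0)
unpickler = pickle.Unpickler(stream)
first = unpickler.load()
second = unpickler.load()
third = unpickler.load()
```

Here \code{second} is the same object as \code{first} and still holds
the old value, while \code{third} is a separate list that reflects the
modification made before the memo was cleared.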
333 \class{Unpickler} objects are defined as:
335 \begin{classdesc}{Unpickler}{file}
336 This takes a file-like object from which it will read a pickle data
337 stream. This class automatically determines whether the data stream
338 was written in binary mode or not, so it does not need a flag as in
339 the \class{Pickler} factory.
341 \var{file} must have two methods, a \method{read()} method that takes
342 an integer argument, and a \method{readline()} method that requires no
343 arguments. Both methods should return a string. Thus \var{file} can
344 be a file object opened for reading, a
345 \module{StringIO} object, or any other custom
346 object that meets this interface.
347 \end{classdesc}
349 \class{Unpickler} objects have one (or two) public methods:
351 \begin{methoddesc}[Unpickler]{load}{}
352 Read a pickled object representation from the open file object given
353 in the constructor, and return the reconstituted object hierarchy
354 specified therein.
355 \end{methoddesc}
357 \begin{methoddesc}[Unpickler]{noload}{}
358 This is just like \method{load()} except that it doesn't actually
359 create any objects. This is useful primarily for finding what's
360 called ``persistent ids'' that may be referenced in a pickle data
361 stream. See section~\ref{pickle-protocol} below for more details.
363 \strong{Note:} the \method{noload()} method is currently only
364 available on \class{Unpickler} objects created with the
365 \module{cPickle} module. \module{pickle} module \class{Unpickler}s do
366 not have the \method{noload()} method.
367 \end{methoddesc}
369 \subsection{What can be pickled and unpickled?}
371 The following types can be pickled:
373 \begin{itemize}
375 \item \code{None}, \code{True}, and \code{False}
377 \item integers, long integers, floating point numbers, complex numbers
379 \item normal and Unicode strings
381 \item tuples, lists, and dictionaries containing only picklable objects
383 \item functions defined at the top level of a module
385 \item built-in functions defined at the top level of a module
387 \item classes that are defined at the top level of a module
\item instances of such classes whose \member{__dict__} or the result
of calling \method{__getstate__()} is picklable (see
section~\ref{pickle-protocol} for details)
393 \end{itemize}
395 Attempts to pickle unpicklable objects will raise the
396 \exception{PicklingError} exception; when this happens, an unspecified
397 number of bytes may have already been written to the underlying file.
Note that functions (built-in and user-defined) are pickled by ``fully
qualified'' name reference, not by value. This means that only the
function name is pickled, along with the name of the module the function
is defined in. Neither the function's code, nor any of its function
403 attributes are pickled. Thus the defining module must be importable
404 in the unpickling environment, and the module must contain the named
405 object, otherwise an exception will be raised\footnote{The exception
406 raised will likely be an \exception{ImportError} or an
407 \exception{AttributeError} but it could be something else.}.
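
The sketch below makes the by-name behavior visible. It uses
\function{textwrap.dedent()} merely as a convenient function defined at
the top level of a standard module; any such function would do.

```python
import pickle
import textwrap

# Protocol 0 is printable, which makes the stored names easy to spot.
data = pickle.dumps(textwrap.dedent, 0)

# Only the module name and the function name are in the stream.
names_present = b'textwrap' in data and b'dedent' in data

# Unpickling imports the module and looks the name up again, so the
# result is the very same function object, not a copy of its code.
restored = pickle.loads(data)
```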
409 Similarly, classes are pickled by named reference, so the same
410 restrictions in the unpickling environment apply. Note that none of
411 the class's code or data is pickled, so in the following example the
412 class attribute \code{attr} is not restored in the unpickling
413 environment:
\begin{verbatim}
class Foo:
    attr = 'a class attr'

picklestring = pickle.dumps(Foo)
\end{verbatim}
422 These restrictions are why picklable functions and classes must be
423 defined in the top level of a module.
425 Similarly, when class instances are pickled, their class's code and
426 data are not pickled along with them. Only the instance data are
427 pickled. This is done on purpose, so you can fix bugs in a class or
428 add methods to the class and still load objects that were created with
429 an earlier version of the class. If you plan to have long-lived
430 objects that will see many versions of a class, it may be worthwhile
431 to put a version number in the objects so that suitable conversions
432 can be made by the class's \method{__setstate__()} method.
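
A version number in the state might be handled as in the following
sketch. The \class{Account} class, its attributes, and the conversion
rule are all hypothetical; only the use of \method{__getstate__()} and
\method{__setstate__()} comes from this documentation.

```python
import pickle

class Account(object):
    """Hypothetical class whose stored layout changed between versions."""
    VERSION = 2

    def __init__(self, owner, cents):
        self.owner = owner
        self.cents = cents          # version 1 stored a float 'balance'

    def __getstate__(self):
        state = self.__dict__.copy()
        state['version'] = self.VERSION
        return state

    def __setstate__(self, state):
        state = dict(state)
        version = state.pop('version', 1)   # no number means version 1
        if version < 2:
            # Convert the old layout to the current one.
            state['cents'] = int(round(state.pop('balance') * 100))
        self.__dict__.update(state)

# State as a version 1 object would have stored it.
old_account = Account.__new__(Account)
old_account.__setstate__({'owner': 'guido', 'balance': 12.5})

# A current object makes the round trip unchanged.
current = pickle.loads(pickle.dumps(Account('barry', 300), 2))
```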
434 \subsection{The pickle protocol
435 \label{pickle-protocol}}\setindexsubitem{(pickle protocol)}
437 This section describes the ``pickling protocol'' that defines the
438 interface between the pickler/unpickler and the objects that are being
439 serialized. This protocol provides a standard way for you to define,
440 customize, and control how your objects are serialized and
441 de-serialized. The description in this section doesn't cover specific
442 customizations that you can employ to make the unpickling environment
443 slightly safer from untrusted pickle data streams; see section~\ref{pickle-sub}
444 for more details.
446 \subsubsection{Pickling and unpickling normal class
447 instances\label{pickle-inst}}
449 When a pickled class instance is unpickled, its \method{__init__()}
450 method is normally \emph{not} invoked. If it is desirable that the
451 \method{__init__()} method be called on unpickling, a class can define
452 a method \method{__getinitargs__()}, which should return a
453 \emph{tuple} containing the arguments to be passed to the class
454 constructor (i.e. \method{__init__()}). The
455 \method{__getinitargs__()} method is called at
456 pickle time; the tuple it returns is incorporated in the pickle for
457 the instance.
458 \withsubitem{(copy protocol)}{\ttindex{__getinitargs__()}}
459 \withsubitem{(instance constructor)}{\ttindex{__init__()}}
461 \withsubitem{(copy protocol)}{
462 \ttindex{__getstate__()}\ttindex{__setstate__()}}
463 \withsubitem{(instance attribute)}{
464 \ttindex{__dict__}}
466 Classes can further influence how their instances are pickled; if the
467 class defines the method \method{__getstate__()}, it is called and the
468 return state is pickled as the contents for the instance, instead of
469 the contents of the instance's dictionary. If there is no
470 \method{__getstate__()} method, the instance's \member{__dict__} is
471 pickled.
473 Upon unpickling, if the class also defines the method
474 \method{__setstate__()}, it is called with the unpickled
475 state\footnote{These methods can also be used to implement copying
476 class instances.}. If there is no \method{__setstate__()} method, the
477 pickled state must be a dictionary and its items are assigned to the
478 new instance's dictionary. If a class defines both
479 \method{__getstate__()} and \method{__setstate__()}, the state object
480 needn't be a dictionary and these methods can do what they
481 want.\footnote{This protocol is also used by the shallow and deep
482 copying operations defined in the
483 \refmodule{copy} module.}
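
As an illustration of a state object that is not a dictionary, the
sketch below stores a one-element tuple. The \class{Temperature} class
is hypothetical and exists only for this example.

```python
import pickle

class Temperature(object):
    """Hypothetical class that keeps a derived attribute out of the pickle."""

    def __init__(self, celsius):
        self.celsius = celsius
        self.fahrenheit = celsius * 9.0 / 5.0 + 32.0   # derived value

    def __getstate__(self):
        # The state need not be a dictionary; a tuple is enough here.
        return (self.celsius,)

    def __setstate__(self, state):
        # __init__() is not invoked by the unpickler, so it is called
        # explicitly to rebuild the derived attribute.
        (celsius,) = state
        self.__init__(celsius)

original = Temperature(100)
restored = pickle.loads(pickle.dumps(original, 2))
```

Since both methods are defined, the pickler never looks at the
instance's \member{__dict__}; the two methods only have to agree with
each other on the form of the state.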
485 \begin{notice}[warning]
486 For new-style classes, if \method{__getstate__()} returns a false
487 value, the \method{__setstate__()} method will not be called.
488 \end{notice}
491 \subsubsection{Pickling and unpickling extension types}
493 When the \class{Pickler} encounters an object of a type it knows
494 nothing about --- such as an extension type --- it looks in two places
495 for a hint of how to pickle it. One alternative is for the object to
496 implement a \method{__reduce__()} method. If provided, at pickling
497 time \method{__reduce__()} will be called with no arguments, and it
498 must return either a string or a tuple.
500 If a string is returned, it names a global variable whose contents are
501 pickled as normal. When a tuple is returned, it must be of length two
502 or three, with the following semantics:
504 \begin{itemize}
506 \item A callable object, which in the unpickling environment must be
507 either a class, a callable registered as a ``safe constructor''
508 (see below), or it must have an attribute
509 \member{__safe_for_unpickling__} with a true value. Otherwise,
510 an \exception{UnpicklingError} will be raised in the unpickling
511 environment. Note that as usual, the callable itself is pickled
512 by name.
514 \item A tuple of arguments for the callable object, or \code{None}.
515 \deprecated{2.3}{Use the tuple of arguments instead}
517 \item Optionally, the object's state, which will be passed to
518 the object's \method{__setstate__()} method as described in
519 section~\ref{pickle-inst}. If the object has no
520 \method{__setstate__()} method, then, as above, the value must
521 be a dictionary and it will be added to the object's
522 \member{__dict__}.
524 \end{itemize}
526 Upon unpickling, the callable will be called (provided that it meets
527 the above criteria), passing in the tuple of arguments; it should
528 return the unpickled object.
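
A \method{__reduce__()} method returning a three-element tuple might look
like the following sketch. The \class{Interval} class is hypothetical; it
is written in Python only to keep the example short, where a real use
would more likely be an extension type.

```python
import pickle

class Interval(object):
    """Hypothetical type that is rebuilt by calling its class again."""

    def __init__(self, low, high):
        self.low = low
        self.high = high
        self.label = None

    def __reduce__(self):
        # First item: the callable (here the class, pickled by name).
        # Second item: the tuple of arguments passed to the callable.
        # Third item (optional): the state; with no __setstate__()
        # method it must be a dictionary and is added to __dict__.
        return (Interval, (self.low, self.high), {'label': self.label})

interval = Interval(2, 5)
interval.label = 'working hours'
restored = pickle.loads(pickle.dumps(interval, 2))
```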
530 If the second item was \code{None}, then instead of calling the
531 callable directly, its \method{__basicnew__()} method is called
532 without arguments. It should also return the unpickled object.
534 \deprecated{2.3}{Use the tuple of arguments instead}
536 An alternative to implementing a \method{__reduce__()} method on the
537 object to be pickled, is to register the callable with the
538 \refmodule[copyreg]{copy_reg} module. This module provides a way
539 for programs to register ``reduction functions'' and constructors for
540 user-defined types. Reduction functions have the same semantics and
541 interface as the \method{__reduce__()} method described above, except
542 that they are called with a single argument, the object to be pickled.
544 The registered constructor is deemed a ``safe constructor'' for purposes
545 of unpickling as described above.
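
Registering a reduction function could be sketched as follows. The
\class{Money} class and \function{reduce_money()} are hypothetical. The
import is written defensively because newer interpreters name the
module \module{copyreg}; that spelling is an assumption beyond this
documentation.

```python
import pickle

try:
    import copyreg               # name used by newer interpreters
except ImportError:
    import copy_reg as copyreg   # name used in this documentation

class Money(object):
    """Hypothetical type that has no pickling methods of its own."""
    def __init__(self, units, currency):
        self.units = units
        self.currency = currency

calls = []   # records each use of the reduction function

def reduce_money(money):
    # Same return value as __reduce__(), but the object to be pickled
    # arrives as the single argument.
    calls.append(money)
    return (Money, (money.units, money.currency))

# Register the reduction function for the type.
copyreg.pickle(Money, reduce_money)

restored = pickle.loads(pickle.dumps(Money(5, 'EUR'), 2))
```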
547 \subsubsection{Pickling and unpickling external objects}
549 For the benefit of object persistence, the \module{pickle} module
550 supports the notion of a reference to an object outside the pickled
551 data stream. Such objects are referenced by a ``persistent id'',
552 which is just an arbitrary string of printable \ASCII{} characters.
553 The resolution of such names is not defined by the \module{pickle}
554 module; it will delegate this resolution to user defined functions on
555 the pickler and unpickler\footnote{The actual mechanism for
556 associating these user defined functions is slightly different for
557 \module{pickle} and \module{cPickle}. The description given here
558 works the same for both implementations. Users of the \module{pickle}
559 module could also use subclassing to effect the same results,
560 overriding the \method{persistent_id()} and \method{persistent_load()}
561 methods in the derived classes.}.
563 To define external persistent id resolution, you need to set the
564 \member{persistent_id} attribute of the pickler object and the
565 \member{persistent_load} attribute of the unpickler object.
567 To pickle objects that have an external persistent id, the pickler
568 must have a custom \function{persistent_id()} method that takes an
569 object as an argument and returns either \code{None} or the persistent
570 id for that object. When \code{None} is returned, the pickler simply
571 pickles the object as normal. When a persistent id string is
572 returned, the pickler will pickle that string, along with a marker
573 so that the unpickler will recognize the string as a persistent id.
575 To unpickle external objects, the unpickler must have a custom
576 \function{persistent_load()} function that takes a persistent id
577 string and returns the referenced object.
579 Here's a silly example that \emph{might} shed more light:
\begin{verbatim}
import pickle
from cStringIO import StringIO

src = StringIO()
p = pickle.Pickler(src)

def persistent_id(obj):
    if hasattr(obj, 'x'):
        return 'the value %d' % obj.x
    else:
        return None

p.persistent_id = persistent_id

class Integer:
    def __init__(self, x):
        self.x = x
    def __str__(self):
        return 'My name is integer %d' % self.x

i = Integer(7)
print i
p.dump(i)

datastream = src.getvalue()
print repr(datastream)
dst = StringIO(datastream)

up = pickle.Unpickler(dst)

class FancyInteger(Integer):
    def __str__(self):
        return 'I am the integer %d' % self.x

def persistent_load(persid):
    if persid.startswith('the value '):
        value = int(persid.split()[2])
        return FancyInteger(value)
    else:
        raise pickle.UnpicklingError, 'Invalid persistent id'

up.persistent_load = persistent_load

j = up.load()
print j
\end{verbatim}
629 In the \module{cPickle} module, the unpickler's
630 \member{persistent_load} attribute can also be set to a Python
631 list, in which case, when the unpickler reaches a persistent id, the
632 persistent id string will simply be appended to this list. This
633 functionality exists so that a pickle data stream can be ``sniffed''
634 for object references without actually instantiating all the objects
635 in a pickle\footnote{We'll leave you with the image of Guido and Jim
636 sitting around sniffing pickles in their living rooms.}. Setting
637 \member{persistent_load} to a list is usually used in conjunction with
638 the \method{noload()} method on the Unpickler.
640 % BAW: Both pickle and cPickle support something called
641 % inst_persistent_id() which appears to give unknown types a second
642 % shot at producing a persistent id. Since Jim Fulton can't remember
643 % why it was added or what it's for, I'm leaving it undocumented.
645 \subsection{Subclassing Unpicklers \label{pickle-sub}}
647 By default, unpickling will import any class that it finds in the
648 pickle data. You can control exactly what gets unpickled and what
649 gets called by customizing your unpickler. Unfortunately, exactly how
650 you do this is different depending on whether you're using
\module{pickle} or \module{cPickle}.\footnote{A word of caution: the
mechanisms described here use internal attributes and methods, which
are subject to change in future versions of Python. We intend to
someday provide a common interface for controlling this behavior,
which will work in either \module{pickle} or \module{cPickle}.}
657 In the \module{pickle} module, you need to derive a subclass from
658 \class{Unpickler}, overriding the \method{load_global()}
method. \method{load_global()} should read two lines from the pickle
data stream where the first line will be the name of the module
containing the class and the second line will be the name of the
instance's class. It then looks up the class, possibly importing the
663 module and digging out the attribute, then it appends what it finds to
664 the unpickler's stack. Later on, this class will be assigned to the
665 \member{__class__} attribute of an empty class, as a way of magically
666 creating an instance without calling its class's \method{__init__()}.
667 Your job (should you choose to accept it), would be to have
668 \method{load_global()} push onto the unpickler's stack, a known safe
669 version of any class you deem safe to unpickle. It is up to you to
670 produce such a class. Or you could raise an error if you want to
671 disallow all unpickling of instances. If this sounds like a hack,
672 you're right. Refer to the source code to make this work.
674 Things are a little cleaner with \module{cPickle}, but not by much.
675 To control what gets unpickled, you can set the unpickler's
676 \member{find_global} attribute to a function or \code{None}. If it is
677 \code{None} then any attempts to unpickle instances will raise an
678 \exception{UnpicklingError}. If it is a function,
679 then it should accept a module name and a class name, and return the
680 corresponding class object. It is responsible for looking up the
681 class and performing any necessary imports, and it may raise an
682 error to prevent instances of the class from being unpickled.
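
The idea of checking the module and class names before anything is
imported can be sketched as below. This is not the exact mechanism
described above: the sketch assumes an \class{Unpickler} class that can
be subclassed and that performs the lookup in a \method{find_class()}
method taking the module name and the class name. That hook, the
\class{RestrictedUnpickler} class, the \code{ALLOWED} list, and
\function{restricted_loads()} are assumptions made for illustration.

```python
import io
import pickle
import textwrap

# Hypothetical allow-list of (module, name) pairs that may be loaded.
ALLOWED = [('textwrap', 'dedent')]

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called with the two names read from the data stream.
        if (module, name) not in ALLOWED:
            raise pickle.UnpicklingError(
                'global %s.%s is forbidden' % (module, name))
        return pickle.Unpickler.find_class(self, module, name)

def restricted_loads(data):
    """Unpickle a string using the restricted unpickler."""
    return RestrictedUnpickler(io.BytesIO(data)).load()

# Plain data contains no global references and loads normally.
plain = restricted_loads(pickle.dumps([1, 'two', 3.0], 2))

# A listed global is looked up as usual.
allowed = restricted_loads(pickle.dumps(textwrap.dedent, 2))

# Anything else is refused before the lookup takes place.
try:
    restricted_loads(pickle.dumps(textwrap.fill, 2))
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

An allow-list is used rather than a list of forbidden names because the
set of dangerous callables cannot be enumerated in advance.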
684 The moral of the story is that you should be really careful about the
685 source of the strings your application unpickles.
687 \subsection{Example \label{pickle-example}}
689 Here's a simple example of how to modify pickling behavior for a
690 class. The \class{TextReader} class opens a text file, and returns
691 the line number and line contents each time its \method{readline()}
692 method is called. If a \class{TextReader} instance is pickled, all
693 attributes \emph{except} the file object member are saved. When the
694 instance is unpickled, the file is reopened, and reading resumes from
695 the last location. The \method{__setstate__()} and
696 \method{__getstate__()} methods are used to implement this behavior.
\begin{verbatim}
class TextReader:
    """Print and number lines in a text file."""
    def __init__(self, file):
        self.file = file
        self.fh = open(file)
        self.lineno = 0

    def readline(self):
        self.lineno = self.lineno + 1
        line = self.fh.readline()
        if not line:
            return None
        if line.endswith("\n"):
            line = line[:-1]
        return "%d: %s" % (self.lineno, line)

    def __getstate__(self):
        odict = self.__dict__.copy() # copy the dict since we change it
        del odict['fh']              # remove filehandle entry
        return odict

    def __setstate__(self,dict):
        fh = open(dict['file'])      # reopen file
        count = dict['lineno']       # read from file...
        while count:                 # until line count is restored
            fh.readline()
            count = count - 1
        self.__dict__.update(dict)   # update attributes
        self.fh = fh                 # save the file object
\end{verbatim}
730 A sample usage might be something like this:
\begin{verbatim}
>>> import TextReader
>>> obj = TextReader.TextReader("TextReader.py")
>>> obj.readline()
'1: #!/usr/local/bin/python'
>>> # (more invocations of obj.readline() here)
... obj.readline()
'7: class TextReader:'
>>> import pickle
>>> pickle.dump(obj,open('save.p','w'))
\end{verbatim}
If you want to see that \refmodule{pickle} works across Python
processes, start another Python session before continuing. What
follows can happen from either the same process or a new process.
\begin{verbatim}
>>> import pickle
>>> reader = pickle.load(open('save.p'))
>>> reader.readline()
'8: "Print and number lines in a text file."'
\end{verbatim}
756 \begin{seealso}
757 \seemodule[copyreg]{copy_reg}{Pickle interface constructor
758 registration for extension types.}
760 \seemodule{shelve}{Indexed databases of objects; uses \module{pickle}.}
762 \seemodule{copy}{Shallow and deep object copying.}
764 \seemodule{marshal}{High-performance serialization of built-in types.}
765 \end{seealso}
768 \section{\module{cPickle} --- A faster \module{pickle}}
770 \declaremodule{builtin}{cPickle}
771 \modulesynopsis{Faster version of \refmodule{pickle}, but not subclassable.}
772 \moduleauthor{Jim Fulton}{jfulton@digicool.com}
773 \sectionauthor{Fred L. Drake, Jr.}{fdrake@acm.org}
775 The \module{cPickle} module supports serialization and
776 de-serialization of Python objects, providing an interface and
777 functionality nearly identical to the
778 \refmodule{pickle}\refstmodindex{pickle} module. There are several
779 differences, the most important being performance and subclassability.
781 First, \module{cPickle} can be up to 1000 times faster than
782 \module{pickle} because the former is implemented in C. Second, in
783 the \module{cPickle} module the callables \function{Pickler()} and
784 \function{Unpickler()} are functions, not classes. This means that
785 you cannot use them to derive custom pickling and unpickling
786 subclasses. Most applications have no need for this functionality and
787 should benefit from the greatly improved performance of the
788 \module{cPickle} module.
The pickle data streams produced by \module{pickle} and
\module{cPickle} are compatible, so it is possible to use
\module{pickle} and \module{cPickle} interchangeably with existing
pickles\footnote{Since the pickle data format is actually a tiny
794 stack-oriented programming language, and some freedom is taken in the
795 encodings of certain objects, it is possible that the two modules
796 produce different data streams for the same input objects. However it
797 is guaranteed that they will always be able to read each other's
798 data streams.}.
There are additional minor differences in API between \module{cPickle}
and \module{pickle}; however, for most applications they are
interchangeable. More documentation is provided in the
\module{pickle} module documentation, which
includes a list of the documented differences.