Use full package paths in imports.
[python/dscho.git] / Doc / lib / libpickle.tex
blob194a717fc0a9f7e4ad2b973721f4288371d72859
1 \section{\module{pickle} --- Python object serialization}
3 \declaremodule{standard}{pickle}
4 \modulesynopsis{Convert Python objects to streams of bytes and back.}
5 % Substantial improvements by Jim Kerr <jbkerr@sr.hp.com>.
6 % Rewritten by Barry Warsaw <barry@zope.com>
8 \index{persistence}
9 \indexii{persistent}{objects}
10 \indexii{serializing}{objects}
11 \indexii{marshalling}{objects}
12 \indexii{flattening}{objects}
13 \indexii{pickling}{objects}
15 The \module{pickle} module implements a fundamental, but powerful
16 algorithm for serializing and de-serializing a Python object
17 structure. ``Pickling'' is the process whereby a Python object
18 hierarchy is converted into a byte stream, and ``unpickling'' is the
19 inverse operation, whereby a byte stream is converted back into an
20 object hierarchy. Pickling (and unpickling) is alternatively known as
21 ``serialization'', ``marshalling,''\footnote{Don't confuse this with
22 the \refmodule{marshal} module} or ``flattening'',
23 however the preferred term used here is ``pickling'' and
24 ``unpickling'' to avoid confusing.
26 This documentation describes both the \module{pickle} module and the
27 \refmodule{cPickle} module.
29 \subsection{Relationship to other Python modules}
31 The \module{pickle} module has an optimized cousin called the
32 \module{cPickle} module. As its name implies, \module{cPickle} is
33 written in C, so it can be up to 1000 times faster than
34 \module{pickle}. However it does not support subclassing of the
35 \function{Pickler()} and \function{Unpickler()} classes, because in
36 \module{cPickle} these are functions, not classes. Most applications
37 have no need for this functionality, and can benefit from the improved
38 performance of \module{cPickle}. Other than that, the interfaces of
39 the two modules are nearly identical; the common interface is
40 described in this manual and differences are pointed out where
41 necessary. In the following discussions, we use the term ``pickle''
42 to collectively describe the \module{pickle} and
43 \module{cPickle} modules.
45 The data streams the two modules produce are guaranteed to be
46 interchangeable.
48 Python has a more primitive serialization module called
49 \refmodule{marshal}, but in general
50 \module{pickle} should always be the preferred way to serialize Python
51 objects. \module{marshal} exists primarily to support Python's
52 \file{.pyc} files.
54 The \module{pickle} module differs from \refmodule{marshal} several
55 significant ways:
57 \begin{itemize}
59 \item The \module{pickle} module keeps track of the objects it has
60 already serialized, so that later references to the same object
61 won't be serialized again. \module{marshal} doesn't do this.
63 This has implications both for recursive objects and object
64 sharing. Recursive objects are objects that contain references
65 to themselves. These are not handled by marshal, and in fact,
66 attempting to marshal recursive objects will crash your Python
67 interpreter. Object sharing happens when there are multiple
68 references to the same object in different places in the object
69 hierarchy being serialized. \module{pickle} stores such objects
70 only once, and ensures that all other references point to the
71 master copy. Shared objects remain shared, which can be very
72 important for mutable objects.
74 \item \module{marshal} cannot be used to serialize user-defined
75 classes and their instances. \module{pickle} can save and
76 restore class instances transparently, however the class
77 definition must be importable and live in the same module as
78 when the object was stored.
80 \item The \module{marshal} serialization format is not guaranteed to
81 be portable across Python versions. Because its primary job in
82 life is to support \file{.pyc} files, the Python implementers
83 reserve the right to change the serialization format in
84 non-backwards compatible ways should the need arise. The
85 \module{pickle} serialization format is guaranteed to be
86 backwards compatible across Python releases.
88 \item The \module{pickle} module doesn't handle code objects, which
89 the \module{marshal} module does. This avoids the possibility
90 of smuggling Trojan horses into a program through the
91 \module{pickle} module\footnote{This doesn't necessarily imply
92 that \module{pickle} is inherently secure. See
93 section~\ref{pickle-sec} for a more detailed discussion on
94 \module{pickle} module security. Besides, it's possible that
95 \module{pickle} will eventually support serializing code
96 objects.}.
98 \end{itemize}
100 Note that serialization is a more primitive notion than persistence;
101 although
102 \module{pickle} reads and writes file objects, it does not handle the
103 issue of naming persistent objects, nor the (even more complicated)
104 issue of concurrent access to persistent objects. The \module{pickle}
105 module can transform a complex object into a byte stream and it can
106 transform the byte stream into an object with the same internal
107 structure. Perhaps the most obvious thing to do with these byte
108 streams is to write them onto a file, but it is also conceivable to
109 send them across a network or store them in a database. The module
110 \refmodule{shelve} provides a simple interface
111 to pickle and unpickle objects on DBM-style database files.
113 \subsection{Data stream format}
115 The data format used by \module{pickle} is Python-specific. This has
116 the advantage that there are no restrictions imposed by external
117 standards such as XDR\index{XDR}\index{External Data Representation}
118 (which can't represent pointer sharing); however it means that
119 non-Python programs may not be able to reconstruct pickled Python
120 objects.
122 By default, the \module{pickle} data format uses a printable \ASCII{}
123 representation. This is slightly more voluminous than a binary
124 representation. The big advantage of using printable \ASCII{} (and of
125 some other characteristics of \module{pickle}'s representation) is that
126 for debugging or recovery purposes it is possible for a human to read
127 the pickled file with a standard text editor.
129 A binary format, which is slightly more efficient, can be chosen by
130 specifying a true value for the \var{bin} argument to the
131 \class{Pickler} constructor or the \function{dump()} and \function{dumps()}
132 functions.
134 \subsection{Usage}
136 To serialize an object hierarchy, you first create a pickler, then you
137 call the pickler's \method{dump()} method. To de-serialize a data
138 stream, you first create an unpickler, then you call the unpickler's
139 \method{load()} method. The \module{pickle} module provides the
140 following functions to make this process more convenient:
142 \begin{funcdesc}{dump}{object, file\optional{, bin}}
143 Write a pickled representation of \var{object} to the open file object
144 \var{file}. This is equivalent to
145 \code{Pickler(\var{file}, \var{bin}).dump(\var{object})}.
146 If the optional \var{bin} argument is true, the binary pickle format
147 is used; otherwise the (less efficient) text pickle format is used
148 (for backwards compatibility, this is the default).
150 \var{file} must have a \method{write()} method that accepts a single
151 string argument. It can thus be a file object opened for writing, a
152 \refmodule{StringIO} object, or any other custom
153 object that meets this interface.
154 \end{funcdesc}
156 \begin{funcdesc}{load}{file}
157 Read a string from the open file object \var{file} and interpret it as
158 a pickle data stream, reconstructing and returning the original object
159 hierarchy. This is equivalent to \code{Unpickler(\var{file}).load()}.
161 \var{file} must have two methods, a \method{read()} method that takes
162 an integer argument, and a \method{readline()} method that requires no
163 arguments. Both methods should return a string. Thus \var{file} can
164 be a file object opened for reading, a
165 \module{StringIO} object, or any other custom
166 object that meets this interface.
168 This function automatically determines whether the data stream was
169 written in binary mode or not.
170 \end{funcdesc}
172 \begin{funcdesc}{dumps}{object\optional{, bin}}
173 Return the pickled representation of the object as a string, instead
174 of writing it to a file. If the optional \var{bin} argument is
175 true, the binary pickle format is used; otherwise the (less efficient)
176 text pickle format is used (this is the default).
177 \end{funcdesc}
179 \begin{funcdesc}{loads}{string}
180 Read a pickled object hierarchy from a string. Characters in the
181 string past the pickled object's representation are ignored.
182 \end{funcdesc}
184 The \module{pickle} module also defines three exceptions:
186 \begin{excdesc}{PickleError}
187 A common base class for the other exceptions defined below. This
188 inherits from \exception{Exception}.
189 \end{excdesc}
191 \begin{excdesc}{PicklingError}
192 This exception is raised when an unpicklable object is passed to
193 the \method{dump()} method.
194 \end{excdesc}
196 \begin{excdesc}{UnpicklingError}
197 This exception is raised when there is a problem unpickling an object,
198 such as a security violation. Note that other exceptions may also be
199 raised during unpickling, including (but not necessarily limited to)
200 \exception{AttributeError}, \exception{EOFError},
201 \exception{ImportError}, and \exception{IndexError}.
202 \end{excdesc}
204 The \module{pickle} module also exports two callables\footnote{In the
205 \module{pickle} module these callables are classes, which you could
206 subclass to customize the behavior. However, in the \module{cPickle}
207 modules these callables are factory functions and so cannot be
208 subclassed. One of the common reasons to subclass is to control what
209 objects can actually be unpickled. See section~\ref{pickle-sec} for
210 more details on security concerns.}, \class{Pickler} and
211 \class{Unpickler}:
213 \begin{classdesc}{Pickler}{file\optional{, bin}}
214 This takes a file-like object to which it will write a pickle data
215 stream. Optional \var{bin} if true, tells the pickler to use the more
216 efficient binary pickle format, otherwise the \ASCII{} format is used
217 (this is the default).
219 \var{file} must have a \method{write()} method that accepts a single
220 string argument. It can thus be an open file object, a
221 \module{StringIO} object, or any other custom
222 object that meets this interface.
223 \end{classdesc}
225 \class{Pickler} objects define one (or two) public methods:
227 \begin{methoddesc}[Pickler]{dump}{object}
228 Write a pickled representation of \var{object} to the open file object
229 given in the constructor. Either the binary or \ASCII{} format will
230 be used, depending on the value of the \var{bin} flag passed to the
231 constructor.
232 \end{methoddesc}
234 \begin{methoddesc}[Pickler]{clear_memo}{}
235 Clears the pickler's ``memo''. The memo is the data structure that
236 remembers which objects the pickler has already seen, so that shared
237 or recursive objects pickled by reference and not by value. This
238 method is useful when re-using picklers.
240 \begin{notice}
241 Prior to Python 2.3, \method{clear_memo()} was only available on the
242 picklers created by \refmodule{cPickle}. In the \module{pickle} module,
243 picklers have an instance variable called \member{memo} which is a
244 Python dictionary. So to clear the memo for a \module{pickle} module
245 pickler, you could do the following:
247 \begin{verbatim}
248 mypickler.memo.clear()
249 \end{verbatim}
251 Code that does not need to support older versions of Python should
252 simply use \method{clear_memo()}.
253 \end{notice}
254 \end{methoddesc}
256 It is possible to make multiple calls to the \method{dump()} method of
257 the same \class{Pickler} instance. These must then be matched to the
258 same number of calls to the \method{load()} method of the
259 corresponding \class{Unpickler} instance. If the same object is
260 pickled by multiple \method{dump()} calls, the \method{load()} will
261 all yield references to the same object\footnote{\emph{Warning}: this
262 is intended for pickling multiple objects without intervening
263 modifications to the objects or their parts. If you modify an object
264 and then pickle it again using the same \class{Pickler} instance, the
265 object is not pickled again --- a reference to it is pickled and the
266 \class{Unpickler} will return the old value, not the modified one.
267 There are two problems here: (1) detecting changes, and (2)
268 marshalling a minimal set of changes. Garbage Collection may also
269 become a problem here.}.
271 \class{Unpickler} objects are defined as:
273 \begin{classdesc}{Unpickler}{file}
274 This takes a file-like object from which it will read a pickle data
275 stream. This class automatically determines whether the data stream
276 was written in binary mode or not, so it does not need a flag as in
277 the \class{Pickler} factory.
279 \var{file} must have two methods, a \method{read()} method that takes
280 an integer argument, and a \method{readline()} method that requires no
281 arguments. Both methods should return a string. Thus \var{file} can
282 be a file object opened for reading, a
283 \module{StringIO} object, or any other custom
284 object that meets this interface.
285 \end{classdesc}
287 \class{Unpickler} objects have one (or two) public methods:
289 \begin{methoddesc}[Unpickler]{load}{}
290 Read a pickled object representation from the open file object given
291 in the constructor, and return the reconstituted object hierarchy
292 specified therein.
293 \end{methoddesc}
295 \begin{methoddesc}[Unpickler]{noload}{}
296 This is just like \method{load()} except that it doesn't actually
297 create any objects. This is useful primarily for finding what's
298 called ``persistent ids'' that may be referenced in a pickle data
299 stream. See section~\ref{pickle-protocol} below for more details.
301 \strong{Note:} the \method{noload()} method is currently only
302 available on \class{Unpickler} objects created with the
303 \module{cPickle} module. \module{pickle} module \class{Unpickler}s do
304 not have the \method{noload()} method.
305 \end{methoddesc}
307 \subsection{What can be pickled and unpickled?}
309 The following types can be pickled:
311 \begin{itemize}
313 \item \code{None}
315 \item integers, long integers, floating point numbers, complex numbers
317 \item normal and Unicode strings
319 \item tuples, lists, and dictionaries containing only picklable objects
321 \item functions defined at the top level of a module
323 \item built-in functions defined at the top level of a module
325 \item classes that are defined at the top level of a module
327 \item instances of such classes whose \member{__dict__} or
328 \method{__setstate__()} is picklable (see
329 section~\ref{pickle-protocol} for details)
331 \end{itemize}
333 Attempts to pickle unpicklable objects will raise the
334 \exception{PicklingError} exception; when this happens, an unspecified
335 number of bytes may have already been written to the underlying file.
337 Note that functions (built-in and user-defined) are pickled by ``fully
338 qualified'' name reference, not by value. This means that only the
339 function name is pickled, along with the name of module the function
340 is defined in. Neither the function's code, nor any of its function
341 attributes are pickled. Thus the defining module must be importable
342 in the unpickling environment, and the module must contain the named
343 object, otherwise an exception will be raised\footnote{The exception
344 raised will likely be an \exception{ImportError} or an
345 \exception{AttributeError} but it could be something else.}.
347 Similarly, classes are pickled by named reference, so the same
348 restrictions in the unpickling environment apply. Note that none of
349 the class's code or data is pickled, so in the following example the
350 class attribute \code{attr} is not restored in the unpickling
351 environment:
353 \begin{verbatim}
354 class Foo:
355 attr = 'a class attr'
357 picklestring = pickle.dumps(Foo)
358 \end{verbatim}
360 These restrictions are why picklable functions and classes must be
361 defined in the top level of a module.
363 Similarly, when class instances are pickled, their class's code and
364 data are not pickled along with them. Only the instance data are
365 pickled. This is done on purpose, so you can fix bugs in a class or
366 add methods to the class and still load objects that were created with
367 an earlier version of the class. If you plan to have long-lived
368 objects that will see many versions of a class, it may be worthwhile
369 to put a version number in the objects so that suitable conversions
370 can be made by the class's \method{__setstate__()} method.
372 \subsection{The pickle protocol
373 \label{pickle-protocol}}\setindexsubitem{(pickle protocol)}
375 This section describes the ``pickling protocol'' that defines the
376 interface between the pickler/unpickler and the objects that are being
377 serialized. This protocol provides a standard way for you to define,
378 customize, and control how your objects are serialized and
379 de-serialized. The description in this section doesn't cover specific
380 customizations that you can employ to make the unpickling environment
381 safer from untrusted pickle data streams; see section~\ref{pickle-sec}
382 for more details.
384 \subsubsection{Pickling and unpickling normal class
385 instances\label{pickle-inst}}
387 When a pickled class instance is unpickled, its \method{__init__()}
388 method is normally \emph{not} invoked. If it is desirable that the
389 \method{__init__()} method be called on unpickling, a class can define
390 a method \method{__getinitargs__()}, which should return a
391 \emph{tuple} containing the arguments to be passed to the class
392 constructor (i.e. \method{__init__()}). The
393 \method{__getinitargs__()} method is called at
394 pickle time; the tuple it returns is incorporated in the pickle for
395 the instance.
396 \withsubitem{(copy protocol)}{\ttindex{__getinitargs__()}}
397 \withsubitem{(instance constructor)}{\ttindex{__init__()}}
399 \withsubitem{(copy protocol)}{
400 \ttindex{__getstate__()}\ttindex{__setstate__()}}
401 \withsubitem{(instance attribute)}{
402 \ttindex{__dict__}}
404 Classes can further influence how their instances are pickled; if the
405 class defines the method \method{__getstate__()}, it is called and the
406 return state is pickled as the contents for the instance, instead of
407 the contents of the instance's dictionary. If there is no
408 \method{__getstate__()} method, the instance's \member{__dict__} is
409 pickled.
411 Upon unpickling, if the class also defines the method
412 \method{__setstate__()}, it is called with the unpickled
413 state\footnote{These methods can also be used to implement copying
414 class instances.}. If there is no \method{__setstate__()} method, the
415 pickled object must be a dictionary and its items are assigned to the
416 new instance's dictionary. If a class defines both
417 \method{__getstate__()} and \method{__setstate__()}, the state object
418 needn't be a dictionary and these methods can do what they
419 want\footnote{This protocol is also used by the shallow and deep
420 copying operations defined in the
421 \refmodule{copy} module.}.
423 \subsubsection{Pickling and unpickling extension types}
425 When the \class{Pickler} encounters an object of a type it knows
426 nothing about --- such as an extension type --- it looks in two places
427 for a hint of how to pickle it. One alternative is for the object to
428 implement a \method{__reduce__()} method. If provided, at pickling
429 time \method{__reduce__()} will be called with no arguments, and it
430 must return either a string or a tuple.
432 If a string is returned, it names a global variable whose contents are
433 pickled as normal. When a tuple is returned, it must be of length two
434 or three, with the following semantics:
436 \begin{itemize}
438 \item A callable object, which in the unpickling environment must be
439 either a class, a callable registered as a ``safe constructor''
440 (see below), or it must have an attribute
441 \member{__safe_for_unpickling__} with a true value. Otherwise,
442 an \exception{UnpicklingError} will be raised in the unpickling
443 environment. Note that as usual, the callable itself is pickled
444 by name.
446 \item A tuple of arguments for the callable object, or \code{None}.
447 \deprecated{2.3}{Use the tuple of arguments instead}
449 \item Optionally, the object's state, which will be passed to
450 the object's \method{__setstate__()} method as described in
451 section~\ref{pickle-inst}. If the object has no
452 \method{__setstate__()} method, then, as above, the value must
453 be a dictionary and it will be added to the object's
454 \member{__dict__}.
456 \end{itemize}
458 Upon unpickling, the callable will be called (provided that it meets
459 the above criteria), passing in the tuple of arguments; it should
460 return the unpickled object.
462 If the second item was \code{None}, then instead of calling the
463 callable directly, its \method{__basicnew__()} method is called
464 without arguments. It should also return the unpickled object.
466 \deprecated{2.3}{Use the tuple of arguments instead}
468 An alternative to implementing a \method{__reduce__()} method on the
469 object to be pickled, is to register the callable with the
470 \refmodule[copyreg]{copy_reg} module. This module provides a way
471 for programs to register ``reduction functions'' and constructors for
472 user-defined types. Reduction functions have the same semantics and
473 interface as the \method{__reduce__()} method described above, except
474 that they are called with a single argument, the object to be pickled.
476 The registered constructor is deemed a ``safe constructor'' for purposes
477 of unpickling as described above.
479 \subsubsection{Pickling and unpickling external objects}
481 For the benefit of object persistence, the \module{pickle} module
482 supports the notion of a reference to an object outside the pickled
483 data stream. Such objects are referenced by a ``persistent id'',
484 which is just an arbitrary string of printable \ASCII{} characters.
485 The resolution of such names is not defined by the \module{pickle}
486 module; it will delegate this resolution to user defined functions on
487 the pickler and unpickler\footnote{The actual mechanism for
488 associating these user defined functions is slightly different for
489 \module{pickle} and \module{cPickle}. The description given here
490 works the same for both implementations. Users of the \module{pickle}
491 module could also use subclassing to effect the same results,
492 overriding the \method{persistent_id()} and \method{persistent_load()}
493 methods in the derived classes.}.
495 To define external persistent id resolution, you need to set the
496 \member{persistent_id} attribute of the pickler object and the
497 \member{persistent_load} attribute of the unpickler object.
499 To pickle objects that have an external persistent id, the pickler
500 must have a custom \function{persistent_id()} method that takes an
501 object as an argument and returns either \code{None} or the persistent
502 id for that object. When \code{None} is returned, the pickler simply
503 pickles the object as normal. When a persistent id string is
504 returned, the pickler will pickle that string, along with a marker
505 so that the unpickler will recognize the string as a persistent id.
507 To unpickle external objects, the unpickler must have a custom
508 \function{persistent_load()} function that takes a persistent id
509 string and returns the referenced object.
511 Here's a silly example that \emph{might} shed more light:
513 \begin{verbatim}
514 import pickle
515 from cStringIO import StringIO
517 src = StringIO()
518 p = pickle.Pickler(src)
520 def persistent_id(obj):
521 if hasattr(obj, 'x'):
522 return 'the value %d' % obj.x
523 else:
524 return None
526 p.persistent_id = persistent_id
528 class Integer:
529 def __init__(self, x):
530 self.x = x
531 def __str__(self):
532 return 'My name is integer %d' % self.x
534 i = Integer(7)
535 print i
536 p.dump(i)
538 datastream = src.getvalue()
539 print repr(datastream)
540 dst = StringIO(datastream)
542 up = pickle.Unpickler(dst)
544 class FancyInteger(Integer):
545 def __str__(self):
546 return 'I am the integer %d' % self.x
548 def persistent_load(persid):
549 if persid.startswith('the value '):
550 value = int(persid.split()[2])
551 return FancyInteger(value)
552 else:
553 raise pickle.UnpicklingError, 'Invalid persistent id'
555 up.persistent_load = persistent_load
557 j = up.load()
558 print j
559 \end{verbatim}
561 In the \module{cPickle} module, the unpickler's
562 \member{persistent_load} attribute can also be set to a Python
563 list, in which case, when the unpickler reaches a persistent id, the
564 persistent id string will simply be appended to this list. This
565 functionality exists so that a pickle data stream can be ``sniffed''
566 for object references without actually instantiating all the objects
567 in a pickle\footnote{We'll leave you with the image of Guido and Jim
568 sitting around sniffing pickles in their living rooms.}. Setting
569 \member{persistent_load} to a list is usually used in conjunction with
570 the \method{noload()} method on the Unpickler.
572 % BAW: Both pickle and cPickle support something called
573 % inst_persistent_id() which appears to give unknown types a second
574 % shot at producing a persistent id. Since Jim Fulton can't remember
575 % why it was added or what it's for, I'm leaving it undocumented.
577 \subsection{Security \label{pickle-sec}}
579 Most of the security issues surrounding the \module{pickle} and
580 \module{cPickle} module involve unpickling. There are no known
581 security vulnerabilities
582 related to pickling because you (the programmer) control the objects
583 that \module{pickle} will interact with, and all it produces is a
584 string.
586 However, for unpickling, it is \strong{never} a good idea to unpickle
587 an untrusted string whose origins are dubious, for example, strings
588 read from a socket. This is because unpickling can create unexpected
589 objects and even potentially run methods of those objects, such as
590 their class constructor or destructor\footnote{A special note of
591 caution is worth raising about the \refmodule{Cookie}
592 module. By default, the \class{Cookie.Cookie} class is an alias for
593 the \class{Cookie.SmartCookie} class, which ``helpfully'' attempts to
594 unpickle any cookie data string it is passed. This is a huge security
595 hole because cookie data typically comes from an untrusted source.
596 You should either explicitly use the \class{Cookie.SimpleCookie} class
597 --- which doesn't attempt to unpickle its string --- or you should
598 implement the defensive programming steps described later on in this
599 section.}.
601 You can defend against this by customizing your unpickler so that you
602 can control exactly what gets unpickled and what gets called.
603 Unfortunately, exactly how you do this is different depending on
604 whether you're using \module{pickle} or \module{cPickle}.
606 One common feature that both modules implement is the
607 \member{__safe_for_unpickling__} attribute. Before calling a callable
608 which is not a class, the unpickler will check to make sure that the
609 callable has either been registered as a safe callable via the
610 \refmodule[copyreg]{copy_reg} module, or that it has an
611 attribute \member{__safe_for_unpickling__} with a true value. This
612 prevents the unpickling environment from being tricked into doing
613 evil things like call \code{os.unlink()} with an arbitrary file name.
614 See section~\ref{pickle-protocol} for more details.
616 For safely unpickling class instances, you need to control exactly
617 which classes will get created. Be aware that a class's constructor
618 could be called (if the pickler found a \method{__getinitargs__()}
619 method) and the the class's destructor (i.e. its \method{__del__()} method)
620 might get called when the object is garbage collected. Depending on
621 the class, it isn't very heard to trick either method into doing bad
622 things, such as removing a file. The way to
623 control the classes that are safe to instantiate differs in
624 \module{pickle} and \module{cPickle}\footnote{A word of caution: the
625 mechanisms described here use internal attributes and methods, which
626 are subject to change in future versions of Python. We intend to
627 someday provide a common interface for controlling this behavior,
628 which will work in either \module{pickle} or \module{cPickle}.}.
630 In the \module{pickle} module, you need to derive a subclass from
631 \class{Unpickler}, overriding the \method{load_global()}
632 method. \method{load_global()} should read two lines from the pickle
633 data stream where the first line will the the name of the module
634 containing the class and the second line will be the name of the
635 instance's class. It then look up the class, possibly importing the
636 module and digging out the attribute, then it appends what it finds to
637 the unpickler's stack. Later on, this class will be assigned to the
638 \member{__class__} attribute of an empty class, as a way of magically
639 creating an instance without calling its class's \method{__init__()}.
640 You job (should you choose to accept it), would be to have
641 \method{load_global()} push onto the unpickler's stack, a known safe
642 version of any class you deem safe to unpickle. It is up to you to
643 produce such a class. Or you could raise an error if you want to
644 disallow all unpickling of instances. If this sounds like a hack,
645 you're right. UTSL.
647 Things are a little cleaner with \module{cPickle}, but not by much.
648 To control what gets unpickled, you can set the unpickler's
649 \member{find_global} attribute to a function or \code{None}. If it is
650 \code{None} then any attempts to unpickle instances will raise an
651 \exception{UnpicklingError}. If it is a function,
652 then it should accept a module name and a class name, and return the
653 corresponding class object. It is responsible for looking up the
654 class, again performing any necessary imports, and it may raise an
655 error to prevent instances of the class from being unpickled.
657 The moral of the story is that you should be really careful about the
658 source of the strings your application unpickles.
660 \subsection{Example \label{pickle-example}}
662 Here's a simple example of how to modify pickling behavior for a
663 class. The \class{TextReader} class opens a text file, and returns
664 the line number and line contents each time its \method{readline()}
665 method is called. If a \class{TextReader} instance is pickled, all
666 attributes \emph{except} the file object member are saved. When the
667 instance is unpickled, the file is reopened, and reading resumes from
668 the last location. The \method{__setstate__()} and
669 \method{__getstate__()} methods are used to implement this behavior.
671 \begin{verbatim}
672 class TextReader:
673 """Print and number lines in a text file."""
674 def __init__(self, file):
675 self.file = file
676 self.fh = open(file)
677 self.lineno = 0
679 def readline(self):
680 self.lineno = self.lineno + 1
681 line = self.fh.readline()
682 if not line:
683 return None
684 if line.endswith("\n"):
685 line = line[:-1]
686 return "%d: %s" % (self.lineno, line)
688 def __getstate__(self):
689 odict = self.__dict__.copy() # copy the dict since we change it
690 del odict['fh'] # remove filehandle entry
691 return odict
693 def __setstate__(self,dict):
694 fh = open(dict['file']) # reopen file
695 count = dict['lineno'] # read from file...
696 while count: # until line count is restored
697 fh.readline()
698 count = count - 1
699 self.__dict__.update(dict) # update attributes
700 self.fh = fh # save the file object
701 \end{verbatim}
703 A sample usage might be something like this:
705 \begin{verbatim}
706 >>> import TextReader
707 >>> obj = TextReader.TextReader("TextReader.py")
708 >>> obj.readline()
709 '1: #!/usr/local/bin/python'
710 >>> # (more invocations of obj.readline() here)
711 ... obj.readline()
712 '7: class TextReader:'
713 >>> import pickle
714 >>> pickle.dump(obj,open('save.p','w'))
715 \end{verbatim}
717 If you want to see that \refmodule{pickle} works across Python
718 processes, start another Python session, before continuing. What
719 follows can happen from either the same process or a new process.
721 \begin{verbatim}
722 >>> import pickle
723 >>> reader = pickle.load(open('save.p'))
724 >>> reader.readline()
725 '8: "Print and number lines in a text file."'
726 \end{verbatim}
729 \begin{seealso}
730 \seemodule[copyreg]{copy_reg}{Pickle interface constructor
731 registration for extension types.}
733 \seemodule{shelve}{Indexed databases of objects; uses \module{pickle}.}
735 \seemodule{copy}{Shallow and deep object copying.}
737 \seemodule{marshal}{High-performance serialization of built-in types.}
738 \end{seealso}
741 \section{\module{cPickle} --- A faster \module{pickle}}
743 \declaremodule{builtin}{cPickle}
744 \modulesynopsis{Faster version of \refmodule{pickle}, but not subclassable.}
745 \moduleauthor{Jim Fulton}{jfulton@digicool.com}
746 \sectionauthor{Fred L. Drake, Jr.}{fdrake@acm.org}
748 The \module{cPickle} module supports serialization and
749 de-serialization of Python objects, providing an interface and
750 functionality nearly identical to the
751 \refmodule{pickle}\refstmodindex{pickle} module. There are several
752 differences, the most important being performance and subclassability.
754 First, \module{cPickle} can be up to 1000 times faster than
755 \module{pickle} because the former is implemented in C. Second, in
756 the \module{cPickle} module the callables \function{Pickler()} and
757 \function{Unpickler()} are functions, not classes. This means that
758 you cannot use them to derive custom pickling and unpickling
759 subclasses. Most applications have no need for this functionality and
760 should benefit from the greatly improved performance of the
761 \module{cPickle} module.
763 The pickle data stream produced by \module{pickle} and
764 \module{cPickle} are identical, so it is possible to use
765 \module{pickle} and \module{cPickle} interchangeably with existing
766 pickles\footnote{Since the pickle data format is actually a tiny
767 stack-oriented programming language, and some freedom is taken in the
768 encodings of certain objects, it is possible that the two modules
769 produce different data streams for the same input objects. However it
770 is guaranteed that they will always be able to read each other's
771 data streams.}.
773 There are additional minor differences in API between \module{cPickle}
774 and \module{pickle}, however for most applications, they are
775 interchangable. More documentation is provided in the
776 \module{pickle} module documentation, which
777 includes a list of the documented differences.