\section{\module{pickle} ---
         Python object serialization.}
\declaremodule{standard}{pickle}

\modulesynopsis{Convert Python objects to streams of bytes and back.}

\index{persistency}
\indexii{persistent}{objects}
\indexii{serializing}{objects}
\indexii{marshalling}{objects}
\indexii{flattening}{objects}
\indexii{pickling}{objects}

The \module{pickle} module implements a basic but powerful algorithm for
``pickling'' (a.k.a.\ serializing, marshalling or flattening) nearly
arbitrary Python objects. This is the act of converting objects to a
stream of bytes (and back: ``unpickling'').
This is a more primitive notion than
persistency: although \module{pickle} reads and writes file objects,
it does not handle the issue of naming persistent objects, nor the
(even more complicated) area of concurrent access to persistent
objects. The \module{pickle} module can transform a complex object into
a byte stream and it can transform the byte stream into an object with
the same internal structure. The most obvious thing to do with these
byte streams is to write them to a file, but it is also conceivable
to send them across a network or store them in a database. The module
\module{shelve} provides a simple interface to pickle and unpickle
objects on ``dbm''-style database files.
\refstmodindex{shelve}
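
The round trip described above can be sketched as follows. This is a
minimal illustration, not part of the module: the variable names are
invented, and it assumes a Python version in which \function{dumps()}
returns a bytes object (in the version documented here it returns a
string).

```python
import pickle

# A nested structure built only from basic types.
original = {"name": "spam", "sizes": [1, 2, 3], "nested": {"ok": None}}

# Pickle: convert the object to a stream of bytes.
data = pickle.dumps(original)

# Unpickle: rebuild an equal, but distinct, object from the bytes.
restored = pickle.loads(data)
```

The restored object has the same internal structure as the original
but is a separate object in memory.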

\strong{Note:} The \module{pickle} module is rather slow. A
reimplementation of the same algorithm in \C{}, which is up to 1000 times
faster, is available as the \module{cPickle}\refbimodindex{cPickle}
module. This has the same interface except that \code{Pickler} and
\code{Unpickler} are factory functions, not classes (so they cannot be
used as base classes for inheritance).

Unlike the built-in module \module{marshal}, \module{pickle} handles
the following correctly:
\refbimodindex{marshal}

\begin{itemize}

\item recursive objects (objects containing references to themselves)

\item object sharing (references to the same object in different places)

\item user-defined classes and their instances

\end{itemize}
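
The first two items in the list above can be checked with a short
sketch. The variable names are illustrative only.

```python
import pickle

# A recursive object: the list contains a reference to itself.
loop = [1, 2]
loop.append(loop)

# Object sharing: the same list is referenced from two places.
shared = ["shared"]
pair = {"first": shared, "second": shared}

loop_copy = pickle.loads(pickle.dumps(loop))
pair_copy = pickle.loads(pickle.dumps(pair))

# The copy refers to itself, not to the original list.
recursion_kept = loop_copy[2] is loop_copy
# Both keys still refer to one and the same list object.
sharing_kept = pair_copy["first"] is pair_copy["second"]
```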

The data format used by \module{pickle} is Python-specific. This has
the advantage that there are no restrictions imposed by external
standards such as XDR%
\index{XDR}
\index{External Data Representation}
(which can't represent pointer sharing); however
it means that non-Python programs may not be able to reconstruct
pickled Python objects.

By default, the \module{pickle} data format uses a printable \ASCII{}
representation. This is slightly more voluminous than a binary
representation. The big advantage of using printable \ASCII{} (and of
some other characteristics of \module{pickle}'s representation) is that
for debugging or recovery purposes it is possible for a human to read
the pickled file with a standard text editor.

A binary format, which is slightly more efficient, can be chosen by
specifying a nonzero (true) value for the \var{bin} argument to the
\class{Pickler} constructor or the \function{dump()} and \function{dumps()}
functions. The binary format is not the default because of backwards
compatibility with the Python 1.4 pickle module. In a future version,
the default may change to binary.
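
The difference between the two formats can be sketched as follows.
This is an illustration under an assumption: it uses the numeric
protocol argument accepted by later Python versions (0 for the text
format, 1 for the original binary format) in place of the \var{bin}
flag described here.

```python
import pickle

data = {"name": "spam", "values": list(range(100))}

# Protocol 0 is the printable text format; protocol 1 is the original
# binary format.  (Assumption: a Python version that accepts a protocol
# number where this document describes the bin flag.)
text_pickle = pickle.dumps(data, 0)
binary_pickle = pickle.dumps(data, 1)

# The text pickle contains only ASCII bytes and can be inspected by eye.
text_is_ascii = all(byte < 128 for byte in text_pickle)

# Both formats are read back by the same call; no format flag is needed.
same_result = pickle.loads(text_pickle) == pickle.loads(binary_pickle) == data
```

For this sample data the binary pickle is the shorter of the two.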

The \module{pickle} module doesn't handle code objects, which the
\module{marshal} module does. \module{pickle} could support them, and
perhaps it should, but there is probably no great need for it right now
(as long as \module{marshal} continues to be used for reading and writing
code objects), and at least this avoids the possibility of smuggling
Trojan horses into a program.
\refbimodindex{marshal}

For the benefit of persistency modules written using \module{pickle}, it
supports the notion of a reference to an object outside the pickled
data stream. Such objects are referenced by a name, which is an
arbitrary string of printable \ASCII{} characters. The resolution of
such names is not defined by the \module{pickle} module; the
persistent object module will have to implement a method
\method{persistent_load()}. To write references to persistent objects,
the persistent module must define a method \method{persistent_id()} which
returns either \code{None} or the persistent ID of the object.
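
A minimal sketch of the two hooks described above. The \code{Record}
class, the \code{STORE} dictionary and both subclasses are hypothetical,
and the sketch assumes a Python version in which \class{Pickler} and
\class{Unpickler} can be subclassed and work on byte streams.

```python
import io
import pickle

class Record:
    """Hypothetical object that lives in an external store."""
    def __init__(self, key):
        self.key = key

# Hypothetical external store, keyed by name.
STORE = {"rec-1": Record("rec-1")}

class StorePickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Return a name for externally stored objects, None for the rest.
        if isinstance(obj, Record):
            return obj.key
        return None

class StoreUnpickler(pickle.Unpickler):
    def persistent_load(self, name):
        # Resolve the name back to the external object.
        return STORE[name]

buffer = io.BytesIO()
StorePickler(buffer).dump(["payload", STORE["rec-1"]])
buffer.seek(0)
restored = StoreUnpickler(buffer).load()
```

Only the name \code{"rec-1"} is written to the stream; the record itself
is looked up again at load time.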

There are some restrictions on the pickling of class instances.

First of all, the class must be defined at the top level in a module.
Furthermore, all its instance variables must be picklable.

\setindexsubitem{(pickle protocol)}

When a pickled class instance is unpickled, its \method{__init__()} method
is normally \emph{not} invoked. \strong{Note:} This is a deviation
from previous versions of this module; the change was introduced in
Python 1.5b2. The reason for the change is that in many cases it is
desirable to have a constructor that requires arguments; it is a
(minor) nuisance to have to provide a \method{__getinitargs__()} method.

If it is desirable that the \method{__init__()} method be called on
unpickling, a class can define a method \method{__getinitargs__()},
which should return a \emph{tuple} containing the arguments to be
passed to the class constructor (\method{__init__()}). This method is
called at pickle time; the tuple it returns is incorporated in the
pickle for the instance.
\ttindex{__getinitargs__()}
\ttindex{__init__()}

Classes can further influence how their instances are pickled: if the class
defines the method \method{__getstate__()}, it is called and the returned
state is pickled as the contents for the instance, and if the class
defines the method \method{__setstate__()}, it is called with the
unpickled state. (Note that these methods can also be used to
implement copying class instances.) If there is no
\method{__getstate__()} method, the instance's \member{__dict__} is
pickled. If there is no \method{__setstate__()} method, the pickled
object must be a dictionary and its items are assigned to the new
instance's dictionary. (If a class defines both \method{__getstate__()}
and \method{__setstate__()}, the state object needn't be a dictionary;
these methods can do what they want.) This protocol is also used
by the shallow and deep copying operations defined in the \module{copy}
module.\refstmodindex{copy}
\ttindex{__getstate__()}
\ttindex{__setstate__()}
\ttindex{__dict__}
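
The state protocol above can be sketched with a class that leaves a
rebuildable cache out of its pickle. The \code{Counter} class is
hypothetical and exists only for illustration.

```python
import pickle

class Counter:
    """Hypothetical class whose cache should not be pickled."""
    def __init__(self, start):
        self.count = start
        self.cache = {"expensive": "to rebuild"}

    def __getstate__(self):
        # Called at pickle time: return only the state worth saving.
        return {"count": self.count}

    def __setstate__(self, state):
        # Called at unpickle time with the saved state; __init__ is not run.
        self.count = state["count"]
        self.cache = {}

original = Counter(5)
restored = pickle.loads(pickle.dumps(original))
```

After the round trip the count is preserved while the cache starts out
empty, because \method{__setstate__()} rebuilt it rather than reading
it from the pickle.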

Note that when class instances are pickled, their class's code and
data are not pickled along with them. Only the instance data are
pickled. This is done on purpose, so you can fix bugs in a class or
add methods and still load objects that were created with an earlier
version of the class. If you plan to have long-lived objects that
will see many versions of a class, it may be worthwhile to put a version
number in the objects so that suitable conversions can be made by the
class's \method{__setstate__()} method.
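
One way to apply the version-number advice above is sketched below.
The \code{Point} class, its \code{VERSION} attribute and the layout of
the old state are all hypothetical; the module prescribes none of them.

```python
import pickle

class Point:
    """Hypothetical class; version 2 replaced 'pos' with 'x' and 'y'."""
    VERSION = 2

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __getstate__(self):
        # Store a version number alongside the instance data.
        return {"version": self.VERSION, "x": self.x, "y": self.y}

    def __setstate__(self, state):
        # Convert state written by an earlier version of the class.
        if state.get("version", 1) < 2:
            state = {"x": state["pos"][0], "y": state["pos"][1]}
        self.x = state["x"]
        self.y = state["y"]

# Simulate state saved by version 1, which had a single 'pos' tuple.
old = Point.__new__(Point)
old.__setstate__({"pos": (3, 4)})

new = pickle.loads(pickle.dumps(Point(1, 2)))
```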

When a class itself is pickled, only its name is pickled; the class
definition is not pickled, but re-imported by the unpickling process.
Therefore, the restriction that the class must be defined at the top
level in a module applies to pickled classes as well.
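
That a class is pickled by name can be observed directly. The sketch
below uses \code{Fraction} from the standard \module{fractions} module
of later Python versions, chosen only because it is a class defined at
the top level of an importable module.

```python
import pickle
from fractions import Fraction

# Pickling a class stores a reference to it by name, not its code.
data = pickle.dumps(Fraction)

# Unpickling re-imports the module and looks the class up again,
# so the result is the very same class object.
restored = pickle.loads(data)
name_in_pickle = b"Fraction" in data
```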

\setindexsubitem{(in module pickle)}

The interface can be summarized as follows.

To pickle an object \code{x} onto a file \code{f}, open for writing:

\begin{verbatim}
p = pickle.Pickler(f)
p.dump(x)
\end{verbatim}

A shorthand for this is:

\begin{verbatim}
pickle.dump(x, f)
\end{verbatim}

To unpickle an object \code{x} from a file \code{f}, open for reading:

\begin{verbatim}
u = pickle.Unpickler(f)
x = u.load()
\end{verbatim}

A shorthand is:

\begin{verbatim}
x = pickle.load(f)
\end{verbatim}

The \class{Pickler} class only calls the method \code{f.write()} with a
string argument. The \class{Unpickler} calls the methods \code{f.read()}
(with an integer argument) and \code{f.readline()} (without argument),
both returning a string. It is explicitly allowed to pass non-file
objects here, as long as they have the right methods.
\ttindex{Unpickler}
\ttindex{Pickler}
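
The following sketch passes non-file objects that offer only the
methods named above. \code{ByteSink} and \code{ByteSource} are
hypothetical helper classes, and the sketch assumes a Python version in
which the data passed to \code{write()} and returned by \code{read()}
is a bytes object rather than a string.

```python
import pickle

class ByteSink:
    """Hypothetical file-like object offering only write()."""
    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(bytes(data))

class ByteSource:
    """Hypothetical file-like object offering only read() and readline()."""
    def __init__(self, data):
        self.data = data
        self.pos = 0

    def read(self, size):
        chunk = self.data[self.pos:self.pos + size]
        self.pos += len(chunk)
        return chunk

    def readline(self):
        end = self.data.find(b"\n", self.pos) + 1
        if end == 0:            # no newline left: return the rest
            end = len(self.data)
        line = self.data[self.pos:end]
        self.pos = end
        return line

sink = ByteSink()
pickle.Pickler(sink).dump(["spam", 42])
source = ByteSource(b"".join(sink.chunks))
restored = pickle.Unpickler(source).load()
```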

The constructor for the \class{Pickler} class has an optional second
argument, \var{bin}. If this is present and nonzero, the binary
pickle format is used; if it is zero or absent, the (less efficient,
but backwards compatible) text pickle format is used. The
\class{Unpickler} class does not have an argument to distinguish
between binary and text pickle formats; it accepts either format.

The following types can be pickled:
\begin{itemize}

\item \code{None}

\item integers, long integers, floating point numbers

\item strings

\item tuples, lists and dictionaries containing only picklable objects

\item classes that are defined at the top level in a module

\item instances of such classes whose \member{__dict__} or the result
of calling \method{__getstate__()} is picklable

\end{itemize}

Attempts to pickle unpicklable objects will raise the
\exception{PicklingError} exception; when this happens, an unspecified
number of bytes may have been written to the file.
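
A hedged sketch of guarding against unpicklable objects. The helper
\code{try_pickle()} is hypothetical; it also catches \code{TypeError}
and \code{AttributeError} on the assumption that some Python versions
raise those for certain objects instead of \exception{PicklingError}.

```python
import pickle

def try_pickle(obj):
    """Return True if obj can be pickled, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        # PicklingError is the documented exception; some Python versions
        # raise TypeError or AttributeError for certain objects instead.
        return False
    return True

list_ok = try_pickle([1, "two", 3.0])
# A lambda has no importable name, so it cannot be pickled.
lambda_ok = try_pickle(lambda: None)
```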

It is possible to make multiple calls to the \method{dump()} method of
the same \class{Pickler} instance. These must then be matched to the
same number of calls to the \method{load()} method of the
corresponding \class{Unpickler} instance. If the same object is
pickled by multiple \method{dump()} calls, the \method{load()} calls
will all yield references to the same object. \emph{Warning}: this is
intended for pickling multiple objects without intervening modifications
to the objects or their parts. If you modify an object and then pickle it
again using the same \class{Pickler} instance, the object is not
pickled again; a reference to it is pickled and the
\class{Unpickler} will return the old value, not the modified one.
(There are two problems here: (a) detecting changes, and (b)
marshalling a minimal set of changes. Neither is solved by this
module. Garbage collection may also become a problem here.)
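
The warning above can be reproduced with a short sketch. It assumes a
Python version that provides \module{io} byte buffers; the variable
names are illustrative.

```python
import io
import pickle

buffer = io.BytesIO()
pickler = pickle.Pickler(buffer)

items = [1, 2]
pickler.dump(items)      # the list is pickled in full
items.append(3)          # modify the object ...
pickler.dump(items)      # ... only a reference to it is written this time

buffer.seek(0)
unpickler = pickle.Unpickler(buffer)
first = unpickler.load()
second = unpickler.load()
```

Both \method{load()} calls yield the same list object, holding the
value it had at the first \method{dump()} call; the appended element is
lost.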

Apart from the \class{Pickler} and \class{Unpickler} classes, the
module defines the following functions, and an exception:

\begin{funcdesc}{dump}{object, file\optional{, bin}}
Write a pickled representation of \var{object} to the open file object
\var{file}. This is equivalent to
\samp{Pickler(\var{file}, \var{bin}).dump(\var{object})}.
If the optional \var{bin} argument is present and nonzero, the binary
pickle format is used; if it is zero or absent, the (less efficient)
text pickle format is used.
\end{funcdesc}

\begin{funcdesc}{load}{file}
Read a pickled object from the open file object \var{file}. This is
equivalent to \samp{Unpickler(\var{file}).load()}.
\end{funcdesc}

\begin{funcdesc}{dumps}{object\optional{, bin}}
Return the pickled representation of the object as a string, instead
of writing it to a file. If the optional \var{bin} argument is
present and nonzero, the binary pickle format is used; if it is zero
or absent, the (less efficient) text pickle format is used.
\end{funcdesc}

\begin{funcdesc}{loads}{string}
Read a pickled object from a string instead of a file. Characters in
the string past the pickled object's representation are ignored.
\end{funcdesc}
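
A small sketch of the trailing-data behaviour of \function{loads()}.
It assumes a Python version in which pickles are bytes objects; the
padding text is arbitrary.

```python
import pickle

data = pickle.dumps(("spam", 1))

# Bytes after the end of the pickled object are ignored by loads().
padded = data + b"trailing bytes that are not part of the pickle"

restored = pickle.loads(padded)
```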

\begin{excdesc}{PicklingError}
This exception is raised when an unpicklable object is passed to
\code{Pickler.dump()}.
\end{excdesc}


\begin{seealso}
\seemodule[copyreg]{copy_reg}{pickle interface constructor
registration}

\seemodule{shelve}{indexed databases of objects; uses \module{pickle}}

\seemodule{copy}{shallow and deep object copying}

\seemodule{marshal}{high-performance serialization of built-in types}
\end{seealso}


\section{\module{cPickle} ---
         Alternate implementation of \module{pickle}.}
\declaremodule{builtin}{cPickle}

\modulesynopsis{Faster version of \module{pickle}, but not subclassable.}


% This section was written by Fred L. Drake, Jr. <fdrake@acm.org>

The \module{cPickle} module provides an interface similar to, and
functionality identical to, that of the \module{pickle} module, but can
be up to 1000 times faster since it is implemented in \C{}. The only other
important difference to note is that \function{Pickler()} and
\function{Unpickler()} are functions and not classes, and so cannot be
subclassed. This should not be an issue in most cases.

The format of the pickle data is identical to that produced using the
\module{pickle} module, so it is possible to use \module{pickle} and
\module{cPickle} interchangeably with existing pickles.
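
Because the two modules share an interface and a data format, a program
can prefer the faster one and fall back to the other. The sketch below
shows one common way to do so; the alias \code{pickle_impl} is a
hypothetical name chosen for illustration.

```python
# Prefer the C implementation where it exists as a separate module and
# fall back to the pure-Python module otherwise.  (In later Python
# versions there is no cPickle module; pickle uses a C accelerator
# automatically when one is available.)
try:
    import cPickle as pickle_impl
except ImportError:
    import pickle as pickle_impl

# The calling code is the same whichever module was imported.
restored = pickle_impl.loads(pickle_impl.dumps({"answer": 42}))
```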