\section{Standard Module \sectcode{pickle}}
\stmodindex{pickle}
\index{persistency}
\indexii{persistent}{objects}
\indexii{serializing}{objects}
\indexii{marshalling}{objects}
\indexii{flattening}{objects}
\indexii{pickling}{objects}

\renewcommand{\indexsubitem}{(in module pickle)}

The \code{pickle} module implements a basic but powerful algorithm for
``pickling'' (a.k.a.\ serializing, marshalling or flattening) nearly
arbitrary Python objects. This is the act of converting objects to a
stream of bytes (and back: ``unpickling'').
This is a more primitive notion than
persistency --- although \code{pickle} reads and writes file objects,
it does not handle the issue of naming persistent objects, nor the
(even more complicated) area of concurrent access to persistent
objects. The \code{pickle} module can transform a complex object into
a byte stream and it can transform the byte stream into an object with
the same internal structure. The most obvious thing to do with these
byte streams is to write them onto a file, but it is also conceivable
to send them across a network or store them in a database. The module
\code{shelve} provides a simple interface to pickle and unpickle
objects on ``dbm''-style database files.
\stmodindex{shelve}

Unlike the built-in module \code{marshal}, \code{pickle} handles the
following correctly:
\stmodindex{marshal}

\begin{itemize}

\item recursive objects (objects containing references to themselves)

\item object sharing (references to the same object in different places)

\item user-defined classes and their instances

\end{itemize}

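
As an illustration of the first two points, the following sketch
pickles a list that contains itself and a list that holds the same
object twice, then checks that the reconstructed objects have the same
shape. It uses the \code{dumps()} and \code{loads()} functions
described later in this section; the variable names are made up for
the example.

\begin{verbatim}
import pickle

# A recursive object: the list contains a reference to itself.
loop = [1, 2]
loop.append(loop)

# A shared object: the same list appears twice in the outer list.
shared = ["spam"]
pair = [shared, shared]

loop_copy = pickle.loads(pickle.dumps(loop))
pair_copy = pickle.loads(pickle.dumps(pair))

# The copy refers to itself, not to the original list.
assert loop_copy[2] is loop_copy
assert loop_copy is not loop

# Sharing is preserved: both slots hold one and the same new list.
assert pair_copy[0] is pair_copy[1]
assert pair_copy[0] is not shared
\end{verbatim}
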
The data format used by \code{pickle} is Python-specific. This has
the advantage that there are no restrictions imposed by external
standards such as CORBA (which probably can't represent pointer
sharing or recursive objects); however it means that non-Python
programs may not be able to reconstruct pickled Python objects.

The \code{pickle} data format uses a printable \ASCII{} representation.
This is slightly more voluminous than a binary representation.
However, small integers actually take {\em less} space when
represented as minimal-size decimal strings than when represented as
32-bit binary numbers, and strings are only much longer if they
contain many control characters or 8-bit characters. The big
advantage of using printable \ASCII{} (and of some other characteristics
of \code{pickle}'s representation) is that for debugging or recovery
purposes it is possible for a human to read the pickled file with a
standard text editor. (I could have gone a step further and used a
notation like S-expressions, but the parser
(currently written in Python) would have been
considerably more complicated and slower, and the files would probably
have become much larger.)

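
A quick way to look at the representation is to pickle a small object
and print the result. The sketch below assumes a present-day Python 3
interpreter, where the printable format has to be requested as
protocol 0 and \code{dumps()} returns a bytes object. The exact
characters printed may differ between versions, so no output is shown.

\begin{verbatim}
import pickle

# Assumption: protocol 0 selects the original printable format.
data = pickle.dumps([1, 2.5, "spam"], protocol=0)

# Decoding fails with UnicodeDecodeError if any byte is not ASCII.
text = data.decode("ascii")

# Every character is either a newline or a printable ASCII character.
assert all(ch == "\n" or " " <= ch <= "~" for ch in text)

print(text)          # readable in any text editor

# The printable form still reconstructs the original object.
assert pickle.loads(data) == [1, 2.5, "spam"]
\end{verbatim}
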
The \code{pickle} module doesn't handle code objects, which the
\code{marshal} module does. I suppose \code{pickle} could, and maybe
it should, but there's probably no great need for it right now (as
long as \code{marshal} continues to be used for reading and writing
code objects), and at least this avoids the possibility of smuggling
Trojan horses into a program.
\stmodindex{marshal}

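
The difference can be observed directly. The following sketch compiles
a small expression, round-trips the resulting code object through
\code{marshal}, and then shows \code{pickle} refusing the same object.
Which exception \code{pickle} raises for a code object is an
assumption based on current interpreters, so the example catches both
likely candidates.

\begin{verbatim}
import marshal
import pickle

code = compile("6 * 7", "<example>", "eval")

# marshal writes and reads back the code object.
restored = marshal.loads(marshal.dumps(code))
assert eval(restored) == 42

# pickle does not handle code objects (exception type is an assumption).
try:
    pickle.dumps(code)
    refused = False
except (TypeError, pickle.PicklingError):
    refused = True

assert refused
\end{verbatim}
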
For the benefit of persistency modules written using \code{pickle},
the module supports the notion of a reference to an object outside the
pickled data stream. Such objects are referenced by a name, which is an
arbitrary string of printable \ASCII{} characters. The resolution of
such names is not defined by the \code{pickle} module --- the
persistent object module will have to implement a method
\code{persistent_load}. To write references to persistent objects,
the persistent module must define a method \code{persistent_id} which
returns either \code{None} or the persistent ID of the object.

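
A minimal sketch of this mechanism follows. It assumes a present-day
Python 3 interpreter, where the two methods are supplied by
subclassing \code{Pickler} and \code{Unpickler}. The \code{Resource}
class, the \code{REGISTRY} dictionary and the name
\code{"printer-1"} are hypothetical and exist only for this example.

\begin{verbatim}
import io
import pickle

class Resource:
    """Stand-in for an object that lives outside the pickle."""
    def __init__(self, name):
        self.name = name

# Hypothetical table that maps names to external objects.
REGISTRY = {"printer-1": Resource("printer-1")}

class RefPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Return a name for external objects, None for everything else.
        if isinstance(obj, Resource):
            return obj.name
        return None

class RefUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        # Resolving the name is up to the application, not to pickle.
        return REGISTRY[pid]

buf = io.BytesIO()
RefPickler(buf).dump(["job", REGISTRY["printer-1"]])

buf.seek(0)
job = RefUnpickler(buf).load()

# Only the name travelled through the pickle; the object was looked up.
assert job[0] == "job"
assert job[1] is REGISTRY["printer-1"]
\end{verbatim}
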
There are some restrictions on the pickling of class instances.

First of all, the class must be defined at the top level in a module.

\renewcommand{\indexsubitem}{(pickle protocol)}

Next, it must normally be possible to create class instances by
calling the class without arguments. If this is undesirable, the
class can define a method \code{__getinitargs__()}, which should
return a {\em tuple} containing the arguments to be passed to the
class constructor (\code{__init__()}).
\ttindex{__getinitargs__}
\ttindex{__init__}

Classes can further influence how their instances are pickled --- if the class
defines the method \code{__getstate__()}, it is called and the returned
state is pickled as the contents for the instance, and if the class
defines the method \code{__setstate__()}, it is called with the
unpickled state. (Note that these methods can also be used to
implement copying class instances.) If there is no
\code{__getstate__()} method, the instance's \code{__dict__} is
pickled. If there is no \code{__setstate__()} method, the pickled
object must be a dictionary and its items are assigned to the new
instance's dictionary. (If a class defines both \code{__getstate__()}
and \code{__setstate__()}, the state object needn't be a dictionary
--- these methods can do what they want.) This protocol is also used
by the shallow and deep copying operations defined in the \code{copy}
module.
\ttindex{__getstate__}
\ttindex{__setstate__}
\ttindex{__dict__}

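
The sketch below shows a class that defines both methods and uses a
tuple rather than a dictionary as its state. The \code{Counter} class
and its attributes are hypothetical; the example assumes a present-day
Python 3 interpreter and a class defined at the top level of the
script being run.

\begin{verbatim}
import copy
import pickle

class Counter:
    def __init__(self):
        self.count = 0
        self.log = []            # scratch data that should not be saved

    def __getstate__(self):
        # The state need not be a dictionary, since __setstate__ exists.
        return (self.count,)

    def __setstate__(self, state):
        (self.count,) = state
        self.log = []            # rebuilt, not restored

c = Counter()
c.count = 3
c.log.append("bumped")

d = pickle.loads(pickle.dumps(c))
assert d.count == 3
assert d.log == []

# The copy module goes through the same pair of methods.
e = copy.copy(c)
assert e.count == 3
assert e.log == []
\end{verbatim}
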
Note that when class instances are pickled, their class's code and
data are not pickled along with them. Only the instance data are
pickled. This is done on purpose, so you can fix bugs in a class or
add methods and still load objects that were created with an earlier
version of the class. If you plan to have long-lived objects that
will see many versions of a class, it may be worthwhile to put a version
number in the objects so that suitable conversions can be made by the
class's \code{__setstate__()} method.

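
One possible shape for such a conversion is sketched here. The
\code{Account} class, its \code{version}, \code{dollars} and
\code{cents} attributes, and the rule for converting between them are
all hypothetical. An instance with the old layout is built by hand to
stand in for an object pickled by an earlier version of the class.

\begin{verbatim}
import pickle

class Account:
    VERSION = 2

    def __init__(self, owner, cents):
        self.version = Account.VERSION
        self.owner = owner
        self.cents = cents

    def __setstate__(self, state):
        # Objects without a version number are treated as version 1.
        if state.get("version", 1) < 2:
            # Hypothetical version 1 kept whole dollars under "dollars".
            state = dict(state)
            state["cents"] = state.pop("dollars") * 100
            state["version"] = 2
        self.__dict__.update(state)

# Simulate an instance created by the earlier version of the class.
old = Account.__new__(Account)
old.__dict__.update({"version": 1, "owner": "guido", "dollars": 5})

restored = pickle.loads(pickle.dumps(old))
assert restored.version == 2
assert restored.cents == 500
assert not hasattr(restored, "dollars")
\end{verbatim}
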
When a class itself is pickled, only its name is pickled --- the class
definition is not pickled, but re-imported by the unpickling process.
Therefore, the restriction that the class must be defined at the top
level in a module applies to pickled classes as well.

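
This can be checked with any class that is importable by name. The
sketch below uses \code{Fraction} from the standard \code{fractions}
module purely as a convenient example of such a class, and assumes a
present-day Python 3 interpreter.

\begin{verbatim}
import pickle
from fractions import Fraction

data = pickle.dumps(Fraction)      # the class itself, not an instance

# The module and class names are in the stream ...
assert b"fractions" in data
assert b"Fraction" in data

# ... and unpickling looks the class up again instead of rebuilding it.
assert pickle.loads(data) is Fraction
\end{verbatim}
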
\renewcommand{\indexsubitem}{(in module pickle)}

The interface can be summarized as follows.

To pickle an object \code{x} onto a file \code{f}, open for writing:

\begin{verbatim}
p = pickle.Pickler(f)
p.dump(x)
\end{verbatim}

A shorthand for this is:

\begin{verbatim}
pickle.dump(x, f)
\end{verbatim}

To unpickle an object \code{x} from a file \code{f}, open for reading:

\begin{verbatim}
u = pickle.Unpickler(f)
x = u.load()
\end{verbatim}

A shorthand is:

\begin{verbatim}
x = pickle.load(f)
\end{verbatim}

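
Put together, a complete round trip through a file might look like the
following sketch. It assumes a present-day Python 3 interpreter, where
the file has to be opened in binary mode; the file name
\code{example.pck} is made up, and a temporary directory is used so
that nothing is left behind.

\begin{verbatim}
import os
import pickle
import tempfile

x = {"name": "spam", "sizes": [1, 2, 3]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.pck")    # hypothetical file name

    f = open(path, "wb")          # open for writing
    pickle.dump(x, f)
    f.close()

    f = open(path, "rb")          # open for reading
    y = pickle.load(f)
    f.close()

# y is a new object with the same internal structure as x.
assert y == x
assert y is not x
\end{verbatim}
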
The \code{Pickler} class only calls the method \code{f.write} with a
string argument. The \code{Unpickler} class calls the methods \code{f.read}
(with an integer argument) and \code{f.readline} (without an argument),
both returning a string. It is explicitly allowed to pass non-file
objects here, as long as they have the right methods.
\ttindex{Unpickler}
\ttindex{Pickler}

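
To illustrate, here is a sketch with two small classes that provide
only these methods. \code{ChunkCollector} and \code{ByteSource} are
hypothetical names. The example assumes a present-day Python 3
interpreter, in which the data passed to \code{write} and returned by
\code{read} and \code{readline} are bytes objects rather than text
strings.

\begin{verbatim}
import pickle

class ChunkCollector:
    """Write-only object: remembers everything passed to write()."""
    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(bytes(data))

class ByteSource:
    """Read-only object: serves bytes from an in-memory buffer."""
    def __init__(self, data):
        self.data = data
        self.pos = 0

    def read(self, n):
        chunk = self.data[self.pos:self.pos + n]
        self.pos += len(chunk)
        return chunk

    def readline(self):
        end = self.data.find(b"\n", self.pos)
        end = len(self.data) if end < 0 else end + 1
        chunk = self.data[self.pos:end]
        self.pos = end
        return chunk

sink = ChunkCollector()
pickle.Pickler(sink).dump({"spam": [1, 2, 3]})

source = ByteSource(b"".join(sink.chunks))
restored = pickle.Unpickler(source).load()
assert restored == {"spam": [1, 2, 3]}
\end{verbatim}
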
The following types can be pickled:
\begin{itemize}

\item \code{None}

\item integers, long integers, floating point numbers

\item strings

\item tuples, lists and dictionaries containing only picklable objects

\item classes that are defined at the top level in a module

\item instances of such classes whose \code{__dict__} or the result of
calling \code{__getstate__()} is picklable

\end{itemize}

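
The list can be exercised with a short loop. In the sketch below the
\code{Point} class is hypothetical and is assumed to be defined at the
top level of the script being run; the sample values are arbitrary.

\begin{verbatim}
import pickle

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

samples = [
    None,
    42, 10 ** 30, 3.25,             # integer, long integer, float
    "spam",
    (1, 2), [3, 4], {"k": (5, [6])},
]

for obj in samples:
    copy = pickle.loads(pickle.dumps(obj))
    assert copy == obj
    assert type(copy) is type(obj)

# A class is pickled by name, so the very same class comes back.
assert pickle.loads(pickle.dumps(Point)) is Point

# An instance comes back as a new object with an equal __dict__.
p = pickle.loads(pickle.dumps(Point(1, 2)))
assert isinstance(p, Point)
assert p.__dict__ == {"x": 1, "y": 2}
\end{verbatim}
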
Attempts to pickle unpicklable objects will raise the
\code{PicklingError} exception; when this happens, an unspecified
number of bytes may have been written to the file.

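
A sketch of handling this exception follows. As the unpicklable object
it uses an instance of a class that cannot be found by name in any
module, which violates the top-level restriction described earlier.
The class name \code{Ghost} is made up, and the choice of this
particular object is an assumption that holds for present-day Python 3
interpreters.

\begin{verbatim}
import io
import pickle

# A class created on the fly and never bound to a module-level name.
ghost = type("Ghost", (), {})()

buf = io.BytesIO()
try:
    pickle.dump(["fine", ghost], buf)
    failed = False
except pickle.PicklingError:
    failed = True

assert failed
# Whatever is in buf at this point is incomplete; do not try to load it.
\end{verbatim}
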
It is possible to make multiple calls to the \code{dump()} method of
the same \code{Pickler} instance. These must then be matched to the
same number of calls to the \code{load()} method of the
corresponding \code{Unpickler} instance. If the same object is
pickled by multiple \code{dump()} calls, the \code{load()} calls will all
yield references to the same object. {\em Warning}: this is intended
for pickling multiple objects without intervening modifications to the
objects or their parts. If you modify an object and then pickle it
again using the same \code{Pickler} instance, the object is not
pickled again --- a reference to it is pickled and the
\code{Unpickler} will return the old value, not the modified one.
(There are two problems here: (a) detecting changes, and (b)
marshalling a minimal set of changes. I have no answers. Garbage
collection may also become a problem here.)

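
The pitfall described in the warning can be reproduced in a few lines.
The sketch assumes a present-day Python 3 interpreter and uses an
in-memory \code{io.BytesIO} object in place of a real file.

\begin{verbatim}
import io
import pickle

buf = io.BytesIO()
p = pickle.Pickler(buf)

items = [1, 2]
p.dump(items)            # the list is pickled in full
items.append(3)
p.dump(items)            # only a reference to the first pickle is written

buf.seek(0)
u = pickle.Unpickler(buf)
first = u.load()
second = u.load()        # one load() call per dump() call

# Both loads yield the same object, holding the old value.
assert first is second
assert first == [1, 2]
\end{verbatim}
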
Apart from the \code{Pickler} and \code{Unpickler} classes, the
module defines the following functions, and an exception:

\begin{funcdesc}{dump}{object\, file}
Write a pickled representation of \var{object} to the open file object
\var{file}. This is equivalent to \code{Pickler(file).dump(object)}.
\end{funcdesc}

\begin{funcdesc}{load}{file}
Read a pickled object from the open file object \var{file}. This is
equivalent to \code{Unpickler(file).load()}.
\end{funcdesc}

\begin{funcdesc}{dumps}{object}
Return the pickled representation of the object as a string, instead
of writing it to a file.
\end{funcdesc}

\begin{funcdesc}{loads}{string}
Read a pickled object from a string instead of a file. Characters in
the string past the pickled object's representation are ignored.
\end{funcdesc}

\begin{excdesc}{PicklingError}
This exception is raised when an unpicklable object is passed to
\code{Pickler.dump()}.
\end{excdesc}
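
As a short usage sketch for \code{dumps()} and \code{loads()}, the
following round-trips a tuple through a string and then appends extra
characters to show that they are ignored. It assumes a present-day
Python 3 interpreter, where the pickled representation is a bytes
object.

\begin{verbatim}
import pickle

s = pickle.dumps(("spam", 42))
assert pickle.loads(s) == ("spam", 42)

# Data past the end of the pickled representation is ignored.
assert pickle.loads(s + b"trailing junk") == ("spam", 42)
\end{verbatim}
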