\section{\module{pickle} ---
         Python object serialization}

\declaremodule{standard}{pickle}
\modulesynopsis{Convert Python objects to streams of bytes and back.}
% Substantial improvements by Jim Kerr <jbkerr@sr.hp.com>.

\indexii{persistent}{objects}
\indexii{serializing}{objects}
\indexii{marshalling}{objects}
\indexii{flattening}{objects}
\indexii{pickling}{objects}


The \module{pickle} module implements a basic but powerful algorithm
for ``pickling'' (a.k.a.\ serializing, marshalling or flattening)
nearly arbitrary Python objects. This is the act of converting
objects to a stream of bytes (and back: ``unpickling''). This is a
more primitive notion than persistence: although \module{pickle}
reads and writes file objects, it does not handle the issue of naming
persistent objects, nor the (even more complicated) area of concurrent
access to persistent objects. The \module{pickle} module can
transform a complex object into a byte stream and it can transform the
byte stream into an object with the same internal structure. The most
obvious thing to do with these byte streams is to write them onto a
file, but it is also conceivable to send them across a network or
store them in a database. The module
\refmodule{shelve}\refstmodindex{shelve} provides a simple interface
to pickle and unpickle objects on DBM-style database files.

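
As a rough illustration of the round trip described above, the
following sketch pickles a nested structure to a byte stream and
restores it. The \code{record} data is made up for the example; only
the \function{dumps()} and \function{loads()} functions of this module
are used.

```python
import pickle

# Made-up sample data: a nested structure of built-in types.
record = {"name": "spam", "sizes": [1, 2, 3], "tags": ("a", "b")}

# Convert the object to a byte stream ...
data = pickle.dumps(record)

# ... and rebuild an object with the same internal structure.
restored = pickle.loads(data)

print(restored == record)   # the copy is equal in value
print(restored is record)   # but it is a distinct object
```

The byte stream in \code{data} could equally be written to a file,
sent over a network connection, or stored in a database.
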

\strong{Note:} The \module{pickle} module is rather slow. A
reimplementation of the same algorithm in C, which is up to 1000 times
faster, is available as the
\refmodule{cPickle}\refbimodindex{cPickle} module. This has the same
interface except that \class{Pickler} and \class{Unpickler} are
factory functions, not classes (so they cannot be used as base classes
for inheritance).


Although the \module{pickle} module can use the built-in module
\refmodule{marshal}\refbimodindex{marshal} internally, it differs from
\refmodule{marshal} in the way it handles certain kinds of data:

\begin{itemize}

\item Recursive objects (objects containing references to themselves):
      \module{pickle} keeps track of the objects it has already
      serialized, so later references to the same object won't be
      serialized again. (The \refmodule{marshal} module breaks for
      this.)

\item Object sharing (references to the same object in different
      places): This is similar to self-referencing objects;
      \module{pickle} stores the object once, and ensures that all
      other references point to the master copy. Shared objects
      remain shared, which can be very important for mutable objects.

\item User-defined classes and their instances: \refmodule{marshal}
      does not support these at all, but \module{pickle} can save
      and restore class instances transparently. The class definition
      must be importable and live in the same module as when the
      object was stored.

\end{itemize}

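
The handling of recursive and shared objects listed above can be
checked with a small sketch. The variable names are invented for the
example, and the objects are deliberately limited to built-in lists
and dictionaries.

```python
import pickle

# A recursive object: the list contains a reference to itself.
loop = [1, 2]
loop.append(loop)

# Object sharing: the same mutable list is referenced from two places.
shared = ["x"]
pair = {"first": shared, "second": shared}

loop_copy = pickle.loads(pickle.dumps(loop))
pair_copy = pickle.loads(pickle.dumps(pair))

# The self-reference survives the round trip.
print(loop_copy[2] is loop_copy)

# Both keys still refer to one list, so a change is visible through both.
pair_copy["first"].append("y")
print(pair_copy["second"])
```
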

The data format used by \module{pickle} is Python-specific. This has
the advantage that there are no restrictions imposed by external
standards such as
XDR\index{XDR}\index{External Data Representation} (which can't
represent pointer sharing); however it means that non-Python programs
may not be able to reconstruct pickled Python objects.


By default, the \module{pickle} data format uses a printable \ASCII{}
representation. This is slightly more voluminous than a binary
representation. The big advantage of using printable \ASCII{} (and of
some other characteristics of \module{pickle}'s representation) is that
for debugging or recovery purposes it is possible for a human to read
the pickled file with a standard text editor.


A binary format, which is slightly more efficient, can be chosen by
specifying a nonzero (true) value for the \var{bin} argument to the
\class{Pickler} constructor or the \function{dump()} and \function{dumps()}
functions. The binary format is not the default because of backwards
compatibility with the Python 1.4 pickle module. In a future version,
the default may change to binary.


The \module{pickle} module doesn't handle code objects, which the
\refmodule{marshal}\refbimodindex{marshal} module does. I suppose
\module{pickle} could, and maybe it should, but there's probably no
great need for it right now (as long as \refmodule{marshal} continues
to be used for reading and writing code objects), and at least this
avoids the possibility of smuggling Trojan horses into a program.


For the benefit of persistence modules written using \module{pickle}, it
supports the notion of a reference to an object outside the pickled
data stream. Such objects are referenced by a name, which is an
arbitrary string of printable \ASCII{} characters. The resolution of
such names is not defined by the \module{pickle} module; the
persistent object module will have to implement a method
\method{persistent_load()}. To write references to persistent objects,
the persistent module must define a method \method{persistent_id()} which
returns either \code{None} or the persistent ID of the object.

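
One possible way to supply these two methods is sketched below. It
assumes an interpreter where \class{Pickler} and \class{Unpickler} are
classes that can be subclassed (so not the \module{cPickle} factory
functions), and it uses an in-memory \code{io.BytesIO} buffer in place
of a file. The names \code{REGISTRY}, \code{RefPickler} and
\code{RefUnpickler} are hypothetical and exist only for this example.

```python
import io
import pickle

# Hypothetical external store: objects that live outside the pickle,
# each known by a printable ASCII name.
REGISTRY = {"config:main": {"debug": True}}

class RefPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Return the name for externally stored objects, None otherwise.
        for name, target in REGISTRY.items():
            if obj is target:
                return name
        return None

class RefUnpickler(pickle.Unpickler):
    def persistent_load(self, name):
        # Resolve a name found in the pickle back to the external object.
        return REGISTRY[name]

buf = io.BytesIO()
RefPickler(buf).dump(["payload", REGISTRY["config:main"]])

buf.seek(0)
restored = RefUnpickler(buf).load()

# The list was pickled by value; the registry entry only by name.
print(restored[0])
print(restored[1] is REGISTRY["config:main"])
```
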

There are some restrictions on the pickling of class instances.

First of all, the class must be defined at the top level in a module.
Furthermore, all its instance variables must be picklable.

\setindexsubitem{(pickle protocol)}


When a pickled class instance is unpickled, its \method{__init__()} method
is normally \emph{not} invoked. \strong{Note:} This is a deviation
from previous versions of this module; the change was introduced in
Python 1.5b2. The reason for the change is that in many cases it is
desirable to have a constructor that requires arguments; it is a
(minor) nuisance to have to provide a \method{__getinitargs__()} method.

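
The fact that \method{__init__()} is bypassed can be observed with a
short sketch. The \class{Counter} class and its \code{init_calls}
attribute are invented for this illustration.

```python
import pickle

class Counter:
    init_calls = 0          # counts how often __init__() has run

    def __init__(self, start):
        Counter.init_calls += 1
        self.value = start

original = Counter(5)
clone = pickle.loads(pickle.dumps(original))

# The instance data came back, but __init__() ran only for `original`.
print(clone.value)
print(Counter.init_calls)
```
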

If it is desirable that the \method{__init__()} method be called on
unpickling, a class can define a method \method{__getinitargs__()},
which should return a \emph{tuple} containing the arguments to be
passed to the class constructor (\method{__init__()}). This method is
called at pickle time; the tuple it returns is incorporated in the
pickle for the instance.
\withsubitem{(copy protocol)}{\ttindex{__getinitargs__()}}
\withsubitem{(instance constructor)}{\ttindex{__init__()}}


Classes can further influence how their instances are pickled. If
the class
\withsubitem{(copy protocol)}{
  \ttindex{__getstate__()}\ttindex{__setstate__()}}
\withsubitem{(instance attribute)}{
  \ttindex{__dict__}}
defines the method \method{__getstate__()}, it is called and the
returned state is pickled as the contents for the instance, and if the
class defines the method \method{__setstate__()}, it is called with the
unpickled state. (Note that these methods can also be used to
implement copying class instances.) If there is no
\method{__getstate__()} method, the instance's \member{__dict__} is
pickled. If there is no \method{__setstate__()} method, the pickled
object must be a dictionary and its items are assigned to the new
instance's dictionary. (If a class defines both \method{__getstate__()}
and \method{__setstate__()}, the state object needn't be a dictionary;
these methods can do what they want.) This protocol is also used
by the shallow and deep copying operations defined in the
\refmodule{copy}\refstmodindex{copy} module.


Note that when class instances are pickled, their class's code and
data are not pickled along with them. Only the instance data are
pickled. This is done on purpose, so you can fix bugs in a class or
add methods and still load objects that were created with an earlier
version of the class. If you plan to have long-lived objects that
will see many versions of a class, it may be worthwhile to put a version
number in the objects so that suitable conversions can be made by the
class's \method{__setstate__()} method.

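
The versioning idea could look roughly like the following sketch. The
\class{Point} class, its \code{VERSION} attribute and the older layout
with a single \code{pos} tuple are all hypothetical; they only serve
to show a conversion inside \method{__setstate__()}.

```python
import pickle

class Point:
    VERSION = 2             # hypothetical current state layout

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __getstate__(self):
        # Pickle a copy of the instance data plus a version number.
        state = self.__dict__.copy()
        state["version"] = self.VERSION
        return state

    def __setstate__(self, state):
        state = dict(state)
        version = state.pop("version", 1)
        if version < 2:
            # Assumed version 1 layout: one tuple stored under "pos".
            state["x"], state["y"] = state.pop("pos")
        self.__dict__.update(state)

# A current instance survives the round trip unchanged.
current = pickle.loads(pickle.dumps(Point(3, 4)))
print(current.x, current.y)

# State in the assumed old layout is converted on the way in.
old = Point.__new__(Point)
old.__setstate__({"pos": (1, 2)})
print(old.x, old.y)
```
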

When a class itself is pickled, only its name is pickled; the class
definition is not pickled, but re-imported by the unpickling process.
Therefore, the restriction that the class must be defined at the top
level in a module applies to pickled classes as well.

\setindexsubitem{(in module pickle)}


The interface can be summarized as follows.

To pickle an object \code{x} onto a file \code{f}, open for writing:

\begin{verbatim}
p = pickle.Pickler(f)
p.dump(x)
\end{verbatim}

A shorthand for this is \code{pickle.dump(x, f)}.

To unpickle an object \code{x} from a file \code{f}, open for reading:

\begin{verbatim}
u = pickle.Unpickler(f)
x = u.load()
\end{verbatim}


The \class{Pickler} class only calls the method \code{f.write()} with a
string argument. The \class{Unpickler} calls the methods \code{f.read()}
(with an integer argument) and \code{f.readline()} (without argument),
both returning a string. It is explicitly allowed to pass non-file
objects here, as long as they have the right methods.
\withsubitem{(class in pickle)}{\ttindex{Unpickler}\ttindex{Pickler}}

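
A minimal sketch of such non-file objects follows. The classes
\class{ChunkSink} and \class{ChunkSource} are made up for the example
and implement nothing beyond the methods named above.

```python
import pickle

class ChunkSink:
    """Write-only target: collects whatever the pickler writes."""
    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)

class ChunkSource:
    """Read-only source offering just read(n) and readline()."""
    def __init__(self, data):
        self.data = data
        self.pos = 0

    def read(self, n):
        chunk = self.data[self.pos:self.pos + n]
        self.pos += len(chunk)
        return chunk

    def readline(self):
        end = self.data.find(b"\n", self.pos)
        end = len(self.data) if end < 0 else end + 1
        chunk = self.data[self.pos:end]
        self.pos = end
        return chunk

sink = ChunkSink()
pickle.Pickler(sink).dump({"a": [1, 2, 3]})

source = ChunkSource(b"".join(sink.chunks))
restored = pickle.Unpickler(source).load()
print(restored)
```
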

The constructor for the \class{Pickler} class has an optional second
argument, \var{bin}. If this is present and true, the binary
pickle format is used; if it is absent or false, the (less efficient,
but backwards compatible) text pickle format is used. The
\class{Unpickler} class does not have an argument to distinguish
between binary and text pickle formats; it accepts either format.


The following types can be pickled:

\begin{itemize}

\item integers, long integers, floating point numbers

\item normal and Unicode strings

\item tuples, lists and dictionaries containing only picklable objects

\item functions defined at the top level of a module (by name
      reference, not storage of the implementation)

\item built-in functions

\item classes that are defined at the top level in a module

\item instances of such classes whose \member{__dict__} or the result
      of calling \method{__getstate__()} is picklable

\end{itemize}


Attempts to pickle unpicklable objects will raise the
\exception{PicklingError} exception; when this happens, an unspecified
number of bytes may have been written to the file.

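
The following sketch contrasts the two cases: functions that are
pickled by name reference, and an object that cannot be pickled at
all. The choice of \code{len}, \code{json.dumps} and a lambda is
arbitrary. Catching \exception{AttributeError} alongside
\exception{PicklingError} is a precaution, since some interpreter
versions report a function that cannot be found by name that way.

```python
import json
import pickle

# Built-in and top-level functions are pickled by name, so unpickling
# yields the very function that the name refers to after import.
print(pickle.loads(pickle.dumps(len)) is len)
print(pickle.loads(pickle.dumps(json.dumps)) is json.dumps)

# A lambda has no importable top-level name, so pickling it fails.
try:
    pickle.dumps(lambda: None)
except (pickle.PicklingError, AttributeError):
    outcome = "rejected"
else:
    outcome = "pickled"
print(outcome)
```
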

It is possible to make multiple calls to the \method{dump()} method of
the same \class{Pickler} instance. These must then be matched to the
same number of calls to the \method{load()} method of the
corresponding \class{Unpickler} instance. If the same object is
pickled by multiple \method{dump()} calls, the \method{load()} calls
will all yield references to the same object. \emph{Warning}: this is
intended for pickling multiple objects without intervening
modifications to the objects or their parts. If you modify an object
and then pickle it again using the same \class{Pickler} instance, the
object is not pickled again; a reference to it is pickled and the
\class{Unpickler} will return the old value, not the modified one.
(There are two problems here: (a) detecting changes, and (b)
marshalling a minimal set of changes. I have no answers. Garbage
Collection may also become a problem here.)

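
The warning can be reproduced with a short sketch. It assumes an
in-memory \code{io.BytesIO} buffer standing in for the file; the
variable names are invented for the example.

```python
import io
import pickle

buf = io.BytesIO()
pickler = pickle.Pickler(buf)

shared = [1, 2]
pickler.dump(shared)    # first dump stores the list contents
shared.append(3)        # modification between the two dumps
pickler.dump(shared)    # second dump stores only a reference

buf.seek(0)
unpickler = pickle.Unpickler(buf)
first = unpickler.load()
second = unpickler.load()

# Both loads yield the same object, holding the value as first pickled.
print(first is second)
print(second)
```

Using a fresh \class{Pickler} instance for the second \method{dump()}
call is the simplest way to get the modified value written out.
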

Apart from the \class{Pickler} and \class{Unpickler} classes, the
module defines the following functions, and an exception:


\begin{funcdesc}{dump}{object, file\optional{, bin}}
Write a pickled representation of \var{object} to the open file object
\var{file}. This is equivalent to
\samp{Pickler(\var{file}, \var{bin}).dump(\var{object})}.
If the optional \var{bin} argument is present and nonzero, the binary
pickle format is used; if it is zero or absent, the (less efficient)
text pickle format is used.
\end{funcdesc}


\begin{funcdesc}{load}{file}
Read a pickled object from the open file object \var{file}. This is
equivalent to \samp{Unpickler(\var{file}).load()}.
\end{funcdesc}


\begin{funcdesc}{dumps}{object\optional{, bin}}
Return the pickled representation of the object as a string, instead
of writing it to a file. If the optional \var{bin} argument is
present and nonzero, the binary pickle format is used; if it is zero
or absent, the (less efficient) text pickle format is used.
\end{funcdesc}


\begin{funcdesc}{loads}{string}
Read a pickled object from a string instead of a file. Characters in
the string past the pickled object's representation are ignored.
\end{funcdesc}

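
The four functions can be exercised together as in the sketch below.
The optional \var{bin} argument is left out throughout, an in-memory
\code{io.BytesIO} buffer is assumed in place of an open file, and the
sample \code{value} is arbitrary.

```python
import io
import pickle

value = ["spam", 42]

# dumps()/loads(): pickle to and from an in-memory string of bytes.
data = pickle.dumps(value)
print(pickle.loads(data) == value)

# Characters past the end of the pickle are ignored by loads().
print(pickle.loads(data + b"trailing junk") == value)

# dump()/load(): the same round trip through a file-like object.
buf = io.BytesIO()
pickle.dump(value, buf)
buf.seek(0)
print(pickle.load(buf) == value)
```
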

\begin{excdesc}{PicklingError}
This exception is raised when an unpicklable object is passed to
\method{Pickler.dump()}.
\end{excdesc}


\begin{seealso}
  \seemodule[copyreg]{copy_reg}{pickle interface constructor
                                registration}

  \seemodule{shelve}{indexed databases of objects; uses \module{pickle}}

  \seemodule{copy}{shallow and deep object copying}

  \seemodule{marshal}{high-performance serialization of built-in types}
\end{seealso}


\subsection{Example \label{pickle-example}}

Here's a simple example of how to modify pickling behavior for a
class. The \class{TextReader} class opens a text file, and returns
the line number and line contents each time its \method{readline()}
method is called. If a \class{TextReader} instance is pickled, all
attributes \emph{except} the file object member are saved. When the
instance is unpickled, the file is reopened, and reading resumes from
the last location. The \method{__setstate__()} and
\method{__getstate__()} methods are used to implement this behavior.


\begin{verbatim}
# illustrate __setstate__ and __getstate__ methods
# used in pickling.

class TextReader:
    "Print and number lines in a text file."
    def __init__(self,file):
        self.file = file
        self.fh = open(file,'r')
        self.lineno = 0

    def readline(self):
        self.lineno = self.lineno + 1
        line = self.fh.readline()
        if not line:
            return None
        return "%d: %s" % (self.lineno,line[:-1])

    # return data representation for pickled object
    def __getstate__(self):
        odict = self.__dict__.copy() # copy, so the live instance keeps fh
        del odict['fh']              # remove filehandle entry
        return odict

    # restore object state from data representation generated
    # by __getstate__
    def __setstate__(self,dict):
        fh = open(dict['file'])      # reopen file
        count = dict['lineno']       # read from file...
        while count:                 # until line count is restored
            fh.readline()
            count = count - 1
        dict['fh'] = fh              # create filehandle entry
        self.__dict__ = dict         # make dict our attribute dictionary
\end{verbatim}


A sample usage might be something like this:

\begin{verbatim}
>>> import TextReader
>>> obj = TextReader.TextReader("TextReader.py")
>>> obj.readline()
'1: #!/usr/local/bin/python'
>>> # (more invocations of obj.readline() here)
... obj.readline()
'7: class TextReader:'
>>> import pickle
>>> pickle.dump(obj,open('save.p','w'))

  (start another Python session)

>>> import pickle
>>> reader = pickle.load(open('save.p'))
>>> reader.readline()
'8:     "Print and number lines in a text file."'
\end{verbatim}


\section{\module{cPickle} ---
         Alternate implementation of \module{pickle}}

\declaremodule{builtin}{cPickle}
\modulesynopsis{Faster version of \refmodule{pickle}, but not subclassable.}
\moduleauthor{Jim Fulton}{jfulton@digicool.com}
\sectionauthor{Fred L. Drake, Jr.}{fdrake@acm.org}


The \module{cPickle} module provides a similar interface and identical
functionality to the \refmodule{pickle}\refstmodindex{pickle} module,
but can be up to 1000 times faster since it is implemented in C. The
only other important difference to note is that \function{Pickler()}
and \function{Unpickler()} are functions and not classes, and so
cannot be subclassed. This should not be an issue in most cases.


The format of the pickle data is identical to that produced using the
\refmodule{pickle} module, so it is possible to use \refmodule{pickle} and
\module{cPickle} interchangeably with existing pickles.

(Since the pickle data format is actually a tiny stack-oriented
programming language, and there are some freedoms in the encodings of
certain objects, it's possible that the two modules produce different
pickled data for the same input objects; however they will always be
able to read each other's pickles back in.)

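
A common way to take advantage of this interchangeability is sketched
below. The alias \code{fast_pickle} is invented for the example, and
the fallback import is an assumption for interpreters that do not ship
a separate \module{cPickle} module. The last lines use the
\module{pickletools} module, which is not described in this section,
only to list the instructions contained in a pickle.

```python
import pickle
import pickletools

try:
    import cPickle as fast_pickle   # C implementation, where it exists
except ImportError:
    import pickle as fast_pickle    # otherwise fall back to pickle

obj = {"k": [1, 2, 3]}

# Pickle with one module and unpickle with the other, both ways round.
print(pickle.loads(fast_pickle.dumps(obj)) == obj)
print(fast_pickle.loads(pickle.dumps(obj)) == obj)

# The pickle is a small program; collect the names of its instructions.
ops = [op.name for op, arg, pos in pickletools.genops(pickle.dumps(obj))]
print(ops[-1])          # every pickle ends with the STOP instruction
```
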