\section{\module{multifile} ---
         Support for files containing distinct parts}

\declaremodule{standard}{multifile}
\modulesynopsis{Support for reading files which contain distinct
                parts, such as some MIME data.}
\sectionauthor{Eric S. Raymond}{esr@snark.thyrsus.com}

The \class{MultiFile} object enables you to treat sections of a text
file as file-like input objects, with \code{''} being returned by
\method{readline()} when a given delimiter pattern is encountered.  The
defaults of this class are designed to make it useful for parsing
MIME multipart messages, but by subclassing it and overriding methods
it can be easily adapted for more general use.

\begin{classdesc}{MultiFile}{fp\optional{, seekable}}
Create a multi-file.  You must instantiate this class with an input
object argument for the \class{MultiFile} instance to get lines from,
such as a file object returned by \function{open()}.

\class{MultiFile} only ever looks at the input object's
\method{readline()}, \method{seek()} and \method{tell()} methods, and
the latter two are only needed if you want random access to the
individual MIME parts.  To use \class{MultiFile} on a non-seekable
stream object, set the optional \var{seekable} argument to false; this
will prevent using the input object's \method{seek()} and
\method{tell()} methods.
\end{classdesc}

\class{MultiFile} treats its input as three kinds of lines: data,
section-dividers, and end-markers.  It is designed to support parsing of
messages that may have multiple nested message parts, each with its
own pattern for section-divider and end-marker lines.
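
As a rough illustration of this three-way split, the following
standalone sketch classifies lines using the default MIME-style
decoration described for \method{is_data()}, \method{section_divider()}
and \method{end_marker()}.  The function \function{classify_line()} and
the sample boundary \code{'XYZ'} are hypothetical names introduced only
for this example; the code does not import \module{multifile} and should
not be read as the module's implementation.

\begin{verbatim}
def is_data(line):
    # Fast guard: a line that does not start with '--' cannot be a marker.
    return line[:2] != '--'

def section_divider(boundary):
    return '--' + boundary

def end_marker(boundary):
    return '--' + boundary + '--'

def classify_line(line, boundary):
    """Return 'data', 'section-divider' or 'end-marker' for one line."""
    if is_data(line):
        return 'data'
    marker = line.rstrip()  # trailing whitespace (LF, CR-LF) is ignored
    if marker == section_divider(boundary):
        return 'section-divider'
    if marker == end_marker(boundary):
        return 'end-marker'
    return 'data'  # starts with '--' but matches neither marker

lines = ['--XYZ\n', 'hello\n', '--not a marker\n', '--XYZ--\r\n']
kinds = [classify_line(line, 'XYZ') for line in lines]
\end{verbatim}

Here \code{kinds} is
\code{['section-divider', 'data', 'data', 'end-marker']}: the third line
passes the fast guard but matches neither marker, so it is still data.
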

\subsection{MultiFile Objects \label{MultiFile-objects}}

A \class{MultiFile} instance has the following methods:

\begin{methoddesc}{readline}{}
Read a line.  If the line is data (not a section-divider, end-marker,
or real EOF), return it.  If the line matches the most-recently-stacked
boundary, return \code{''} and set \code{self.last} to 1 if the match
is an end-marker and to 0 if it is a section-divider.  If the line
matches any other stacked boundary, raise \exception{Error}.  On
encountering end-of-file on the underlying stream object, the method
raises \exception{Error} unless all boundaries have been popped.
\end{methoddesc}

\begin{methoddesc}{readlines}{}
Return all lines remaining in this part as a list of strings.
\end{methoddesc}

\begin{methoddesc}{read}{}
Read all lines, up to the next section.  Return them as a single
(multiline) string.  Note that this doesn't take a size argument!
\end{methoddesc}

\begin{methoddesc}{seek}{pos\optional{, whence}}
Seek.  Seek indices are relative to the start of the current section.
The \var{pos} and \var{whence} arguments are interpreted as for a file
seek.
\end{methoddesc}

\begin{methoddesc}{tell}{}
Return the file position relative to the start of the current section.
\end{methoddesc}

\begin{methoddesc}{next}{}
Skip lines to the next section (that is, read lines until a
section-divider or end-marker has been consumed).  Return true if
there is such a section, false if an end-marker is seen.  Re-enable
the most-recently-pushed boundary.
\end{methoddesc}

\begin{methoddesc}{is_data}{str}
Return true if \var{str} is data and false if it might be a section
boundary.  As written, it tests for a prefix other than \code{'-}\code{-'} at
start of line (which all MIME boundaries have) but it is declared so
it can be overridden in derived classes.

Note that this test is intended as a fast guard for the real
boundary tests; if it always returns false it will merely slow
processing, not cause it to fail.
\end{methoddesc}

\begin{methoddesc}{push}{str}
Push a boundary string.  When an appropriately decorated version of
this boundary is found as an input line, it will be interpreted as a
section-divider or end-marker.  All subsequent
reads will return the empty string to indicate end-of-file, until a
call to \method{pop()} removes the boundary or a \method{next()} call
reenables it.

It is possible to push more than one boundary.  Encountering the
most-recently-pushed boundary will return EOF; encountering any other
boundary will raise an error.
\end{methoddesc}
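
The rule for nested boundaries can be sketched in isolation.  The
example below is a simplified stand-in written for this section, not the
module's code: the names \class{BoundaryStack}, \class{BoundaryError}
and \method{check()} are hypothetical, and the sketch models only which
pushed boundary counts as end-of-file, not the reading of the
underlying stream.

\begin{verbatim}
class BoundaryError(Exception):
    """Hypothetical stand-in for the module's Error exception."""

class BoundaryStack(object):
    def __init__(self):
        self.stack = []  # most recently pushed boundary comes first

    def push(self, boundary):
        self.stack.insert(0, boundary)

    def pop(self):
        del self.stack[0]

    def check(self, line):
        """Return 'data' or 'eof'; raise for an outer boundary."""
        marker = line.rstrip()
        for depth, boundary in enumerate(self.stack):
            if marker in ('--' + boundary, '--' + boundary + '--'):
                if depth > 0:
                    # An enclosing boundary was seen before the inner
                    # part was closed.
                    raise BoundaryError('missing end-marker')
                return 'eof'
        return 'data'

stack = BoundaryStack()
stack.push('outer')
stack.push('inner')
results = [stack.check('text\n'), stack.check('--inner\n')]
try:
    stack.check('--outer\n')
except BoundaryError:
    results.append('error')
stack.pop()  # 'outer' is now the most recently pushed boundary
results.append(stack.check('--outer--\n'))
\end{verbatim}

With both boundaries pushed, only \code{'inner'} yields end-of-file and
\code{'outer'} is an error; after the pop, \code{'outer'} yields
end-of-file, so \code{results} is
\code{['data', 'eof', 'error', 'eof']}.
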

\begin{methoddesc}{pop}{}
Pop a section boundary.  This boundary will no longer be interpreted
as EOF.
\end{methoddesc}

\begin{methoddesc}{section_divider}{str}
Turn a boundary into a section-divider line.  By default, this
method prepends \code{'-}\code{-'} (which MIME section boundaries have) but
it is declared so it can be overridden in derived classes.  This
method need not append LF or CR-LF, as comparison with the result
ignores trailing whitespace.
\end{methoddesc}

\begin{methoddesc}{end_marker}{str}
Turn a boundary string into an end-marker line.  By default, this
method prepends \code{'-}\code{-'} and appends \code{'-}\code{-'} (like a
MIME-multipart end-of-message marker) but it is declared so it can
be overridden in derived classes.  This method need not append LF or
CR-LF, as comparison with the result ignores trailing whitespace.
\end{methoddesc}
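
Because \method{is_data()} acts as a guard in front of the boundary
tests, a subclass that changes the decoration so that markers no longer
begin with \code{'-}\code{-'} would presumably need to override
\method{is_data()} as well.  The sketch below illustrates the idea with
stand-in classes rather than \class{MultiFile} itself;
\class{DashMarkers}, \class{HashMarkers}, \method{is_marker()} and the
\code{\#part}/\code{\#end} line format are all hypothetical.

\begin{verbatim}
class DashMarkers(object):
    """Stand-in for the documented defaults (not the real MultiFile)."""

    def is_data(self, line):
        return line[:2] != '--'

    def section_divider(self, boundary):
        return '--' + boundary

    def end_marker(self, boundary):
        return '--' + boundary + '--'

    def is_marker(self, line, boundary):
        if self.is_data(line):  # fast guard runs first
            return False
        marker = line.rstrip()  # trailing whitespace is ignored
        return marker in (self.section_divider(boundary),
                          self.end_marker(boundary))

class HashMarkers(DashMarkers):
    """Hypothetical format delimited by '#part' and '#end' lines."""

    def is_data(self, line):
        return line[:1] != '#'

    def section_divider(self, boundary):
        return '#part ' + boundary

    def end_marker(self, boundary):
        return '#end ' + boundary

dash = DashMarkers()
hashed = HashMarkers()
checks = [
    dash.is_marker('--abc--\r\n', 'abc'),
    dash.is_marker('#part abc\n', 'abc'),
    hashed.is_marker('#part abc\n', 'abc'),
    hashed.is_marker('#end abc\n', 'abc'),
]
\end{verbatim}

Here \code{checks} is \code{[True, False, True, True]}.  Without the
\method{is_data()} override, the \code{\#part} line would be accepted as
data by the inherited guard and never compared against the markers.
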

Finally, \class{MultiFile} instances have two public instance variables:

\begin{memberdesc}{level}
Nesting depth of the current part.
\end{memberdesc}

\begin{memberdesc}{last}
True if the last end-of-file was for an end-of-message marker.
\end{memberdesc}


\subsection{\class{MultiFile} Example \label{multifile-example}}
\sectionauthor{Skip Montanaro}{skip@mojam.com}

\begin{verbatim}
import mimetools
import multifile
import StringIO

def extract_mime_part_matching(stream, mimetype):
    """Return the first element in a multipart MIME message on stream
    matching mimetype."""

    msg = mimetools.Message(stream)
    msgtype = msg.gettype()
    params = msg.getplist()

    data = StringIO.StringIO()
    if msgtype[:10] == "multipart/":

        file = multifile.MultiFile(stream)
        # Treat the message's boundary parameter as the section delimiter.
        file.push(msg.getparam("boundary"))
        while file.next():
            # Each part begins with its own headers.
            submsg = mimetools.Message(file)
            try:
                data = StringIO.StringIO()
                mimetools.decode(file, data, submsg.getencoding())
            except ValueError:
                # Unknown transfer encoding: skip this part.
                continue
            if submsg.gettype() == mimetype:
                break
        file.pop()
    return data.getvalue()
\end{verbatim}