\section{\module{multifile} ---
         Support for reading files which contain distinct parts}

\declaremodule{standard}{multifile}
\sectionauthor{Eric S. Raymond}{esr@snark.thyrsus.com}
\modulesynopsis{Support for reading files which contain distinct
                parts, such as some MIME data.}

The \class{MultiFile} object enables you to treat sections of a text
file as file-like input objects, with \code{''} being returned by
\method{readline()} when a given delimiter pattern is encountered.  The
defaults of this class are designed to make it useful for parsing
MIME multipart messages, but by subclassing it and overriding methods
it can easily be adapted for more general use.

\begin{classdesc}{MultiFile}{fp\optional{, seekable}}
Create a multi-file.  You must instantiate this class with an input
object argument for the \class{MultiFile} instance to get lines from,
such as a file object returned by \function{open()}.

\class{MultiFile} only ever looks at the input object's
\method{readline()}, \method{seek()} and \method{tell()} methods, and
the latter two are only needed if you want random access to the
individual MIME parts.  To use \class{MultiFile} on a non-seekable
stream object, set the optional \var{seekable} argument (true by
default) to false; this will prevent using the input object's
\method{seek()} and \method{tell()} methods.
\end{classdesc}
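
Whether to pass a true \var{seekable} argument can be decided by
probing the input object first.  The helper \function{choose_seekable()}
below is a hypothetical sketch (Python 3, not part of this module); it
assumes that a stream unable to seek either lacks \method{seek()} and
\method{tell()} or raises \exception{OSError} or \exception{ValueError}
from \method{tell()}, as Python 3 file objects on pipes typically do.

\begin{verbatim}
import io


def choose_seekable(fp):
    """Hypothetical helper: guess a value for MultiFile's seekable flag.

    MultiFile only calls seek() and tell() when seekable is true, so
    report False if either method is missing or tell() fails.
    """
    if not (hasattr(fp, "seek") and hasattr(fp, "tell")):
        return False
    try:
        fp.tell()
    except (OSError, ValueError):
        return False
    return True


class LineOnly:
    """A minimal input object that offers nothing but readline()."""

    def __init__(self, lines):
        self._lines = list(lines)

    def readline(self):
        # Return "" at end of input, like a real file object.
        return self._lines.pop(0) if self._lines else ""


print(choose_seekable(io.StringIO("a\nb\n")))  # True
print(choose_seekable(LineOnly(["a\n"])))      # False
\end{verbatim}

A call such as \code{MultiFile(fp, choose_seekable(fp))} then keeps
random access available whenever the stream supports it.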

It will be useful to know that in \class{MultiFile}'s view of the world,
text is composed of three kinds of lines: data, section-dividers, and
end-markers.  \class{MultiFile} is designed to support parsing of
messages that may have multiple nested message parts, each with its
own pattern for section-divider and end-marker lines.
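
With the default MIME conventions, the three kinds of line can be told
apart by comparing each line, minus trailing whitespace, with the
boundary decorated by \code{'--'}.  The function
\function{classify_line()} below is a hypothetical standalone sketch
of that rule for a single boundary (Python 3); it is not part of the
module's interface.

\begin{verbatim}
def classify_line(line, boundary):
    """Classify one input line relative to a MIME boundary string.

    Returns "section-divider" for "--boundary", "end-marker" for
    "--boundary--", and "data" for everything else.  Trailing
    whitespace (including the line ending) is ignored.
    """
    marker = line.rstrip()
    if marker == "--" + boundary + "--":
        return "end-marker"
    if marker == "--" + boundary:
        return "section-divider"
    return "data"


for line in ["Hello\n", "--XYZ\r\n", "--XYZ--\n", "--other\n"]:
    print(repr(line), classify_line(line, "XYZ"))
\end{verbatim}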

\subsection{MultiFile Objects \label{MultiFile-objects}}

A \class{MultiFile} instance has the following methods:

\begin{methoddesc}{push}{str}
Push a boundary string.  When an appropriately decorated version of
this boundary (see \method{section_divider()} and \method{end_marker()})
is found as an input line, it will be interpreted as a section-divider
or end-marker.  Once such a line has been read, all subsequent reads
return the empty string to indicate end-of-file, until a call to
\method{pop()} removes the boundary or a call to \method{next()}
re-enables it.

It is possible to push more than one boundary.  Encountering the
most-recently-pushed boundary will return EOF; encountering any other
boundary will raise \exception{Error}.
\end{methoddesc}

\begin{methoddesc}{readline}{}
Read a line.  If the line is data (not a section-divider, end-marker
or real EOF), return it.  If the line matches the most-recently-pushed
boundary, return \code{''} and set \code{self.last} to 1 if the match
is an end-marker or to 0 if it is a section-divider.  If the line
matches any other pushed boundary, raise \exception{Error}.  On
encountering end-of-file on the underlying stream object, the method
raises \exception{Error} unless all boundaries have been popped.
\end{methoddesc}

\begin{methoddesc}{readlines}{}
Return all lines remaining in this part as a list of strings.
\end{methoddesc}

\begin{methoddesc}{read}{}
Read all lines, up to the next section.  Return them as a single
(multiline) string.  Note that, unlike a file object's
\method{read()}, this method does not take a size argument.
\end{methoddesc}

\begin{methoddesc}{next}{}
Skip lines to the next section (that is, read lines until a
section-divider or end-marker has been consumed).  Return true if
there is such a section, false if an end-marker is seen.  Re-enable
the most-recently-pushed boundary.
\end{methoddesc}
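
The interplay of \method{push()}, \method{readline()},
\method{readlines()}, \method{next()} and \method{pop()} described
above can be illustrated with a much-simplified standalone class.
\class{MiniMultiFile} and its local \exception{Error} are hypothetical
names; the sketch (Python 3) omits seeking and the overridable hooks,
and the real module's code may differ in detail.

\begin{verbatim}
import io


class Error(Exception):
    """Local stand-in for multifile.Error."""


class MiniMultiFile:
    """Simplified sketch of MultiFile's boundary handling (no seeking)."""

    def __init__(self, fp):
        self.fp = fp
        self.stack = []   # pushed boundary strings, innermost last
        self.level = 0    # 1 while stopped on the innermost boundary
        self.last = 0     # 1 if that boundary line was an end-marker

    def push(self, boundary):
        self.stack.append(boundary)

    def pop(self):
        self.stack.pop()
        self.level = max(0, self.level - 1)
        self.last = 0

    def readline(self):
        if self.level > 0:               # stopped on a boundary: report EOF
            return ""
        line = self.fp.readline()
        if not line:                     # real end of the underlying stream
            if self.stack:
                raise Error("EOF while boundaries are still pushed")
            return ""
        if line[:2] != "--":             # fast guard: plain data
            return line
        marker = line.rstrip()           # ignore trailing whitespace/CR-LF
        # Try the innermost boundary first, then the enclosing ones.
        for depth, boundary in enumerate(reversed(self.stack), start=1):
            is_end = marker == "--" + boundary + "--"
            if is_end or marker == "--" + boundary:
                if depth > 1:
                    raise Error("boundary of an enclosing part seen")
                self.level = 1
                self.last = int(is_end)
                return ""
        return line                      # starts with "--" but is data

    def readlines(self):
        lines = []
        while True:
            line = self.readline()
            if not line:
                return lines
            lines.append(line)

    def next(self):
        while self.readline():           # skip the rest of the current part
            pass
        if self.last:                    # an end-marker: no further section
            return False
        self.level = 0                   # re-enable the innermost boundary
        return True


message = "preamble\n--B\npart one\n--B\npart two\n--B--\n"
mf = MiniMultiFile(io.StringIO(message))
mf.push("B")
print(mf.readlines())   # ['preamble\n']
parts = []
while mf.next():
    parts.append(mf.readlines())
mf.pop()
print(parts)            # [['part one\n'], ['part two\n']]
\end{verbatim}

Because \method{readline()} keeps returning \code{''} while
\member{level} is nonzero, \method{readlines()} stops at each boundary,
and only \method{next()} or \method{pop()} lets reading continue.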

\begin{methoddesc}{pop}{}
Pop a section boundary.  This boundary will no longer be interpreted
as EOF.
\end{methoddesc}

\begin{methoddesc}{seek}{pos\optional{, whence}}
Seek to a position within the current section.  Seek indices are
relative to the start of the current section.  The \var{pos} and
\var{whence} arguments are interpreted as for a file object's
\method{seek()} method.
\end{methoddesc}

\begin{methoddesc}{tell}{}
Return the file position relative to the start of the current section.
\end{methoddesc}
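
Section-relative positions amount to subtracting the absolute offset
at which the current section starts.  The hypothetical
\class{SectionView} class below (a Python 3 sketch, not part of the
module) keeps that offset and translates \method{seek()} and
\method{tell()} calls accordingly.  For brevity it handles only
\var{whence} values 0 and 1, and it assumes a stream, such as an
\class{io.StringIO} or a binary file, whose positions are plain
integer offsets.

\begin{verbatim}
import io


class SectionView:
    """Hypothetical sketch: seek()/tell() relative to a section start.

    start is the absolute offset in fp at which the section begins.
    """

    def __init__(self, fp, start):
        self.fp = fp
        self.start = start

    def tell(self):
        return self.fp.tell() - self.start

    def seek(self, pos, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            target = self.start + pos        # from the section start
        elif whence == io.SEEK_CUR:
            target = self.fp.tell() + pos    # from the current position
        else:
            raise ValueError("this sketch handles only whence 0 and 1")
        # Absolute seek; assumes integer offsets (true for io.StringIO).
        self.fp.seek(target)


stream = io.StringIO("preamble\n--B\nfirst\nsecond\n")
stream.readline()
stream.readline()                          # consume "preamble\n" and "--B\n"
view = SectionView(stream, stream.tell())  # the section starts here
stream.readline()                          # read "first\n"
print(view.tell())                         # 6
view.seek(0)                               # back to the section start
print(repr(stream.readline()))             # 'first\n'
\end{verbatim}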

\begin{methoddesc}{is_data}{str}
Return true if \var{str} is data and false if it might be a section
boundary.  As written, it returns true for any line that does not
begin with \code{'--'} (a prefix that all MIME boundary lines have),
but it is declared so it can be overridden in derived classes.

Note that this test is intended as a fast guard for the real
boundary tests; if it always returns false it will merely slow
processing, not cause it to fail.
\end{methoddesc}
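
Why an overly cautious \method{is_data()} only costs time can be seen
in the hypothetical standalone sketch below (Python 3):
\function{read_part()} applies the cheap test first and compares a
line with the decorated boundary only when the test fails.  An
override that always returns false yields the same part, but performs
a full comparison on every line.

\begin{verbatim}
def default_is_data(line):
    # MIME boundary lines always begin with "--", so anything else is data.
    return line[:2] != "--"


def never_data(line):
    # A deliberately poor override: every line might be a boundary.
    return False


def read_part(lines, boundary, is_data):
    """Collect lines up to the boundary; count the full comparisons made."""
    markers = ("--" + boundary, "--" + boundary + "--")
    part, comparisons = [], 0
    for line in lines:
        if not is_data(line):
            comparisons += 1
            if line.rstrip() in markers:
                break               # reached a section-divider or end-marker
        part.append(line)
    return part, comparisons


lines = ["a\n", "b\n", "--B\n", "c\n"]
print(read_part(lines, "B", default_is_data))  # (['a\n', 'b\n'], 1)
print(read_part(lines, "B", never_data))       # (['a\n', 'b\n'], 3)
\end{verbatim}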

\begin{methoddesc}{section_divider}{str}
Turn a boundary into a section-divider line.  By default, this
method prepends \code{'--'} (which MIME section boundaries have) but
it is declared so it can be overridden in derived classes.  This
method need not append LF or CR-LF, as comparison with the result
ignores trailing whitespace.
\end{methoddesc}

\begin{methoddesc}{end_marker}{str}
Turn a boundary string into an end-marker line.  By default, this
method prepends \code{'--'} and appends \code{'--'} (like a
MIME-multipart end-of-message marker) but it is declared so it can be
overridden in derived classes.  This method need not append LF or
CR-LF, as comparison with the result ignores trailing whitespace.
\end{methoddesc}
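
To adapt the class to a non-MIME format, \method{is_data()},
\method{section_divider()} and \method{end_marker()} are overridden
together.  The sketch below (Python 3) uses a hypothetical stand-in
base class, \class{MimeMarkers}, that mirrors the three default hooks
described above, and a subclass for an invented format in which
\code{[[name]]} divides sections and \code{[[/name]]} ends them;
neither class belongs to the module.

\begin{verbatim}
class MimeMarkers:
    """Stand-in mirroring MultiFile's default hooks (not the real class)."""

    def is_data(self, line):
        return line[:2] != "--"

    def section_divider(self, boundary):
        return "--" + boundary

    def end_marker(self, boundary):
        return "--" + boundary + "--"

    def classify(self, line, boundary):
        """Return "data", "section-divider" or "end-marker" for line."""
        if self.is_data(line):
            return "data"
        marker = line.rstrip()      # trailing whitespace and CR-LF ignored
        if marker == self.section_divider(boundary):
            return "section-divider"
        if marker == self.end_marker(boundary):
            return "end-marker"
        return "data"


class BracketMarkers(MimeMarkers):
    """Hypothetical format: [[name]] divides sections, [[/name]] ends them."""

    def is_data(self, line):
        # The default "--" guard would wrongly pass every [[...]] line as data.
        return not line.startswith("[[")

    def section_divider(self, boundary):
        return "[[" + boundary + "]]"

    def end_marker(self, boundary):
        return "[[/" + boundary + "]]"


fmt = BracketMarkers()
for line in ["text\n", "[[part]]\r\n", "[[/part]]\n", "--part\n"]:
    print(repr(line), fmt.classify(line, "part"))
\end{verbatim}

Note that \method{is_data()} must be overridden as well: under the
default \code{'--'} guard, every \code{[[...]]} line would be passed
through as data and the new markers would never be recognized.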

Finally, \class{MultiFile} instances have two public instance variables:

\begin{memberdesc}{level}
Nesting depth of the current part.
\end{memberdesc}

\begin{memberdesc}{last}
True if the last end-of-file was for an end-of-message marker.
\end{memberdesc}

\subsection{\class{MultiFile} Example \label{multifile-example}}

In the following example, \var{outer_boundary} and
\var{inner_boundary} are assumed to hold boundary strings already
taken from the message headers.  Standard input may not be seekable,
so \var{seekable} is passed as 0.

\begin{verbatim}
import sys
from multifile import MultiFile

fp = MultiFile(sys.stdin, 0)
fp.push(outer_boundary)
message1 = fp.readlines()
# We should now be either at real EOF or stopped on a message
# boundary.  Re-enable the outer boundary.
fp.next()
# Read another message with the same delimiter
message2 = fp.readlines()
# Re-enable that delimiter again
fp.next()
# Now look for a message subpart with a different boundary
fp.push(inner_boundary)
sub_header = fp.readlines()
# If no exception has been raised, we're looking at the start of
# the message subpart.  Reset and grab the subpart
fp.next()
sub_body = fp.readlines()
# Got it.  Now pop the inner boundary to re-enable the outer one.
fp.pop()
# Read to next outer boundary
message3 = fp.readlines()
\end{verbatim}