\section{\module{rfc822} ---
         Parse RFC 822 mail headers}

\declaremodule{standard}{rfc822}
\modulesynopsis{Parse \rfc{822} style mail headers.}
This module defines a class, \class{Message}, which represents a
collection of ``email headers'' as defined by the Internet standard
\rfc{822}.  It is used in various contexts, usually to read such
headers from a file.  This module also defines a helper class
\class{AddressList} for parsing \rfc{822} addresses.

Note that there's a separate module to read \UNIX{}, MH, and MMDF
style mailbox files: \refmodule{mailbox}\refstmodindex{mailbox}.
\begin{classdesc}{Message}{file\optional{, seekable}}
A \class{Message} instance is instantiated with an input object as
parameter.  \class{Message} relies only on the input object having a
\method{readline()} method; in particular, ordinary file objects
qualify.  Instantiation reads headers from the input object up to a
delimiter line (normally a blank line) and stores them in the
instance.

This class can work with any input object that supports a
\method{readline()} method.  If the input object has seek and tell
capability, the \method{rewindbody()} method will work; also, illegal
lines will be pushed back onto the input stream.  If the input object
lacks seek but has an \method{unread()} method that can push back a
line of input, \class{Message} will use that to push back illegal
lines.  Thus this class can be used to parse messages coming from a
buffered stream.

The optional \var{seekable} argument is provided as a workaround for
certain stdio libraries in which \cfunction{tell()} discards buffered
data before discovering that the \cfunction{lseek()} system call
doesn't work.  For maximum portability, you should set the
\var{seekable} argument to zero to prevent that initial \method{tell()}
when passing in an unseekable object such as a file object created
from a socket object.

Input lines as read from the file may either be terminated by CR-LF or
by a single linefeed; a terminating CR-LF is replaced by a single
linefeed before the line is stored.

All header matching is done independent of upper or lower case;
e.g.\ \code{\var{m}['From']}, \code{\var{m}['from']} and
\code{\var{m}['FROM']} all yield the same result.
\end{classdesc}
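The reading procedure described above can be illustrated with a
minimal sketch.  It is not the module's implementation:
\function{read_headers()} is a hypothetical helper written for this
example, it assumes a text-mode input object, and it simply stops at
an illegal line instead of pushing it back.

```python
import io


def read_headers(fp):
    """Read header lines from fp until a blank line (hypothetical helper).

    Returns a dict keyed by the lower-cased field name, so lookups can be
    made case-insensitive by lower-casing the requested name.  A repeated
    field name overwrites the earlier value in this sketch.
    """
    headers = {}
    last = None
    while True:
        line = fp.readline()              # the only method required of fp
        if not line:                      # end of input
            break
        if line.endswith("\r\n"):         # replace CR-LF by a single linefeed
            line = line[:-2] + "\n"
        if line == "\n":                  # blank delimiter line: stop here
            break
        if line[0] in " \t" and last is not None:
            # Continuation line: append it to the previous header's value.
            headers[last] += "\n " + line.strip()
            continue
        name, colon, value = line.partition(":")
        if not colon:                     # not a header; this sketch just stops
            break
        last = name.strip().lower()
        headers[last] = value.strip()
    return headers


fp = io.StringIO(
    "From: jack@cwi.nl (Jack Jansen)\r\n"
    "Subject: a long\r\n"
    " subject line\r\n"
    "\r\n"
    "Body text\n"
)
headers = read_headers(fp)
body = fp.read()  # the stream is now positioned at the start of the body
```

Because the helper consumes the delimiter line, whatever is read from
the input object afterwards is the message body.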
\begin{classdesc}{AddressList}{field}
You may instantiate the \class{AddressList} helper class using a single
string parameter, a comma-separated list of \rfc{822} addresses to be
parsed.  (The parameter \code{None} yields an empty list.)
\end{classdesc}
\begin{funcdesc}{parsedate}{date}
Attempts to parse a date according to the rules in \rfc{822}.
However, some mailers don't follow that format as specified, so
\function{parsedate()} tries to guess correctly in such cases.
\var{date} is a string containing an \rfc{822} date, such as
\code{'Mon, 20 Nov 1995 19:12:08 -0500'}.  If it succeeds in parsing
the date, \function{parsedate()} returns a 9-tuple that can be passed
directly to \function{time.mktime()}; otherwise \code{None} will be
returned.
\end{funcdesc}
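The following sketch shows the call pattern.  The \module{rfc822}
module is not available in Python 3, so the example uses
\function{email.utils.parsedate()} as a stand-in, on the assumption
that it behaves as described here for this input.

```python
import time
from email.utils import parsedate  # stand-in: rfc822 is absent from Python 3

fields = parsedate('Mon, 20 Nov 1995 19:12:08 -0500')
year, month, day, hour, minute, second = fields[:6]

# The 9-tuple can be handed to time.mktime(), which treats it as local time;
# the '-0500' zone is not taken into account by this call.
local_stamp = time.mktime(fields)

# A string that cannot be parsed as a date yields None.
bad = parsedate('not a date')
```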
\begin{funcdesc}{parsedate_tz}{date}
Performs the same function as \function{parsedate()}, but returns
either \code{None} or a 10-tuple; the first 9 elements make up a tuple
that can be passed directly to \function{time.mktime()}, and the tenth
is the offset of the date's timezone from UTC (which is the official
term for Greenwich Mean Time).  (Note that the sign of the timezone
offset is the opposite of the sign of the \code{time.timezone}
variable for the same timezone; the latter variable follows the
\POSIX{} standard while this module follows \rfc{822}.)  If the input
string has no timezone, the last element of the tuple returned is
\code{None}.
\end{funcdesc}
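A short sketch of the sign convention, again using the Python 3
function of the same name, \function{email.utils.parsedate_tz()}, as a
stand-in.  That function may treat a missing timezone differently from
what is described above, so the example only covers a date with an
explicit zone.

```python
from email.utils import parsedate_tz  # stand-in: rfc822 is absent from Python 3

result = parsedate_tz('Mon, 20 Nov 1995 19:12:08 -0500')
time_fields, offset = result[:9], result[9]

# The offset is expressed in seconds; a zone west of Greenwich such as
# '-0500' gives a negative number, the opposite sign of time.timezone.
hours_from_utc = offset / 3600
```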
\begin{funcdesc}{mktime_tz}{tuple}
Turn a 10-tuple as returned by \function{parsedate_tz()} into a UTC
timestamp.  If the timezone item in the tuple is \code{None}, assume
local time.  Minor deficiency: this first interprets the first 8
elements as a local time and then compensates for the timezone
difference; this may yield a slight error around daylight saving time
switch dates, though not enough to worry about for common use.
\end{funcdesc}
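As an illustration, the two functions can be chained to obtain a
timestamp.  The sketch uses \function{email.utils.mktime_tz()} and
\function{email.utils.parsedate_tz()} from Python 3 as stand-ins; the
daylight saving caveat above concerns this module and is not assumed
to apply to them.

```python
import time
from email.utils import mktime_tz, parsedate_tz  # stand-ins for rfc822

stamp = mktime_tz(parsedate_tz('Mon, 20 Nov 1995 19:12:08 -0500'))

# 19:12:08 at five hours behind UTC is 00:12:08 UTC on the following day.
utc_fields = time.gmtime(stamp)[:6]
```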
\subsection{Message Objects \label{message-objects}}

A \class{Message} instance has the following methods:

\begin{methoddesc}{rewindbody}{}
Seek to the start of the message body.  This only works if the file
object is seekable.
\end{methoddesc}
\begin{methoddesc}{isheader}{line}
Returns a line's canonicalized fieldname (the dictionary key that will
be used to index it) if the line is a legal \rfc{822} header; otherwise
returns \code{None} (implying that parsing should stop here and the
line be pushed back on the input stream).  It is sometimes useful to
override this method in a subclass.
\end{methoddesc}

\begin{methoddesc}{islast}{line}
Return true if the given line is a delimiter on which \class{Message}
should stop.  The delimiter line is consumed, and the file object's
read location positioned immediately after it.  By default this method
just checks that the line is blank, but you can override it in a
subclass.
\end{methoddesc}
\begin{methoddesc}{iscomment}{line}
Return true if the given line should be ignored entirely, just skipped.
By default this is a stub that always returns false, but you can
override it in a subclass.
\end{methoddesc}

\begin{methoddesc}{getallmatchingheaders}{name}
Return a list of lines consisting of all headers matching
\var{name}, if any.  Each physical line, whether it is a continuation
line or not, is a separate list item.  Return the empty list if no
header matches \var{name}.
\end{methoddesc}

\begin{methoddesc}{getfirstmatchingheader}{name}
Return a list of lines comprising the first header matching
\var{name}, and its continuation line(s), if any.  Return
\code{None} if there is no header matching \var{name}.
\end{methoddesc}
\begin{methoddesc}{getrawheader}{name}
Return a single string consisting of the text after the colon in the
first header matching \var{name}.  This includes leading whitespace,
the trailing linefeed, and internal linefeeds and whitespace if any
continuation line(s) were present.  Return \code{None} if there is
no header matching \var{name}.
\end{methoddesc}
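The difference between physical lines and a raw header value can be
sketched as follows.  \function{matching_lines()} and
\function{raw_value()} are hypothetical helpers written for this
example, operating on a plain list of header lines; they are not the
module's implementation.

```python
def matching_lines(lines, name, first_only=False):
    """Collect the physical lines of headers named `name` (hypothetical)."""
    prefix = name.lower() + ":"
    hits = []
    in_match = False
    for line in lines:
        if line[:1] in (" ", "\t"):
            # Continuation line: it belongs to the header just before it.
            if in_match:
                hits.append(line)
            continue
        if first_only and hits:
            break                      # a new header follows the first match
        in_match = line.lower().startswith(prefix)
        if in_match:
            hits.append(line)
    return hits


def raw_value(lines, name):
    """Text after the colon of the first matching header, or None."""
    first = matching_lines(lines, name, first_only=True)
    if not first:
        return None
    first[0] = first[0][len(name) + 1:]   # drop "Name:" but keep what follows
    return "".join(first)


lines = [
    "Received: from a.example\n",
    "\tby b.example\n",
    "Subject: hello\n",
    "Received: from c.example\n",
]
every = matching_lines(lines, "received")                   # three lines
first = matching_lines(lines, "received", first_only=True)  # two lines
raw = raw_value(lines, "received")
missing = raw_value(lines, "date")
```

The raw value keeps the leading space, the internal linefeed and tab
of the continuation line, and the trailing linefeed.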
\begin{methoddesc}{getheader}{name\optional{, default}}
Like \code{getrawheader(\var{name})}, but strip leading and trailing
whitespace.  Internal whitespace is not stripped.  The optional
\var{default} argument can be used to specify a different default to
be returned when there is no header matching \var{name}.
\end{methoddesc}

\begin{methoddesc}{get}{name\optional{, default}}
An alias for \method{getheader()}, to make the interface more compatible
with regular dictionaries.
\end{methoddesc}
\begin{methoddesc}{getaddr}{name}
Return a pair \code{(\var{full name}, \var{email address})} parsed
from the string returned by \code{getheader(\var{name})}.  If no
header matching \var{name} exists, return \code{(None, None)};
otherwise both the full name and the address are (possibly empty)
strings.

Example: If \var{m}'s first \code{From} header contains the string
\code{'jack@cwi.nl (Jack Jansen)'}, then
\code{m.getaddr('From')} will yield the pair
\code{('Jack Jansen', 'jack@cwi.nl')}.
If the header contained
\code{'Jack Jansen <jack@cwi.nl>'} instead, it would yield the
exact same result.
\end{methoddesc}
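The two address forms from the example can be checked with a short
sketch.  It parses the header strings directly with
\function{email.utils.parseaddr()} from Python 3, used here as a
stand-in for \method{getaddr()} on the assumption that it splits an
address in the same way.

```python
from email.utils import parseaddr  # stand-in: rfc822 is absent from Python 3

# Name given as a trailing comment in parentheses.
comment_style = parseaddr('jack@cwi.nl (Jack Jansen)')

# Name given before an address in angle brackets.
angle_style = parseaddr('Jack Jansen <jack@cwi.nl>')

same_result = comment_style == angle_style
```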
\begin{methoddesc}{getaddrlist}{name}
This is similar to \code{getaddr(\var{name})}, but parses a header
containing a list of email addresses (e.g.\ a \code{To} header) and
returns a list of \code{(\var{full name}, \var{email address})} pairs
(even if there was only one address in the header).  If there is no
header matching \var{name}, return an empty list.

If multiple headers exist that match the named header (e.g.\ if there
are several \code{Cc} headers), all are parsed for addresses.  Any
continuation lines the named headers contain are also parsed.
\end{methoddesc}
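A sketch of the list-valued result, using
\function{email.utils.getaddresses()} from Python 3 as a stand-in.  It
takes the header values as a list of strings, one per matching header;
the \code{example.org} addresses are made up for the illustration.

```python
from email.utils import getaddresses  # stand-in: rfc822 is absent from Python 3

# Values of two Cc headers, as they might be returned by getheader().
cc_values = [
    'Jack Jansen <jack@cwi.nl>, someone@example.org',
    'other@example.org (Other Person)',
]
pairs = getaddresses(cc_values)

# A single address still comes back as a one-element list of pairs.
single = getaddresses(['jack@cwi.nl'])
```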
\begin{methoddesc}{getdate}{name}
Retrieve a header using \method{getheader()} and parse it into a 9-tuple
compatible with \function{time.mktime()}.  If there is no header matching
\var{name}, or it is unparsable, return \code{None}.

Date parsing appears to be a black art, and not all mailers adhere to
the standard.  While this method has been tested and found correct on
a large collection of email from many sources, it may still
occasionally yield an incorrect result.
\end{methoddesc}
\begin{methoddesc}{getdate_tz}{name}
Retrieve a header using \method{getheader()} and parse it into a
10-tuple; the first 9 elements will make a tuple compatible with
\function{time.mktime()}, and the 10th is a number giving the offset
of the date's timezone from UTC.  Similarly to \method{getdate()}, if
there is no header matching \var{name}, or it is unparsable, return
\code{None}.
\end{methoddesc}

\class{Message} instances also support a read-only mapping interface.
In particular: \code{\var{m}[name]} is like
\code{\var{m}.getheader(name)} but raises \exception{KeyError} if
there is no matching header; and \code{len(\var{m})},
\code{\var{m}.has_key(name)}, \code{\var{m}.keys()},
\code{\var{m}.values()} and \code{\var{m}.items()} act as expected
(and consistently).

Finally, \class{Message} instances have two public instance variables:

\begin{memberdesc}{headers}
A list containing the entire set of header lines, in the order in
which they were read (except that setitem calls may disturb this
order).  Each line contains a trailing newline.  The
blank line terminating the headers is not contained in the list.
\end{memberdesc}

\begin{memberdesc}{fp}
The file or file-like object passed at instantiation time.  This can
be used to read the message content.
\end{memberdesc}
\subsection{AddressList Objects \label{addresslist-objects}}

An \class{AddressList} instance has the following methods:

\begin{methoddesc}{__len__}{}
Return the number of addresses in the address list.
\end{methoddesc}

\begin{methoddesc}{__str__}{}
Return a canonicalized string representation of the address list.
Addresses are rendered in \code{"name" <host@domain>} form,
comma-separated.
\end{methoddesc}

\begin{methoddesc}{__add__}{alist}
Return an \class{AddressList} instance that contains all addresses in
both \class{AddressList} operands, with duplicates removed (set union).
\end{methoddesc}

\begin{methoddesc}{__sub__}{alist}
Return an \class{AddressList} instance that contains every address in the
left-hand \class{AddressList} operand that is not present in the right-hand
operand (set difference).
\end{methoddesc}
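The set semantics of the two operators can be sketched on plain lists
of \code{(\var{name}, \var{address})} pairs.  \function{union()} and
\function{difference()} are hypothetical helpers written for this
example, and the \code{example.org} address is made up; the sketch
does not reproduce the class itself.

```python
def union(left, right):
    """All pairs of left, then those pairs of right not already in left."""
    result = list(left)
    for pair in right:
        if pair not in result:
            result.append(pair)
    return result


def difference(left, right):
    """Every pair of left that does not occur in right."""
    return [pair for pair in left if pair not in right]


a = [('Jack Jansen', 'jack@cwi.nl'), ('', 'someone@example.org')]
b = [('', 'someone@example.org')]

both = union(a, b)        # the shared address appears only once
only_a = difference(a, b)
```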
Finally, \class{AddressList} instances have one public instance variable:

\begin{memberdesc}{addresslist}
A list of 2-tuples of strings, one per address.  In each tuple, the
first item is the canonicalized name part of the address and the second
is the route-address (an @-separated host-domain pair).
\end{memberdesc}