\section{\module{rfc822} ---
         Parse RFC 822 mail headers}

\declaremodule{standard}{rfc822}
\modulesynopsis{Parse \rfc{822} style mail headers.}

This module defines a class, \class{Message}, which represents a
collection of ``email headers'' as defined by the Internet standard
\rfc{822}.  It is used in various contexts, usually to read such
headers from a file.  This module also defines a helper class
\class{AddressList} for parsing \rfc{822} addresses.  Please refer to
the RFC for information on the specific syntax of \rfc{822} headers.

The \refmodule{mailbox}\refstmodindex{mailbox} module provides classes
to read mailboxes produced by various end-user mail programs.

\begin{classdesc}{Message}{file\optional{, seekable}}
A \class{Message} instance is instantiated with an input object as
parameter.  \class{Message} relies only on the input object having a
\method{readline()} method; in particular, ordinary file objects
qualify.  Instantiation reads headers from the input object up to a
delimiter line (normally a blank line) and stores them in the
instance.  The message body, following the headers, is not consumed.

This class can work with any input object that supports a
\method{readline()} method.  If the input object has seek and tell
capability, the \method{rewindbody()} method will work; also, illegal
lines will be pushed back onto the input stream.  If the input object
lacks seek but has an \method{unread()} method that can push back a
line of input, \class{Message} will use that to push back illegal
lines.  Thus this class can be used to parse messages coming from a
buffered stream.

The optional \var{seekable} argument is provided as a workaround for
certain stdio libraries in which \cfunction{tell()} discards buffered
data before discovering that the \cfunction{lseek()} system call
doesn't work.  For maximum portability, you should set the
\var{seekable} argument to zero to prevent that initial
\method{tell()} when passing in an unseekable object such as a file
object created from a socket object.

Input lines as read from the file may either be terminated by CR-LF or
by a single linefeed; a terminating CR-LF is replaced by a single
linefeed before the line is stored.

All header matching is done independent of upper or lower case;
e.g.\ \code{\var{m}['From']}, \code{\var{m}['from']} and
\code{\var{m}['FROM']} all yield the same result.
\end{classdesc}

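The reading behaviour described above can be illustrated with a minimal
sketch in plain Python.  This is not the module's implementation: the
helper names \code{read_headers} and \code{lookup} are hypothetical, and
continuation lines, illegal lines and push-back are left out for brevity.
The sketch relies only on \method{readline()}, stops at the blank
delimiter line, turns a terminating CR-LF into a single linefeed, and
matches header names without regard to case.

```python
import io


def read_headers(fp):
    """Read headers from fp up to the first blank line (hypothetical helper)."""
    headers = {}
    while True:
        line = fp.readline()
        if line.endswith("\r\n"):
            line = line[:-2] + "\n"       # CR-LF becomes a single linefeed
        if line in ("", "\n"):            # end of input or delimiter line
            break
        name, sep, value = line.partition(":")
        if sep:                           # ignore lines without a colon
            headers[name.strip().lower()] = value.strip()
    return headers


def lookup(headers, name):
    """Case-insensitive header lookup (hypothetical helper)."""
    return headers.get(name.lower())


fp = io.StringIO(
    "From: jack@cwi.nl (Jack Jansen)\r\nSubject: test\n\nbody text\n"
)
headers = read_headers(fp)
body = fp.read()                          # the body was not consumed
```

After the call, \code{fp} is positioned just past the delimiter line, so
the message body can still be read from it.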
\begin{classdesc}{AddressList}{field}
You may instantiate the \class{AddressList} helper class using a single
string parameter, a comma-separated list of \rfc{822} addresses to be
parsed.  (The parameter \code{None} yields an empty list.)
\end{classdesc}

\begin{funcdesc}{parsedate}{date}
Attempts to parse a date according to the rules in \rfc{822}.
However, some mailers don't follow that format as specified, so
\function{parsedate()} tries to guess correctly in such cases.
\var{date} is a string containing an \rfc{822} date, such as
\code{'Mon, 20 Nov 1995 19:12:08 -0500'}.  If it succeeds in parsing
the date, \function{parsedate()} returns a 9-tuple that can be passed
directly to \function{time.mktime()}; otherwise \code{None} will be
returned.  Note that fields 6, 7, and 8 of the result tuple are not
usable.
\end{funcdesc}

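A short usage sketch for \function{parsedate()} follows.  The fallback
import is an assumption made so the example also runs on interpreters
that do not ship \module{rfc822}; there it uses the function of the same
name from \module{email.utils}.  Only the first six fields of the result
are inspected.

```python
try:
    from rfc822 import parsedate          # the module described here
except ImportError:
    # Assumption: stand-in of the same name where rfc822 is unavailable.
    from email.utils import parsedate

fields = parsedate('Mon, 20 Nov 1995 19:12:08 -0500')
# Fields 0 to 5 hold the date and time as written in the header.
year, month, day, hour, minute, second = fields[:6]

# Input that cannot be parsed as a date yields None.
bad = parsedate('not a date')
```

Note that the timezone offset in the input string does not appear in
the 9-tuple.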
\begin{funcdesc}{parsedate_tz}{date}
Performs the same function as \function{parsedate()}, but returns
either \code{None} or a 10-tuple; the first 9 elements make up a tuple
that can be passed directly to \function{time.mktime()}, and the tenth
is the offset of the date's timezone from UTC (which is the official
term for Greenwich Mean Time).  (Note that the sign of the timezone
offset is the opposite of the sign of the \code{time.timezone}
variable for the same timezone; the latter variable follows the
\POSIX{} standard while this module follows \rfc{822}.)  If the input
string has no timezone, the last element of the tuple returned is
\code{None}.  Note that fields 6, 7, and 8 of the result tuple are not
usable.
\end{funcdesc}

\begin{funcdesc}{mktime_tz}{tuple}
Turn a 10-tuple as returned by \function{parsedate_tz()} into a UTC
timestamp.  If the timezone item in the tuple is \code{None}, assume
local time.  Minor deficiency: this first interprets the first 8
elements as a local time and then compensates for the timezone
difference; this may yield a slight error around daylight saving time
switch dates.  Not enough to worry about for common use.
\end{funcdesc}

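A sketch of the round trip from a date string to a UTC timestamp
follows.  The fallback import from \module{email.utils} is an assumption
for interpreters without \module{rfc822}, and \function{calendar.timegm()}
is used only to build a reference value for comparison.  The chosen date
is not near a daylight saving time switch.

```python
import calendar

try:
    from rfc822 import parsedate_tz, mktime_tz   # the module described here
except ImportError:
    # Assumption: stand-ins of the same names where rfc822 is unavailable.
    from email.utils import parsedate_tz, mktime_tz

fields = parsedate_tz('Mon, 20 Nov 1995 19:12:08 -0500')
stamp = mktime_tz(fields)

# 19:12:08 at five hours west of UTC is 00:12:08 UTC on the next day.
expected = calendar.timegm((1995, 11, 21, 0, 12, 8, 0, 0, 0))
```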
\begin{seealso}
  \seemodule{mailbox}{Classes to read various mailbox formats produced
                      by end-user mail programs.}
  \seemodule{mimetools}{Subclass of \class{rfc822.Message} that handles
                        MIME encoded messages.}
\end{seealso}

\subsection{Message Objects \label{message-objects}}

A \class{Message} instance has the following methods:

\begin{methoddesc}{rewindbody}{}
Seek to the start of the message body.  This only works if the file
object is seekable.
\end{methoddesc}

\begin{methoddesc}{isheader}{line}
Returns a line's canonicalized fieldname (the dictionary key that will
be used to index it) if the line is a legal \rfc{822} header; otherwise
returns \code{None} (implying that parsing should stop here and the
line be pushed back on the input stream).  It is sometimes useful to
override this method in a subclass.
\end{methoddesc}

\begin{methoddesc}{islast}{line}
Return true if the given line is a delimiter on which \class{Message}
should stop.  The delimiter line is consumed, and the file object's
read location positioned immediately after it.  By default this method
just checks that the line is blank, but you can override it in a
subclass.
\end{methoddesc}

\begin{methoddesc}{iscomment}{line}
Return true if the given line should be ignored entirely, just skipped.
By default this is a stub that always returns false, but you can
override it in a subclass.
\end{methoddesc}

\begin{methoddesc}{getallmatchingheaders}{name}
Return a list of lines consisting of all headers matching
\var{name}, if any.  Each physical line, whether it is a continuation
line or not, is a separate list item.  Return the empty list if no
header matches \var{name}.
\end{methoddesc}

\begin{methoddesc}{getfirstmatchingheader}{name}
Return a list of lines comprising the first header matching
\var{name}, and its continuation line(s), if any.  Return
\code{None} if there is no header matching \var{name}.
\end{methoddesc}

\begin{methoddesc}{getrawheader}{name}
Return a single string consisting of the text after the colon in the
first header matching \var{name}.  This includes leading whitespace,
the trailing linefeed, and internal linefeeds and whitespace if
any continuation line(s) were present.  Return \code{None} if there is
no header matching \var{name}.
\end{methoddesc}

\begin{methoddesc}{getheader}{name\optional{, default}}
Like \code{getrawheader(\var{name})}, but strip leading and trailing
whitespace.  Internal whitespace is not stripped.  The optional
\var{default} argument can be used to specify a different default to
be returned when there is no header matching \var{name}.
\end{methoddesc}

\begin{methoddesc}{get}{name\optional{, default}}
An alias for \method{getheader()}, to make the interface more compatible
with regular dictionaries.
\end{methoddesc}

\begin{methoddesc}{getaddr}{name}
Return a pair \code{(\var{full name}, \var{email address})} parsed
from the string returned by \code{getheader(\var{name})}.  If no
header matching \var{name} exists, return \code{(None, None)};
otherwise both the full name and the address are (possibly empty)
strings.

Example: If \var{m}'s first \code{From} header contains the string
\code{'jack@cwi.nl (Jack Jansen)'}, then
\code{m.getaddr('From')} will yield the pair
\code{('Jack Jansen', 'jack@cwi.nl')}.
If the header contained
\code{'Jack Jansen <jack@cwi.nl>'} instead, it would yield the
exact same result.
\end{methoddesc}

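The two address forms from the example can be compared directly on
header strings.  This sketch is a stand-in, not a call to
\method{getaddr()} itself: it assumes that the module-level
\function{parseaddr()} function, which is not described in this section,
applies the same address parsing, and it falls back to
\function{email.utils.parseaddr()} where \module{rfc822} cannot be
imported.

```python
try:
    from rfc822 import parseaddr          # assumption: same parsing as getaddr()
except ImportError:
    # Assumption: stand-in of the same name where rfc822 is unavailable.
    from email.utils import parseaddr

# Address followed by a parenthesized comment holding the full name.
comment_form = parseaddr('jack@cwi.nl (Jack Jansen)')

# Full name followed by the address in angle brackets.
bracket_form = parseaddr('Jack Jansen <jack@cwi.nl>')
```

Both calls produce a \code{(\var{full name}, \var{email address})} pair.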
\begin{methoddesc}{getaddrlist}{name}
This is similar to \code{getaddr(\var{name})}, but parses a header
containing a list of email addresses (e.g.\ a \code{To} header) and
returns a list of \code{(\var{full name}, \var{email address})} pairs
(even if there was only one address in the header).  If there is no
header matching \var{name}, return an empty list.

If multiple headers exist that match the named header (e.g.\ if there
are several \code{Cc} headers), all are parsed for addresses.  Any
continuation lines the named headers contain are also parsed.
\end{methoddesc}

\begin{methoddesc}{getdate}{name}
Retrieve a header using \method{getheader()} and parse it into a 9-tuple
compatible with \function{time.mktime()}; note that fields 6, 7, and 8
are not usable.  If there is no header matching
\var{name}, or it is unparsable, return \code{None}.

Date parsing appears to be a black art, and not all mailers adhere to
the standard.  While this method has been tested and found correct on a
large collection of email from many sources, it is still possible that
it may occasionally yield an incorrect result.
\end{methoddesc}

\begin{methoddesc}{getdate_tz}{name}
Retrieve a header using \method{getheader()} and parse it into a
10-tuple; the first 9 elements will make a tuple compatible with
\function{time.mktime()}, and the 10th is a number giving the offset
of the date's timezone from UTC.  Note that fields 6, 7, and 8
are not usable.  Similarly to \method{getdate()}, if
there is no header matching \var{name}, or it is unparsable, return
\code{None}.
\end{methoddesc}

\class{Message} instances also support a read-only mapping interface.
In particular: \code{\var{m}[name]} is like
\code{\var{m}.getheader(name)} but raises \exception{KeyError} if
there is no matching header; and \code{len(\var{m})},
\code{\var{m}.has_key(name)}, \code{\var{m}.keys()},
\code{\var{m}.values()} and \code{\var{m}.items()} act as expected.

Finally, \class{Message} instances have two public instance variables:

\begin{memberdesc}{headers}
A list containing the entire set of header lines, in the order in
which they were read (except that setitem calls may disturb this
order).  Each line contains a trailing newline.  The
blank line terminating the headers is not contained in the list.
\end{memberdesc}

\begin{memberdesc}{fp}
The file or file-like object passed at instantiation time.  This can
be used to read the message content.
\end{memberdesc}

\subsection{AddressList Objects \label{addresslist-objects}}

An \class{AddressList} instance has the following methods:

\begin{methoddesc}{__len__}{}
Return the number of addresses in the address list.
\end{methoddesc}

\begin{methoddesc}{__str__}{}
Return a canonicalized string representation of the address list.
Addresses are rendered in \code{"name" <host@domain>} form,
comma-separated.
\end{methoddesc}

\begin{methoddesc}{__add__}{alist}
Return an \class{AddressList} instance that contains all addresses in
both \class{AddressList} operands, with duplicates removed (set union).
\end{methoddesc}

\begin{methoddesc}{__sub__}{alist}
Return an \class{AddressList} instance that contains every address in
the left-hand \class{AddressList} operand that is not present in the
right-hand operand (set difference).
\end{methoddesc}

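The set-like behaviour of the two operators can be sketched on plain
lists of \code{(\var{name}, \var{address})} pairs.  This is only an
illustration of the semantics described above, not the class's
implementation; the helper names \code{union} and \code{difference} and
the sample addresses are hypothetical.

```python
def union(left, right):
    """Pairs of left, then pairs of right not already in left."""
    result = list(left)
    for pair in right:
        if pair not in result:
            result.append(pair)
    return result


def difference(left, right):
    """Pairs of left that do not occur in right, in their original order."""
    return [pair for pair in left if pair not in right]


# Hypothetical sample data: (name, address) pairs.
a = [("Jack Jansen", "jack@cwi.nl"), ("", "first@example.org")]
b = [("", "first@example.org"), ("", "second@example.org")]

both = union(a, b)          # three distinct pairs
only_a = difference(a, b)   # the pair found in a but not in b
```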
Finally, \class{AddressList} instances have one public instance variable:

\begin{memberdesc}{addresslist}
A list of pairs of strings, one pair per address.  In each pair, the
first item is the canonicalized name part and the second is the
actual route-address (@-separated username-host.domain pair).
\end{memberdesc}
267 actual route-address (@-separated username-host.domain pair).