\section{\module{xml.sax.handler} ---
         Base classes for SAX handlers}

\declaremodule{standard}{xml.sax.handler}
\modulesynopsis{Base classes for SAX event handlers.}
\sectionauthor{Martin v. L\"owis}{loewis@informatik.hu-berlin.de}
\moduleauthor{Lars Marius Garshol}{larsga@garshol.priv.no}

The SAX API defines four kinds of handlers: content handlers, DTD
handlers, error handlers, and entity resolvers. Applications normally
only need to implement those interfaces whose events they are
interested in; they can implement the interfaces in a single object or
in multiple objects. Handler implementations should inherit from the
base classes provided in the module \module{xml.sax.handler}, so that all
methods get default implementations.
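
As a minimal sketch of this pattern, the handler below overrides a single
event and relies on the inherited defaults for everything else. The names
\class{ElementCounter} and \function{count_elements()} are hypothetical
and chosen only for illustration.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class ElementCounter(ContentHandler):
    """Hypothetical handler that counts start tags by element name."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Only this event is overridden; every other event falls back to
        # the do-nothing implementation inherited from ContentHandler.
        self.counts[name] = self.counts.get(name, 0) + 1


def count_elements(xml_bytes):
    """Parse a byte string and return a dict of element name -> count."""
    handler = ElementCounter()
    xml.sax.parseString(xml_bytes, handler)
    return handler.counts


if __name__ == "__main__":
    print(count_elements(b"<a><b/><b/><c/></a>"))
```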

\begin{classdesc*}{ContentHandler}
  This is the main callback interface in SAX, and the one most
  important to applications. The order of events in this interface
  mirrors the order of the information in the document.
\end{classdesc*}

\begin{classdesc*}{DTDHandler}
  This interface specifies only those DTD events required for basic
  parsing (unparsed entities and attributes).
\end{classdesc*}

\begin{classdesc*}{EntityResolver}
  Basic interface for resolving entities. If you create an object
  implementing this interface and register the object with your
  parser, the parser will call the method in your object to resolve all
  external entities.
\end{classdesc*}

\begin{classdesc*}{ErrorHandler}
  Interface used by the parser to present error and warning messages
  to the application. The methods of this object control whether errors
  are immediately converted to exceptions or are handled in some other
  way.
\end{classdesc*}

In addition to these classes, \module{xml.sax.handler} provides
symbolic constants for the feature and property names.

\begin{datadesc}{feature_namespaces}
  Value: \code{"http://xml.org/sax/features/namespaces"}\\
  true: Perform Namespace processing.\\
  false: Optionally do not perform Namespace processing
         (implies namespace-prefixes; default).\\
  access: (parsing) read-only; (not parsing) read/write
\end{datadesc}

\begin{datadesc}{feature_namespace_prefixes}
  Value: \code{"http://xml.org/sax/features/namespace-prefixes"}\\
  true: Report the original prefixed names and attributes used for Namespace
        declarations.\\
  false: Do not report attributes used for Namespace declarations, and
         optionally do not report original prefixed names (default).\\
  access: (parsing) read-only; (not parsing) read/write
\end{datadesc}

\begin{datadesc}{feature_string_interning}
  Value: \code{"http://xml.org/sax/features/string-interning"}\\
  true: All element names, prefixes, attribute names, Namespace URIs, and
        local names are interned using the built-in intern function.\\
  false: Names are not necessarily interned, although they may be (default).\\
  access: (parsing) read-only; (not parsing) read/write
\end{datadesc}

\begin{datadesc}{feature_validation}
  Value: \code{"http://xml.org/sax/features/validation"}\\
  true: Report all validation errors (implies external-general-entities and
        external-parameter-entities).\\
  false: Do not report validation errors.\\
  access: (parsing) read-only; (not parsing) read/write
\end{datadesc}

\begin{datadesc}{feature_external_ges}
  Value: \code{"http://xml.org/sax/features/external-general-entities"}\\
  true: Include all external general (text) entities.\\
  false: Do not include external general entities.\\
  access: (parsing) read-only; (not parsing) read/write
\end{datadesc}

\begin{datadesc}{feature_external_pes}
  Value: \code{"http://xml.org/sax/features/external-parameter-entities"}\\
  true: Include all external parameter entities, including the external
        DTD subset.\\
  false: Do not include any external parameter entities, even the external
         DTD subset.\\
  access: (parsing) read-only; (not parsing) read/write
\end{datadesc}

\begin{datadesc}{all_features}
  List of all features.
\end{datadesc}
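
The feature constants are plain strings that are passed to a reader's
\method{getFeature()} and \method{setFeature()} methods. The sketch below
assumes the default reader returned by \function{xml.sax.make_parser()}
(normally the \code{expat} reader); the helper name
\function{namespace_feature_states()} is hypothetical.

```python
import xml.sax
from xml.sax.handler import all_features, feature_namespaces


def namespace_feature_states():
    """Return the namespaces feature state before and after enabling it."""
    parser = xml.sax.make_parser()
    # Features are read/write only while the parser is not parsing.
    before = bool(parser.getFeature(feature_namespaces))
    parser.setFeature(feature_namespaces, True)
    after = bool(parser.getFeature(feature_namespaces))
    return before, after


if __name__ == "__main__":
    print(feature_namespaces in all_features)
    print(namespace_feature_states())
```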

\begin{datadesc}{property_lexical_handler}
  Value: \code{"http://xml.org/sax/properties/lexical-handler"}\\
  data type: xml.sax.sax2lib.LexicalHandler (not supported in Python 2)\\
  description: An optional extension handler for lexical events like comments.\\
\end{datadesc}

\begin{datadesc}{property_declaration_handler}
  Value: \code{"http://xml.org/sax/properties/declaration-handler"}\\
  data type: xml.sax.sax2lib.DeclHandler (not supported in Python 2)\\
  description: An optional extension handler for DTD-related events other
               than notations and unparsed entities.\\
\end{datadesc}

\begin{datadesc}{property_dom_node}
  Value: \code{"http://xml.org/sax/properties/dom-node"}\\
  data type: org.w3c.dom.Node (not supported in Python 2)\\
  description: When parsing, the current DOM node being visited if this is
               a DOM iterator; when not parsing, the root DOM node for
               iteration.\\
  access: (parsing) read-only; (not parsing) read/write
\end{datadesc}

\begin{datadesc}{property_xml_string}
  Value: \code{"http://xml.org/sax/properties/xml-string"}\\
  description: The literal string of characters that was the source for
               the current event.
\end{datadesc}

\begin{datadesc}{all_properties}
  List of all known property names.
\end{datadesc}

\subsection{ContentHandler Objects \label{content-handler-objects}}

Users are expected to subclass \class{ContentHandler} to support their
application. The following methods are called by the parser on the
appropriate events in the input document:

\begin{methoddesc}[ContentHandler]{setDocumentLocator}{locator}
  Called by the parser to give the application a locator for locating
  the origin of document events.

  SAX parsers are strongly encouraged (though not absolutely required)
  to supply a locator: a parser that does so must supply the locator to
  the application by invoking this method before invoking any of the
  other methods in the \class{ContentHandler} interface.

  The locator allows the application to determine the end position of
  any document-related event, even if the parser is not reporting an
  error. Typically, the application will use this information for
  reporting its own errors (such as character content that does not
  match an application's business rules). The information returned by
  the locator is probably not sufficient for use with a search engine.

  Note that the locator will return correct information only during
  the invocation of the events in this interface. The application
  should not attempt to use it at any other time.
\end{methoddesc}
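
A handler typically just stores the locator and queries it from inside
later callbacks. The following sketch assumes the default \code{expat}
reader, which supplies a locator when a document is parsed through
\function{xml.sax.parseString()}; \class{LineRecorder} and
\function{element_lines()} are hypothetical names.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class LineRecorder(ContentHandler):
    """Hypothetical handler that records the line of each start tag."""

    def __init__(self):
        super().__init__()
        self.locator = None
        self.lines = {}

    def setDocumentLocator(self, locator):
        # Keep a reference; it is only meaningful during event callbacks.
        self.locator = locator

    def startElement(self, name, attrs):
        # A parser is not required to supply a locator, so check first.
        if self.locator is not None:
            self.lines[name] = self.locator.getLineNumber()


def element_lines(xml_bytes):
    handler = LineRecorder()
    xml.sax.parseString(xml_bytes, handler)
    return handler.lines


if __name__ == "__main__":
    print(element_lines(b"<root>\n  <child/>\n</root>"))
```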

\begin{methoddesc}[ContentHandler]{startDocument}{}
  Receive notification of the beginning of a document.

  The SAX parser will invoke this method only once, before any other
  methods in this interface or in \class{DTDHandler} (except for
  \method{setDocumentLocator()}).
\end{methoddesc}

\begin{methoddesc}[ContentHandler]{endDocument}{}
  Receive notification of the end of a document.

  The SAX parser will invoke this method only once, and it will be the
  last method invoked during the parse. The parser shall not invoke
  this method until it has either abandoned parsing (because of an
  unrecoverable error) or reached the end of input.
\end{methoddesc}
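
The ordering of the document-level events can be observed with a small
logging handler. This is only an illustrative sketch;
\class{EventLogger} and \function{event_order()} are hypothetical names.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class EventLogger(ContentHandler):
    """Hypothetical handler that records the order of basic events."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("startDocument")

    def endDocument(self):
        self.events.append("endDocument")

    def startElement(self, name, attrs):
        self.events.append("startElement:" + name)

    def endElement(self, name):
        self.events.append("endElement:" + name)


def event_order(xml_bytes):
    handler = EventLogger()
    xml.sax.parseString(xml_bytes, handler)
    return handler.events


if __name__ == "__main__":
    print(event_order(b"<doc/>"))
```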

\begin{methoddesc}[ContentHandler]{startPrefixMapping}{prefix, uri}
  Begin the scope of a prefix-URI Namespace mapping.

  The information from this event is not necessary for normal
  Namespace processing: the SAX XML reader will automatically replace
  prefixes for element and attribute names when the
  \code{feature_namespaces} feature is enabled.

  There are cases, however, when applications need to use prefixes in
  character data or in attribute values, where they cannot safely be
  expanded automatically; the \method{startPrefixMapping()} and
  \method{endPrefixMapping()} events supply the information to the
  application to expand prefixes in those contexts itself, if
  necessary.

  Note that \method{startPrefixMapping()} and
  \method{endPrefixMapping()} events are not guaranteed to be properly
  nested relative to each other: all \method{startPrefixMapping()}
  events will occur before the corresponding \method{startElement()}
  event, and all \method{endPrefixMapping()} events will occur after
  the corresponding \method{endElement()} event, but their order is
  not guaranteed.
\end{methoddesc}
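
The relationship between prefix-mapping events and element events can be
sketched as follows. The example assumes the default \code{expat} reader
with \code{feature_namespaces} switched on explicitly;
\class{PrefixLogger} and \function{log_prefix_events()} are hypothetical
names.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class PrefixLogger(ContentHandler):
    """Hypothetical handler logging prefix mappings around elements."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startPrefixMapping(self, prefix, uri):
        self.events.append(("start-prefix", prefix, uri))

    def endPrefixMapping(self, prefix):
        self.events.append(("end-prefix", prefix))

    def startElementNS(self, name, qname, attrs):
        self.events.append(("start-element", name))

    def endElementNS(self, name, qname):
        self.events.append(("end-element", name))


def log_prefix_events(xml_bytes):
    parser = xml.sax.make_parser()
    # Prefix-mapping events are only reported in namespace mode.
    parser.setFeature(feature_namespaces, True)
    handler = PrefixLogger()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.events


if __name__ == "__main__":
    for event in log_prefix_events(b'<x:a xmlns:x="urn:example:x"><x:b/></x:a>'):
        print(event)
```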

\begin{methoddesc}[ContentHandler]{endPrefixMapping}{prefix}
  End the scope of a prefix-URI mapping.

  See \method{startPrefixMapping()} for details. This event will
  always occur after the corresponding \method{endElement()} event,
  but the order of \method{endPrefixMapping()} events is not otherwise
  guaranteed.
\end{methoddesc}

\begin{methoddesc}[ContentHandler]{startElement}{name, attrs}
  Signals the start of an element in non-namespace mode.

  The \var{name} parameter contains the raw XML 1.0 name of the
  element type as a string and the \var{attrs} parameter holds an
  object of the \ulink{\class{Attributes}
  interface}{attributes-objects.html} containing the attributes of the
  element. The object passed as \var{attrs} may be re-used by the
  parser; holding on to a reference to it is not a reliable way to
  keep a copy of the attributes. To keep a copy of the attributes,
  use the \method{copy()} method of the \var{attrs} object.
\end{methoddesc}
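
A handler that needs the attributes after the callback returns can store
the result of \method{copy()} rather than \var{attrs} itself. The sketch
below is illustrative only; \class{AttributeCollector} and
\function{collect_attributes()} are hypothetical names.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class AttributeCollector(ContentHandler):
    """Hypothetical handler that keeps a copy of each element's attributes."""

    def __init__(self):
        super().__init__()
        self.saved = []

    def startElement(self, name, attrs):
        # Store a copy, since the parser may re-use the attrs object.
        self.saved.append((name, attrs.copy()))


def collect_attributes(xml_bytes):
    """Return a list of (element name, attribute dict) pairs."""
    handler = AttributeCollector()
    xml.sax.parseString(xml_bytes, handler)
    return [(name, dict(attrs.items())) for name, attrs in handler.saved]


if __name__ == "__main__":
    print(collect_attributes(b'<doc><item id="1" kind="x"/></doc>'))
```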

\begin{methoddesc}[ContentHandler]{endElement}{name}
  Signals the end of an element in non-namespace mode.

  The \var{name} parameter contains the name of the element type, just
  as with the \method{startElement()} event.
\end{methoddesc}

\begin{methoddesc}[ContentHandler]{startElementNS}{name, qname, attrs}
  Signals the start of an element in namespace mode.

  The \var{name} parameter contains the name of the element type as a
  \code{(\var{uri}, \var{localname})} tuple, the \var{qname} parameter
  contains the raw XML 1.0 name used in the source document, and the
  \var{attrs} parameter holds an instance of the
  \ulink{\class{AttributesNS} interface}{attributes-ns-objects.html}
  containing the attributes of the element. If no namespace is
  associated with the element, the \var{uri} component of \var{name}
  will be \code{None}. The object passed as \var{attrs} may be
  re-used by the parser; holding on to a reference to it is not a
  reliable way to keep a copy of the attributes. To keep a copy of
  the attributes, use the \method{copy()} method of the \var{attrs}
  object.

  Parsers may set the \var{qname} parameter to \code{None}, unless the
  \code{feature_namespace_prefixes} feature is activated.
\end{methoddesc}
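
In namespace mode, element and attribute names arrive as
\code{(\var{uri}, \var{localname})} tuples. The following sketch assumes
the default \code{expat} reader with \code{feature_namespaces} enabled
and deliberately ignores \var{qname}, since a parser may pass
\code{None} for it; \class{NSNameCollector} and
\function{collect_ns_names()} are hypothetical names.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class NSNameCollector(ContentHandler):
    """Hypothetical handler recording namespace-qualified names."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def startElementNS(self, name, qname, attrs):
        # name is a (uri, localname) tuple; uri is None without a namespace.
        # Attribute keys use the same (uri, localname) form.
        self.elements.append((name, dict(attrs.items())))


def collect_ns_names(xml_bytes):
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    handler = NSNameCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.elements


if __name__ == "__main__":
    doc = b'<plain><b:book xmlns:b="urn:example:books" id="7"/></plain>'
    for entry in collect_ns_names(doc):
        print(entry)
```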

\begin{methoddesc}[ContentHandler]{endElementNS}{name, qname}
  Signals the end of an element in namespace mode.

  The \var{name} parameter contains the name of the element type, just
  as with the \method{startElementNS()} method; the same holds for the
  \var{qname} parameter.
\end{methoddesc}

\begin{methoddesc}[ContentHandler]{characters}{content}
  Receive notification of character data.

  The parser will call this method to report each chunk of character
  data. SAX parsers may return all contiguous character data in a
  single chunk, or they may split it into several chunks; however, all
  of the characters in any single event must come from the same
  external entity so that the Locator provides useful information.

  \var{content} may be a Unicode string or a byte string; the
  \code{expat} reader module always produces Unicode strings.

  \note{The earlier SAX 1 interface provided by the Python
  XML Special Interest Group used a more Java-like interface for this
  method. Since most parsers used from Python did not take advantage
  of the older interface, the simpler signature was chosen to replace
  it. To convert old code to the new interface, use \var{content}
  instead of slicing content with the old \var{offset} and
  \var{length} parameters.}
\end{methoddesc}
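
Because character data may arrive in several chunks, a handler usually
buffers the chunks and joins them when the element ends. The sketch below
assumes elements that contain only text (no mixed content);
\class{TextCollector} and \function{collect_text()} are hypothetical
names.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TextCollector(ContentHandler):
    """Hypothetical handler joining text chunks of text-only elements."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self.texts = []

    def startElement(self, name, attrs):
        # Start a fresh buffer for each element.
        self._chunks = []

    def characters(self, content):
        # May be called several times for one run of text.
        self._chunks.append(content)

    def endElement(self, name):
        self.texts.append((name, "".join(self._chunks)))
        self._chunks = []


def collect_text(xml_bytes):
    handler = TextCollector()
    xml.sax.parseString(xml_bytes, handler)
    return handler.texts


if __name__ == "__main__":
    print(collect_text(b"<p>Hello &amp; goodbye</p>"))
```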

\begin{methoddesc}[ContentHandler]{ignorableWhitespace}{whitespace}
  Receive notification of ignorable whitespace in element content.

  Validating parsers must use this method to report each chunk
  of ignorable whitespace (see the W3C XML 1.0 recommendation,
  section 2.10); non-validating parsers may also use this method
  if they are capable of parsing and using content models.

  SAX parsers may return all contiguous whitespace in a single
  chunk, or they may split it into several chunks; however, all
  of the characters in any single event must come from the same
  external entity, so that the Locator provides useful
  information.
\end{methoddesc}

\begin{methoddesc}[ContentHandler]{processingInstruction}{target, data}
  Receive notification of a processing instruction.

  The parser will invoke this method once for each processing
  instruction found. Note that processing instructions may occur
  before or after the main document element.

  A SAX parser should never report an XML declaration (XML 1.0,
  section 2.8) or a text declaration (XML 1.0, section 4.3.1) using
  this method.
\end{methoddesc}
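
The following sketch collects processing instructions from a document
that also carries an XML declaration, which is not reported through this
method. \class{PICollector} and \function{collect_pis()} are hypothetical
names.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class PICollector(ContentHandler):
    """Hypothetical handler that records processing instructions."""

    def __init__(self):
        super().__init__()
        self.instructions = []

    def processingInstruction(self, target, data):
        self.instructions.append((target, data))


def collect_pis(xml_bytes):
    handler = PICollector()
    xml.sax.parseString(xml_bytes, handler)
    return handler.instructions


if __name__ == "__main__":
    doc = (b'<?xml version="1.0"?>\n'
           b'<?xml-stylesheet href="style.css"?>\n'
           b'<doc/>\n'
           b'<?after done?>')
    print(collect_pis(doc))
```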

\begin{methoddesc}[ContentHandler]{skippedEntity}{name}
  Receive notification of a skipped entity.

  The parser will invoke this method once for each entity
  skipped. Non-validating processors may skip entities if they have
  not seen the declarations (because, for example, the entity was
  declared in an external DTD subset). All processors may skip
  external entities, depending on the values of the
  \code{feature_external_ges} and the
  \code{feature_external_pes} features.
\end{methoddesc}

\subsection{DTDHandler Objects \label{dtd-handler-objects}}

\class{DTDHandler} instances provide the following methods:

\begin{methoddesc}[DTDHandler]{notationDecl}{name, publicId, systemId}
  Handle a notation declaration event.
\end{methoddesc}

\begin{methoddesc}[DTDHandler]{unparsedEntityDecl}{name, publicId,
                                                   systemId, ndata}
  Handle an unparsed entity declaration event.
\end{methoddesc}
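
As a hedged illustration, the handler below records both kinds of
declaration from an internal DTD subset. It assumes the default
\code{expat} reader; \class{DeclarationRecorder} and
\function{record_declarations()} are hypothetical names, and the
notation and entity in the sample document are made up.

```python
import io
import xml.sax
from xml.sax.handler import DTDHandler


class DeclarationRecorder(DTDHandler):
    """Hypothetical handler recording notation and unparsed entity decls."""

    def __init__(self):
        self.notations = []
        self.entities = []

    def notationDecl(self, name, publicId, systemId):
        self.notations.append((name, publicId, systemId))

    def unparsedEntityDecl(self, name, publicId, systemId, ndata):
        self.entities.append((name, publicId, systemId, ndata))


def record_declarations(xml_bytes):
    parser = xml.sax.make_parser()
    handler = DeclarationRecorder()
    parser.setDTDHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.notations, handler.entities


if __name__ == "__main__":
    doc = (b'<!DOCTYPE doc [\n'
           b'  <!NOTATION gif SYSTEM "image/gif">\n'
           b'  <!ENTITY logo SYSTEM "logo.gif" NDATA gif>\n'
           b']>\n'
           b'<doc/>')
    print(record_declarations(doc))
```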

\subsection{EntityResolver Objects \label{entity-resolver-objects}}

\begin{methoddesc}[EntityResolver]{resolveEntity}{publicId, systemId}
  Resolve the system identifier of an entity and return either the
  system identifier to read from as a string, or an InputSource to
  read from. The default implementation returns \var{systemId}.
\end{methoddesc}
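
A resolver can return an \class{InputSource} so that no file or network
access takes place. The sketch below assumes the default \code{expat}
reader and enables \code{feature_external_ges} explicitly, since
external general entities may be disabled by default;
\class{InlineResolver}, \class{NameCollector}, and
\function{resolve_inline()} are hypothetical names, and the replacement
content is made up for illustration.

```python
import io
import xml.sax
from xml.sax.handler import (ContentHandler, EntityResolver,
                             feature_external_ges)
from xml.sax.xmlreader import InputSource


class InlineResolver(EntityResolver):
    """Hypothetical resolver serving every external entity from memory."""

    def resolveEntity(self, publicId, systemId):
        source = InputSource()
        # Made-up replacement content; a real resolver would look up
        # systemId or publicId in a catalogue of its own.
        source.setByteStream(io.BytesIO(b"<entity/>"))
        return source


class NameCollector(ContentHandler):
    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def resolve_inline(chunks):
    """Feed the given text chunks to a parser using InlineResolver."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_external_ges, True)
    parser.setEntityResolver(InlineResolver())
    handler = NameCollector()
    parser.setContentHandler(handler)
    for chunk in chunks:
        parser.feed(chunk)
    parser.close()
    return handler.names


if __name__ == "__main__":
    print(resolve_inline([
        '<!DOCTYPE doc [\n',
        '  <!ENTITY test SYSTEM "whatever">\n',
        ']>\n',
        '<doc>&test;</doc>',
    ]))
```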

\subsection{ErrorHandler Objects \label{sax-error-handler}}

Objects with this interface are used to receive error and warning
information from the \class{XMLReader}. If you create an object that
implements this interface and register the object with your
\class{XMLReader}, the parser will call the methods in your object to
report all warnings and errors. There are three levels of errors
available: warnings, (possibly) recoverable errors, and unrecoverable
errors. All methods take a \exception{SAXParseException} as the only
parameter. Errors and warnings may be converted to an exception by
raising the passed-in exception object.

\begin{methoddesc}[ErrorHandler]{error}{exception}
  Called when the parser encounters a recoverable error. If this method
  does not raise an exception, parsing may continue, but further document
  information should not be expected by the application. Allowing the
  parser to continue may allow additional errors to be discovered in the
  input document.
\end{methoddesc}

\begin{methoddesc}[ErrorHandler]{fatalError}{exception}
  Called when the parser encounters an error it cannot recover from;
  parsing is expected to terminate when this method returns.
\end{methoddesc}

\begin{methoddesc}[ErrorHandler]{warning}{exception}
  Called when the parser presents minor warning information to the
  application. Parsing is expected to continue when this method returns,
  and document information will continue to be passed to the application.
  Raising an exception in this method will cause parsing to end.
\end{methoddesc}
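
The sketch below shows one possible error handler that records fatal
errors before converting them to exceptions by re-raising the passed-in
object. \class{CollectingErrorHandler} and
\function{check_well_formed()} are hypothetical names.

```python
import xml.sax
from xml.sax import SAXParseException
from xml.sax.handler import ContentHandler, ErrorHandler


class CollectingErrorHandler(ErrorHandler):
    """Hypothetical handler that logs fatal errors and then raises them."""

    def __init__(self):
        self.fatal = []

    def fatalError(self, exception):
        # Record where the problem was found, then stop the parse by
        # raising the exception object that the parser passed in.
        self.fatal.append((exception.getLineNumber(), exception.getMessage()))
        raise exception


def check_well_formed(xml_bytes):
    """Return (ok, recorded fatal errors) for the given document."""
    errors = CollectingErrorHandler()
    try:
        xml.sax.parseString(xml_bytes, ContentHandler(), errors)
    except SAXParseException:
        return False, errors.fatal
    return True, errors.fatal


if __name__ == "__main__":
    print(check_well_formed(b"<a><b/></a>"))
    print(check_well_formed(b"<a><b></a>"))
```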