\section{\module{pyexpat} ---
         Fast XML parsing using the Expat C library}

\declaremodule{builtin}{pyexpat}
\modulesynopsis{An interface to the Expat XML parser.}
\moduleauthor{Paul Prescod}{paul@prescod.net}
\sectionauthor{A.M. Kuchling}{amk1@bigfoot.com}

The \module{pyexpat} module is a Python interface to the Expat
non-validating XML parser.
The module provides a single extension type, \class{xmlparser}, that
represents the current state of an XML parser.  After an
\class{xmlparser} object has been created, various attributes of the object
can be set to handler functions.  When an XML document is then fed to
the parser, the handler functions are called for the character data
and markup in the XML document.

The \module{pyexpat} module contains two functions:

\begin{funcdesc}{ErrorString}{errno}
Returns an explanatory string for a given error number \var{errno}.
\end{funcdesc}

\begin{funcdesc}{ParserCreate}{\optional{encoding, namespace_separator}}
Creates and returns a new \class{xmlparser} object.
\var{encoding}, if specified, must be a string naming the encoding
used by the XML data.  Expat doesn't support as many encodings as
Python does, and its repertoire of encodings can't be extended; it
supports UTF-8, UTF-16, ISO-8859-1 (Latin1), and ASCII.

% XXX pyexpat.c should only allow a 1-char string for this parameter
Expat can optionally do XML namespace processing for you, enabled by
providing a value for \var{namespace_separator}.  When namespace
processing is enabled, element type names and attribute names that
belong to a namespace will be expanded.  The element name
passed to the element handlers
\function{StartElementHandler()} and \function{EndElementHandler()}
will be the concatenation of the namespace URI, the namespace
separator character, and the local part of the name.  If the namespace
separator is a zero byte (\code{chr(0)})
then the namespace URI and the local part will be
concatenated without any separator.

For example, if \var{namespace_separator} is set to
\samp{ }, and the following document is parsed:

\begin{verbatim}
<root xmlns    = "http://default-namespace.org/"
      xmlns:py = "http://www.python.org/ns/">
  <py:elem1 />
</root>
\end{verbatim}

\function{StartElementHandler()} will receive the following strings
for each element:

\begin{verbatim}
http://default-namespace.org/ root
http://www.python.org/ns/ elem1
\end{verbatim}
\end{funcdesc}
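
The namespace expansion described above can be tried out with a short
sketch.  It assumes a current CPython 3 interpreter, where
\module{pyexpat} is still importable under that name and
\function{ParserCreate()} accepts \var{namespace_separator} as a keyword
argument; the list \code{received} and the function name
\code{start_element} are made up for illustration.

```python
import pyexpat

received = []  # hypothetical container for the names passed to the handler

def start_element(name, attrs):
    # With namespace processing on, name arrives as "URI<separator>local".
    received.append(name)

parser = pyexpat.ParserCreate(namespace_separator=" ")
parser.StartElementHandler = start_element
parser.Parse(
    '<root xmlns="http://default-namespace.org/" '
    'xmlns:py="http://www.python.org/ns/"><py:elem1 /></root>',
    True,
)
```

After the call, \code{received} should hold one expanded name per
element, in document order.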

\class{xmlparser} objects have the following methods:

\begin{methoddesc}{Parse}{data \optional{, isfinal}}
Parses the contents of the string \var{data}, calling the appropriate
handler functions to process the parsed data.  \var{isfinal} must be
true on the final call to this method.  \var{data} can be the empty
string.
\end{methoddesc}
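
A minimal sketch of feeding a document to \method{Parse()} in pieces,
assuming a current CPython 3 interpreter.  The chunk boundaries and the
list \code{pieces} are arbitrary choices for illustration; the last call
passes an empty string with a true \var{isfinal} to signal the end of
the input.

```python
import pyexpat

pieces = []  # hypothetical buffer for character data

parser = pyexpat.ParserCreate()
parser.CharacterDataHandler = pieces.append

# The document arrives in two chunks; isfinal stays false until the end.
parser.Parse("<doc>Hello, ", False)
parser.Parse("world</doc>", False)
# An empty string is acceptable data; isfinal is true on the last call.
parser.Parse("", True)

text = "".join(pieces)
```

The character data handler may be called more than once for one run of
text, which is why the sketch joins the pieces instead of relying on a
single call.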

\begin{methoddesc}{ParseFile}{file}
Parse XML data reading from the object \var{file}.  \var{file} only
needs to provide the \method{read(\var{nbytes})} method, returning the
empty string when there's no more data.
\end{methoddesc}
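
As a rough illustration, any object with a suitable \method{read()}
method will do.  The sketch below assumes a current CPython 3
interpreter, where \method{read()} is expected to return bytes, and uses
\class{io.BytesIO} as a stand-in for a real file; the list \code{names}
is hypothetical.

```python
import io
import pyexpat

names = []  # hypothetical record of the element names seen

parser = pyexpat.ParserCreate()
parser.StartElementHandler = lambda name, attrs: names.append(name)

# BytesIO provides read(nbytes) and returns b"" once the data is exhausted.
parser.ParseFile(io.BytesIO(b"<a><b/><c/></a>"))
```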

\begin{methoddesc}{SetBase}{base}
Sets the base to be used for resolving relative URIs in system identifiers in
declarations.  Resolving relative identifiers is left to the application:
this value will be passed through as the base argument to the
\function{ExternalEntityRefHandler}, \function{NotationDeclHandler},
and \function{UnparsedEntityDeclHandler} functions.
\end{methoddesc}

\begin{methoddesc}{GetBase}{}
Returns a string containing the base set by a previous call to
\method{SetBase()}, or \code{None} if
\method{SetBase()} hasn't been called.
\end{methoddesc}
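
The two methods can be exercised together as follows.  This is only a
sketch, assuming a current CPython 3 interpreter; the URL is a
placeholder chosen for the example.

```python
import pyexpat

parser = pyexpat.ParserCreate()

before = parser.GetBase()            # nothing has been set yet
parser.SetBase("http://example.org/docs/")
after = parser.GetBase()             # the value is stored, not interpreted
```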

\class{xmlparser} objects have the following attributes, containing
values relating to the most recent error encountered by an
\class{xmlparser} object.  These attributes will only have correct
values once a call to \method{Parse()} or \method{ParseFile()}
has raised a \exception{pyexpat.error} exception.

\begin{datadesc}{ErrorByteIndex}
Byte index at which an error occurred.
\end{datadesc}

\begin{datadesc}{ErrorCode}
Numeric code specifying the problem.  This value can be passed to the
\function{ErrorString()} function, or compared to one of the constants
defined in the \module{pyexpat.errors} submodule.
\end{datadesc}

\begin{datadesc}{ErrorColumnNumber}
Column number at which an error occurred.
\end{datadesc}

\begin{datadesc}{ErrorLineNumber}
Line number at which an error occurred.
\end{datadesc}
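
The error attributes are typically read inside an exception handler.
The sketch below assumes a current CPython 3 interpreter; the malformed
input and the names \code{message} and \code{location} are invented for
the example.

```python
import pyexpat

parser = pyexpat.ParserCreate()
message = None
location = None

try:
    # The end tag on line 2 does not match the open <b> element.
    parser.Parse("<a>\n<b></a>", True)
except pyexpat.error:
    # The attributes are only meaningful after the exception was raised.
    message = pyexpat.ErrorString(parser.ErrorCode)
    location = (parser.ErrorLineNumber,
                parser.ErrorColumnNumber,
                parser.ErrorByteIndex)
```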

Here is the list of handlers that can be set.  To set a handler on an
\class{xmlparser} object \var{o}, use
\code{\var{o}.\var{handlername} = \var{func}}.  \var{handlername} must
be taken from the following list, and \var{func} must be a callable
object accepting the correct number of arguments.  The arguments are
all strings, unless otherwise stated.

\begin{methoddesc}{StartElementHandler}{name, attributes}
Called for the start of every element.  \var{name} is a string
containing the element name, and \var{attributes} is a dictionary
mapping attribute names to their values.
\end{methoddesc}

\begin{methoddesc}{EndElementHandler}{name}
Called for the end of every element.
\end{methoddesc}

\begin{methoddesc}{ProcessingInstructionHandler}{target, data}
Called for every processing instruction.
\end{methoddesc}

\begin{methoddesc}{CharacterDataHandler}{data}
Called for character data.
\end{methoddesc}

\begin{methoddesc}{UnparsedEntityDeclHandler}{entityName, base, systemId, publicId, notationName}
Called for unparsed (NDATA) entity declarations.
\end{methoddesc}

\begin{methoddesc}{NotationDeclHandler}{notationName, base, systemId, publicId}
Called for notation declarations.
\end{methoddesc}
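
A small sketch of the two declaration handlers, assuming a current
CPython 3 interpreter.  The DTD content (\code{gif}, \code{logo},
\code{viewer.exe}, \code{logo.gif}) and the Python names are invented
for the example.  No base is set here, so the \var{base} argument is
expected to be \code{None}, as is \var{publicId} when the declaration
has no public identifier.

```python
import pyexpat

declarations = []  # hypothetical log of the declarations reported

def notation_decl(notation_name, base, system_id, public_id):
    declarations.append(("notation", notation_name, base, system_id, public_id))

def unparsed_entity_decl(entity_name, base, system_id, public_id, notation_name):
    declarations.append(
        ("entity", entity_name, base, system_id, public_id, notation_name))

parser = pyexpat.ParserCreate()
parser.NotationDeclHandler = notation_decl
parser.UnparsedEntityDeclHandler = unparsed_entity_decl
parser.Parse(
    '<!DOCTYPE doc [\n'
    '<!NOTATION gif SYSTEM "viewer.exe">\n'
    '<!ENTITY logo SYSTEM "logo.gif" NDATA gif>\n'
    ']>\n'
    '<doc/>',
    True,
)
```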

\begin{methoddesc}{StartNamespaceDeclHandler}{prefix, uri}
Called when an element contains a namespace declaration.
\end{methoddesc}

\begin{methoddesc}{EndNamespaceDeclHandler}{prefix}
Called when the closing tag is reached for an element
that contained a namespace declaration.
\end{methoddesc}
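
The pairing of the two namespace declaration handlers can be sketched
as follows, assuming a current CPython 3 interpreter.  The parser is
created with a \var{namespace_separator} because, as far as we know,
Expat only reports namespace declarations when namespace processing is
enabled; the list \code{events} is hypothetical.

```python
import pyexpat

events = []  # hypothetical log of namespace declaration events

parser = pyexpat.ParserCreate(namespace_separator=" ")
parser.StartNamespaceDeclHandler = (
    lambda prefix, uri: events.append(("start", prefix, uri)))
parser.EndNamespaceDeclHandler = (
    lambda prefix: events.append(("end", prefix)))

# The empty element both opens and closes the scope of the py prefix.
parser.Parse('<doc xmlns:py="http://www.python.org/ns/"/>', True)
```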

\begin{methoddesc}{CommentHandler}{data}
Called for comments.
\end{methoddesc}

\begin{methoddesc}{StartCdataSectionHandler}{}
Called at the start of a CDATA section.
\end{methoddesc}

\begin{methoddesc}{EndCdataSectionHandler}{}
Called at the end of a CDATA section.
\end{methoddesc}
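
To see how the comment and CDATA handlers interleave with character
data, consider the following sketch.  It assumes a current CPython 3
interpreter; the list \code{events} and the tuple labels are invented
for the example.

```python
import pyexpat

events = []  # hypothetical event log

parser = pyexpat.ParserCreate()
parser.CommentHandler = lambda data: events.append(("comment", data))
parser.StartCdataSectionHandler = lambda: events.append(("cdata-start",))
parser.EndCdataSectionHandler = lambda: events.append(("cdata-end",))
parser.CharacterDataHandler = lambda data: events.append(("text", data))

parser.Parse("<doc><!-- note --><![CDATA[a < b]]></doc>", True)

# Character data may arrive in several pieces, so join it before use.
cdata_text = "".join(e[1] for e in events if e[0] == "text")
kinds = [e[0] for e in events]
```

The CDATA handlers take no arguments; the content of the section itself
is delivered through the character data handler between the two calls.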

\begin{methoddesc}{DefaultHandler}{data}
Called for any characters in the XML document for
which no applicable handler has been specified.  This means
characters that are part of a construct which could be reported, but
for which no handler has been supplied.
\end{methoddesc}

\begin{methoddesc}{DefaultHandlerExpand}{data}
This is the same as the \function{DefaultHandler},
but doesn't inhibit expansion of internal entities.
The entity reference will not be passed to the default handler.
\end{methoddesc}
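
A short sketch of what reaches \function{DefaultHandler}, assuming a
current CPython 3 interpreter.  Only a start-element handler is
installed, so the start tag is consumed by it while the comment, the
text, and the end tag fall through to the default handler.  The list
\code{leftover} is hypothetical.

```python
import pyexpat

leftover = []  # hypothetical buffer for everything without its own handler

parser = pyexpat.ParserCreate()
parser.StartElementHandler = lambda name, attrs: None  # handles <doc> only
parser.DefaultHandler = leftover.append

parser.Parse("<doc><!-- hi -->text</doc>", True)

unhandled = "".join(leftover)
```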

\begin{methoddesc}{NotStandaloneHandler}{}
Called if the XML document hasn't been declared as being a standalone
document.
\end{methoddesc}

\begin{methoddesc}{ExternalEntityRefHandler}{context, base, systemId, publicId}
Called for references to external entities.
\end{methoddesc}
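
The following sketch shows an external entity reference being reported
together with the base set through \method{SetBase()}.  It assumes a
current CPython 3 interpreter, and it relies on behaviour not described
in this section: the handler returns a nonzero integer so that Expat
treats the reference as handled.  The entity name, file name, URL, and
Python names are invented for the example.

```python
import pyexpat

requests = []  # hypothetical log of external entity references

def external_entity_ref(context, base, system_id, public_id):
    # Resolving base + system_id is left to the application; only record them.
    requests.append((base, system_id, public_id))
    return 1  # assumption: nonzero means the reference was handled

parser = pyexpat.ParserCreate()
parser.SetBase("http://example.org/book/")
parser.ExternalEntityRefHandler = external_entity_ref
parser.Parse(
    '<!DOCTYPE doc [<!ENTITY chapter SYSTEM "chapter1.xml">]>'
    '<doc>&chapter;</doc>',
    True,
)
```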

\subsection{Example \label{pyexpat-example}}

The following program defines three handlers that just print out their
arguments.

\begin{verbatim}
import pyexpat

# 3 handler functions
def start_element(name, attrs):
    print 'Start element:', name, attrs
def end_element(name):
    print 'End element:', name
def char_data(data):
    print 'Character data:', repr(data)

p = pyexpat.ParserCreate()

p.StartElementHandler = start_element
p.EndElementHandler = end_element
p.CharacterDataHandler = char_data

p.Parse("""<?xml version="1.0"?>
<parent id="top"><child1 name="paul">Text goes here</child1>
<child2 name="fred">More text</child2>
</parent>""")
\end{verbatim}

The output from this program is:

\begin{verbatim}
Start element: parent {'id': 'top'}
Start element: child1 {'name': 'paul'}
Character data: 'Text goes here'
End element: child1
Character data: '\012'
Start element: child2 {'name': 'fred'}
Character data: 'More text'
End element: child2
Character data: '\012'
End element: parent
\end{verbatim}

\section{\module{pyexpat.errors} --- Error constants}

\declaremodule{builtin}{pyexpat.errors}
\modulesynopsis{Error constants defined for the Expat parser}
\moduleauthor{Paul Prescod}{paul@prescod.net}
\sectionauthor{A.M. Kuchling}{amk1@bigfoot.com}

The \module{pyexpat.errors} submodule defines the error constants
listed below.  The submodule becomes available once the
\refmodule{pyexpat} module has been imported; it cannot be imported
directly before then.

The following constants are defined:

\begin{datadesc}{XML_ERROR_ASYNC_ENTITY}
\end{datadesc}

\begin{datadesc}{XML_ERROR_ATTRIBUTE_EXTERNAL_ENTITY_REF}
\end{datadesc}

\begin{datadesc}{XML_ERROR_BAD_CHAR_REF}
\end{datadesc}

\begin{datadesc}{XML_ERROR_BINARY_ENTITY_REF}
\end{datadesc}

\begin{datadesc}{XML_ERROR_DUPLICATE_ATTRIBUTE}
An attribute was used more than once in a start tag.
\end{datadesc}

\begin{datadesc}{XML_ERROR_INCORRECT_ENCODING}
\end{datadesc}

\begin{datadesc}{XML_ERROR_INVALID_TOKEN}
\end{datadesc}

\begin{datadesc}{XML_ERROR_JUNK_AFTER_DOC_ELEMENT}
Something other than whitespace occurred after the document element.
\end{datadesc}

\begin{datadesc}{XML_ERROR_MISPLACED_XML_PI}
\end{datadesc}

\begin{datadesc}{XML_ERROR_NO_ELEMENTS}
\end{datadesc}

\begin{datadesc}{XML_ERROR_NO_MEMORY}
Expat was not able to allocate memory internally.
\end{datadesc}

\begin{datadesc}{XML_ERROR_PARAM_ENTITY_REF}
\end{datadesc}

\begin{datadesc}{XML_ERROR_PARTIAL_CHAR}
\end{datadesc}

\begin{datadesc}{XML_ERROR_RECURSIVE_ENTITY_REF}
\end{datadesc}

\begin{datadesc}{XML_ERROR_SYNTAX}
Some unspecified syntax error was encountered.
\end{datadesc}

\begin{datadesc}{XML_ERROR_TAG_MISMATCH}
An end tag did not match the innermost open start tag.
\end{datadesc}

\begin{datadesc}{XML_ERROR_UNCLOSED_TOKEN}
\end{datadesc}

\begin{datadesc}{XML_ERROR_UNDEFINED_ENTITY}
A reference was made to an entity which was not defined.
\end{datadesc}

\begin{datadesc}{XML_ERROR_UNKNOWN_ENCODING}
The document encoding is not supported by Expat.
\end{datadesc}
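
One way to test a failure against these constants is sketched below.
It assumes a current CPython 3 interpreter, where the constants hold
the message strings returned by \function{ErrorString()} instead of
numeric codes; to stay neutral on that point, the check accepts a match
against either the code or its message.  The names \code{code},
\code{constant}, and \code{is_mismatch} are invented for the example.

```python
import pyexpat

parser = pyexpat.ParserCreate()
code = None

try:
    parser.Parse("<a><b></a>", True)   # end tag does not match <b>
except pyexpat.error:
    code = parser.ErrorCode

constant = pyexpat.errors.XML_ERROR_TAG_MISMATCH
# Accept both representations: a numeric code or its explanatory string.
is_mismatch = code is not None and constant in (code, pyexpat.ErrorString(code))
```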