1 \section{\module{xml.dom} ---
2 The Document Object Model API}
4 \declaremodule{standard}{xml.dom}
5 \modulesynopsis{Document Object Model API for Python.}
6 \sectionauthor{Paul Prescod}{paul@prescod.net}
7 \sectionauthor{Martin v. L\"owis}{loewis@informatik.hu-berlin.de}
9 \versionadded{2.0}
11 The Document Object Model, or ``DOM,'' is a cross-language API from
12 the World Wide Web Consortium (W3C) for accessing and modifying XML
13 documents. A DOM implementation presents an XML document as a tree
14 structure, or allows client code to build such a structure from
15 scratch. It then gives access to the structure through a set of
objects which provide well-known interfaces.
18 The DOM is extremely useful for random-access applications. SAX only
19 allows you a view of one bit of the document at a time. If you are
20 looking at one SAX element, you have no access to another. If you are
21 looking at a text node, you have no access to a containing element.
22 When you write a SAX application, you need to keep track of your
23 program's position in the document somewhere in your own code. SAX
24 does not do it for you. Also, if you need to look ahead in the XML
25 document, you are just out of luck.
Some applications are simply impossible in an event-driven model with
28 no access to a tree. Of course you could build some sort of tree
29 yourself in SAX events, but the DOM allows you to avoid writing that
30 code. The DOM is a standard tree representation for XML data.
32 %What if your needs are somewhere between SAX and the DOM? Perhaps
33 %you cannot afford to load the entire tree in memory but you find the
34 %SAX model somewhat cumbersome and low-level. There is also a module
35 %called xml.dom.pulldom that allows you to build trees of only the
36 %parts of a document that you need structured access to. It also has
37 %features that allow you to find your way around the DOM.
38 % See http://www.prescod.net/python/pulldom
40 The Document Object Model is being defined by the W3C in stages, or
41 ``levels'' in their terminology. The Python mapping of the API is
42 substantially based on the DOM Level 2 recommendation. Some aspects
43 of the API will only become available in Python 2.1, or may only be
44 available in particular DOM implementations.
46 DOM applications typically start by parsing some XML into a DOM. How
47 this is accomplished is not covered at all by DOM Level 1, and Level 2
48 provides only limited improvements. There is a
49 \class{DOMImplementation} object class which provides access to
50 \class{Document} creation methods, but these methods were only added
51 in DOM Level 2 and were not implemented in time for Python 2.0. There
52 is also no well-defined way to access these methods without an
53 existing \class{Document} object. For Python 2.0, consult the
54 documentation for each particular DOM implementation to determine the
55 bootstrap procedure needed to create and initialize \class{Document}
56 and \class{DocumentType} instances.
58 Once you have a DOM document object, you can access the parts of your
59 XML document through its properties and methods. These properties are
60 defined in the DOM specification; this portion of the reference manual
61 describes the interpretation of the specification in Python.
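
As a minimal sketch of this workflow, the following assumes the
\module{xml.dom.minidom} implementation and its \function{parseString()}
helper; other implementations may bootstrap differently. The sample XML
and the variable names are made up for illustration.

```python
from xml.dom.minidom import parseString

# Illustrative input; parseString() is specific to xml.dom.minidom.
document = parseString("<library><book title='Dune'/></library>")

root = document.documentElement    # the single root element
book = root.firstChild             # first child node of the root
root_name = root.tagName
book_title = book.getAttribute("title")
```

Everything after the parsing step uses only attributes and methods
described in this section.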
63 The specification provided by the W3C defines the DOM API for Java,
64 ECMAScript, and OMG IDL. The Python mapping defined here is based in
65 large part on the IDL version of the specification, but strict
66 compliance is not required (though implementations are free to support
67 the strict mapping from IDL). See section \ref{dom-conformance},
68 ``Conformance,'' for a detailed discussion of mapping requirements.
71 \begin{seealso}
72 \seetitle[http://www.w3.org/TR/DOM-Level-2-Core/]{Document Object
73 Model (DOM) Level 2 Specification}
74 {The W3C recommendation upon which the Python DOM API is
75 based.}
76 \seetitle[http://www.w3.org/TR/REC-DOM-Level-1/]{Document Object
77 Model (DOM) Level 1 Specification}
78 {The W3C recommendation for the
79 DOM supported by \module{xml.dom.minidom}.}
80 \seetitle[http://pyxml.sourceforge.net]{PyXML}{Users that require a
81 full-featured implementation of DOM should use the PyXML
82 package.}
83 \seetitle[http://cgi.omg.org/cgi-bin/doc?orbos/99-08-02.pdf]{CORBA
84 Scripting with Python}
85 {This specifies the mapping from OMG IDL to Python.}
86 \end{seealso}
88 \subsection{Module Contents}
The \module{xml.dom} module contains the following functions:
92 \begin{funcdesc}{registerDOMImplementation}{name, factory}
93 Register the \var{factory} function with the name \var{name}. The
94 factory function should return an object which implements the
95 \class{DOMImplementation} interface. The factory function can return
96 the same object every time, or a new one for each call, as appropriate
97 for the specific implementation (e.g. if that implementation supports
98 some customization).
99 \end{funcdesc}
101 \begin{funcdesc}{getDOMImplementation}{name = None, features = ()}
Return a suitable DOM implementation. The \var{name} is either a
well-known name, the module name of a DOM implementation, or
\code{None}. If it is not \code{None}, the corresponding module is
imported and a \class{DOMImplementation} object is returned if the
import succeeds. If no name is given, and if the environment variable
\envvar{PYTHON_DOM} is set, this variable is used to find the
implementation.

If \var{name} is not given, the available implementations are examined
to find one with the required feature set. If no implementation can be
found, an \exception{ImportError} is raised. The \var{features} list
must be a sequence of \code{(\var{feature}, \var{version})} pairs, which
are passed to the \method{hasFeature()} method of the available
\class{DOMImplementation} objects.
113 \end{funcdesc}
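
The two functions might be used together as sketched below. This
assumes that \code{"minidom"} is available as a well-known name, as it
is in the standard library; the name \code{"example-dom"} is invented
for the example.

```python
from xml.dom import getDOMImplementation, registerDOMImplementation

# "minidom" is a well-known name that resolves to xml.dom.minidom.
impl = getDOMImplementation("minidom")
supports_core = impl.hasFeature("Core", "2.0")

# Register a factory under a made-up name; this factory returns the
# same object on every call, which the description above permits.
registerDOMImplementation("example-dom", lambda: impl)
same_impl = getDOMImplementation("example-dom")
```

Passing an explicit name keeps the lookup independent of the
\envvar{PYTHON_DOM} environment variable.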
115 % Should the Node documentation go here?
In addition, \module{xml.dom} contains the \class{Node} class and the
DOM exception classes.
120 \subsection{Objects in the DOM \label{dom-objects}}
122 The definitive documentation for the DOM is the DOM specification from
123 the W3C.
125 Note that DOM attributes may also be manipulated as nodes instead of
126 as simple strings. It is fairly rare that you must do this, however,
127 so this usage is not yet documented.
130 \begin{tableiii}{l|l|l}{class}{Interface}{Section}{Purpose}
131 \lineiii{DOMImplementation}{\ref{dom-implementation-objects}}
132 {Interface to the underlying implementation.}
133 \lineiii{Node}{\ref{dom-node-objects}}
134 {Base interface for most objects in a document.}
135 \lineiii{NodeList}{\ref{dom-nodelist-objects}}
136 {Interface for a sequence of nodes.}
137 \lineiii{DocumentType}{\ref{dom-documenttype-objects}}
138 {Information about the declarations needed to process a document.}
139 \lineiii{Document}{\ref{dom-document-objects}}
140 {Object which represents an entire document.}
141 \lineiii{Element}{\ref{dom-element-objects}}
142 {Element nodes in the document hierarchy.}
143 \lineiii{Attr}{\ref{dom-attr-objects}}
144 {Attribute value nodes on element nodes.}
145 \lineiii{Comment}{\ref{dom-comment-objects}}
146 {Representation of comments in the source document.}
147 \lineiii{Text}{\ref{dom-text-objects}}
148 {Nodes containing textual content from the document.}
149 \lineiii{ProcessingInstruction}{\ref{dom-pi-objects}}
150 {Processing instruction representation.}
151 \end{tableiii}
153 An additional section describes the exceptions defined for working
154 with the DOM in Python.
157 \subsubsection{DOMImplementation Objects
158 \label{dom-implementation-objects}}
160 The \class{DOMImplementation} interface provides a way for
161 applications to determine the availability of particular features in
162 the DOM they are using. DOM Level 2 added the ability to create new
163 \class{Document} and \class{DocumentType} objects using the
164 \class{DOMImplementation} as well.
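\begin{methoddesc}[DOMImplementation]{hasFeature}{feature, version}
Return true if the feature identified by the pair of strings
\var{feature} and \var{version} is implemented.
\end{methoddesc}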
170 \subsubsection{Node Objects \label{dom-node-objects}}
172 All of the components of an XML document are subclasses of
173 \class{Node}.
175 \begin{memberdesc}[Node]{nodeType}
176 An integer representing the node type. Symbolic constants for the
177 types are on the \class{Node} object:
178 \constant{ELEMENT_NODE}, \constant{ATTRIBUTE_NODE},
179 \constant{TEXT_NODE}, \constant{CDATA_SECTION_NODE},
180 \constant{ENTITY_NODE}, \constant{PROCESSING_INSTRUCTION_NODE},
181 \constant{COMMENT_NODE}, \constant{DOCUMENT_NODE},
182 \constant{DOCUMENT_TYPE_NODE}, \constant{NOTATION_NODE}.
183 This is a read-only attribute.
184 \end{memberdesc}
186 \begin{memberdesc}[Node]{parentNode}
187 The parent of the current node, or \code{None} for the document node.
188 The value is always a \class{Node} object or \code{None}. For
189 \class{Element} nodes, this will be the parent element, except for the
190 root element, in which case it will be the \class{Document} object.
191 For \class{Attr} nodes, this is always \code{None}.
192 This is a read-only attribute.
193 \end{memberdesc}
195 \begin{memberdesc}[Node]{attributes}
A \class{NamedNodeMap} of attribute objects. Only elements have
197 actual values for this; others provide \code{None} for this attribute.
198 This is a read-only attribute.
199 \end{memberdesc}
201 \begin{memberdesc}[Node]{previousSibling}
The node that immediately precedes this one with the same parent. For
instance, this is the element whose end-tag comes just before the
\var{self} element's start-tag. Of course, XML documents are made
205 up of more than just elements so the previous sibling could be text, a
206 comment, or something else. If this node is the first child of the
207 parent, this attribute will be \code{None}.
208 This is a read-only attribute.
209 \end{memberdesc}
211 \begin{memberdesc}[Node]{nextSibling}
212 The node that immediately follows this one with the same parent. See
213 also \member{previousSibling}. If this is the last child of the
214 parent, this attribute will be \code{None}.
215 This is a read-only attribute.
216 \end{memberdesc}
218 \begin{memberdesc}[Node]{childNodes}
219 A list of nodes contained within this node.
220 This is a read-only attribute.
221 \end{memberdesc}
223 \begin{memberdesc}[Node]{firstChild}
224 The first child of the node, if there are any, or \code{None}.
225 This is a read-only attribute.
226 \end{memberdesc}
228 \begin{memberdesc}[Node]{lastChild}
229 The last child of the node, if there are any, or \code{None}.
230 This is a read-only attribute.
231 \end{memberdesc}
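
The navigation attributes above can be sketched on a small tree. The
example assumes \module{xml.dom.minidom} as the implementation; the
document content is illustrative only.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

doc = parseString("<a><b/>text<c/></a>")
a = doc.documentElement
b, text, c = a.childNodes    # three children: element, text, element

# nodeType distinguishes the kinds of node found among the children.
kinds = [child.nodeType for child in a.childNodes]
```

Here \code{b.nextSibling} and \code{c.previousSibling} both refer to the
text node, and \code{a.parentNode} is the \class{Document} object.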
233 \begin{memberdesc}[Node]{localName}
234 The part of the \member{tagName} following the colon if there is one,
235 else the entire \member{tagName}. The value is a string.
236 \end{memberdesc}
238 \begin{memberdesc}[Node]{prefix}
The part of the \member{tagName} preceding the colon if there is one,
else the empty string. The value is a string, or \code{None}.
241 \end{memberdesc}
243 \begin{memberdesc}[Node]{namespaceURI}
244 The namespace associated with the element name. This will be a
245 string or \code{None}. This is a read-only attribute.
246 \end{memberdesc}
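
A short sketch of the namespace-related attributes follows, assuming a
namespace-aware parse with \module{xml.dom.minidom}. The prefix and the
URI \code{urn:example} are made up.

```python
from xml.dom.minidom import parseString

doc = parseString('<p:root xmlns:p="urn:example"/>')
root = doc.documentElement

# The qualified name "p:root" splits into a prefix and a local name.
parts = (root.prefix, root.localName, root.namespaceURI)
```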
248 \begin{memberdesc}[Node]{nodeName}
249 This has a different meaning for each node type; see the DOM
250 specification for details. You can always get the information you
251 would get here from another property such as the \member{tagName}
252 property for elements or the \member{name} property for attributes.
253 For all node types, the value of this attribute will be either a
254 string or \code{None}. This is a read-only attribute.
255 \end{memberdesc}
257 \begin{memberdesc}[Node]{nodeValue}
258 This has a different meaning for each node type; see the DOM
259 specification for details. The situation is similar to that with
260 \member{nodeName}. The value is a string or \code{None}.
261 \end{memberdesc}
263 \begin{methoddesc}[Node]{hasAttributes}{}
264 Returns true if the node has any attributes.
265 \end{methoddesc}
267 \begin{methoddesc}[Node]{hasChildNodes}{}
268 Returns true if the node has any child nodes.
269 \end{methoddesc}
271 \begin{methoddesc}[Node]{isSameNode}{other}
272 Returns true if \var{other} refers to the same node as this node.
273 This is especially useful for DOM implementations which use any sort
274 of proxy architecture (because more than one object can refer to the
275 same node).
277 \strong{Note:} This is based on a proposed DOM Level 3 API which is
278 still in the ``working draft'' stage, but this particular interface
279 appears uncontroversial. Changes from the W3C will not necessarily
280 affect this method in the Python DOM interface (though any new W3C
281 API for this would also be supported).
282 \end{methoddesc}
284 \begin{methoddesc}[Node]{appendChild}{newChild}
285 Add a new child node to this node at the end of the list of children,
286 returning \var{newChild}.
287 \end{methoddesc}
289 \begin{methoddesc}[Node]{insertBefore}{newChild, refChild}
290 Insert a new child node before an existing child. It must be the case
291 that \var{refChild} is a child of this node; if not,
292 \exception{ValueError} is raised. \var{newChild} is returned.
293 \end{methoddesc}
295 \begin{methoddesc}[Node]{removeChild}{oldChild}
296 Remove a child node. \var{oldChild} must be a child of this node; if
297 not, \exception{ValueError} is raised. \var{oldChild} is returned on
298 success. If \var{oldChild} will not be used further, its
299 \method{unlink()} method should be called.
300 \end{methoddesc}
302 \begin{methoddesc}[Node]{replaceChild}{newChild, oldChild}
303 Replace an existing node with a new node. It must be the case that
304 \var{oldChild} is a child of this node; if not,
305 \exception{ValueError} is raised.
306 \end{methoddesc}
308 \begin{methoddesc}[Node]{normalize}{}
309 Join adjacent text nodes so that all stretches of text are stored as
310 single \class{Text} instances. This simplifies processing text from a
311 DOM tree for many applications.
312 \versionadded{2.1}
313 \end{methoddesc}
315 \begin{methoddesc}[Node]{cloneNode}{deep}
316 Clone this node. Setting \var{deep} means to clone all child nodes as
317 well. This returns the clone.
318 \end{methoddesc}
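
The tree-modifying methods might be combined as in the sketch below. It
assumes \module{xml.dom.minidom} and uses the DOM Level 2
\method{createDocument()} factory of its \class{DOMImplementation}
object, which is not described in this section, to obtain an empty
document.

```python
from xml.dom import getDOMImplementation

impl = getDOMImplementation("minidom")
doc = impl.createDocument(None, "root", None)   # DOM Level 2 factory
root = doc.documentElement

root.appendChild(doc.createTextNode("a"))
root.appendChild(doc.createTextNode("b"))
count_before = len(root.childNodes)    # two adjacent Text nodes
root.normalize()
count_after = len(root.childNodes)     # merged into a single Text node

first = doc.createElement("first")
root.insertBefore(first, root.firstChild)
clone = root.cloneNode(True)           # deep copy, children included
removed = root.removeChild(first)      # returns the removed node
```

The clone is taken before the removal, so it still contains a copy of
the \code{first} element.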
321 \subsubsection{NodeList Objects \label{dom-nodelist-objects}}
323 A \class{NodeList} represents a sequence of nodes. These objects are
used in two ways in the DOM Core recommendation: \class{Element}
objects provide one as their list of child nodes, and
326 the \method{getElementsByTagName()} and
327 \method{getElementsByTagNameNS()} methods of \class{Node} return
328 objects with this interface to represent query results.
330 The DOM Level 2 recommendation defines one method and one attribute
331 for these objects:
333 \begin{methoddesc}[NodeList]{item}{i}
334 Return the \var{i}'th item from the sequence, if there is one, or
\code{None}. The index \var{i} is not allowed to be less than zero
336 or greater than or equal to the length of the sequence.
337 \end{methoddesc}
339 \begin{memberdesc}[NodeList]{length}
340 The number of nodes in the sequence.
341 \end{memberdesc}
343 In addition, the Python DOM interface requires that some additional
344 support is provided to allow \class{NodeList} objects to be used as
345 Python sequences. All \class{NodeList} implementations must include
346 support for \method{__len__()} and \method{__getitem__()}; this allows
347 iteration over the \class{NodeList} in \keyword{for} statements and
348 proper support for the \function{len()} built-in function.
350 If a DOM implementation supports modification of the document, the
351 \class{NodeList} implementation must also support the
352 \method{__setitem__()} and \method{__delitem__()} methods.
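
The sketch below contrasts the DOM-style interface with the Python
sequence interface. It assumes \module{xml.dom.minidom}; the element
names are illustrative.

```python
from xml.dom.minidom import parseString

doc = parseString("<list><item>1</item><item>2</item><item>3</item></list>")
items = doc.getElementsByTagName("item")

# DOM style: the item() method and the length attribute.
dom_style = [items.item(i).firstChild.data for i in range(items.length)]

# Python style: iteration in a for statement, len(), and indexing.
python_style = [item.firstChild.data for item in items]
```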
355 \subsubsection{DocumentType Objects \label{dom-documenttype-objects}}
357 Information about the notations and entities declared by a document
358 (including the external subset if the parser uses it and can provide
359 the information) is available from a \class{DocumentType} object. The
360 \class{DocumentType} for a document is available from the
361 \class{Document} object's \member{doctype} attribute.
363 \class{DocumentType} is a specialization of \class{Node}, and adds the
364 following attributes:
366 \begin{memberdesc}[DocumentType]{publicId}
367 The public identifier for the external subset of the document type
368 definition. This will be a string or \code{None}.
369 \end{memberdesc}
371 \begin{memberdesc}[DocumentType]{systemId}
372 The system identifier for the external subset of the document type
373 definition. This will be a URI as a string, or \code{None}.
374 \end{memberdesc}
376 \begin{memberdesc}[DocumentType]{internalSubset}
377 A string giving the complete internal subset from the document.
378 This does not include the brackets which enclose the subset. If the
379 document has no internal subset, this should be \code{None}.
380 \end{memberdesc}
382 \begin{memberdesc}[DocumentType]{name}
383 The name of the root element as given in the \code{DOCTYPE}
declaration, if present. If there was no \code{DOCTYPE} declaration,
385 this will be \code{None}.
386 \end{memberdesc}
388 \begin{memberdesc}[DocumentType]{entities}
389 This is a \class{NamedNodeMap} giving the definitions of external
390 entities. For entity names defined more than once, only the first
391 definition is provided (others are ignored as required by the XML
392 recommendation). This may be \code{None} if the information is not
393 provided by the parser, or if no entities are defined.
394 \end{memberdesc}
396 \begin{memberdesc}[DocumentType]{notations}
397 This is a \class{NamedNodeMap} giving the definitions of notations.
398 For notation names defined more than once, only the first definition
399 is provided (others are ignored as required by the XML
400 recommendation). This may be \code{None} if the information is not
401 provided by the parser, or if no notations are defined.
402 \end{memberdesc}
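
As a hedged illustration, a \class{DocumentType} can be created and
inspected with \module{xml.dom.minidom}. The
\method{createDocumentType()} and \method{createDocument()} calls are
the DOM Level 2 factory methods of \class{DOMImplementation}, which are
not described in this section, and the identifiers are made up.

```python
from xml.dom import getDOMImplementation

impl = getDOMImplementation("minidom")
doctype = impl.createDocumentType(
    "report",                       # name of the root element
    "-//Example//DTD Report//EN",   # made-up public identifier
    "report.dtd",                   # made-up system identifier
)
doc = impl.createDocument(None, "report", doctype)

# The document type is reachable through the doctype attribute.
info = (doc.doctype.name, doc.doctype.publicId, doc.doctype.systemId)
```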
405 \subsubsection{Document Objects \label{dom-document-objects}}
407 A \class{Document} represents an entire XML document, including its
constituent elements, attributes, processing instructions, comments,
etc. Remember that it inherits properties from \class{Node}.
411 \begin{memberdesc}[Document]{documentElement}
412 The one and only root element of the document.
413 \end{memberdesc}
415 \begin{methoddesc}[Document]{createElement}{tagName}
416 Create and return a new element node. The element is not inserted
417 into the document when it is created. You need to explicitly insert
418 it with one of the other methods such as \method{insertBefore()} or
419 \method{appendChild()}.
420 \end{methoddesc}
422 \begin{methoddesc}[Document]{createElementNS}{namespaceURI, tagName}
423 Create and return a new element with a namespace. The
424 \var{tagName} may have a prefix. The element is not inserted into the
425 document when it is created. You need to explicitly insert it with
426 one of the other methods such as \method{insertBefore()} or
427 \method{appendChild()}.
428 \end{methoddesc}
430 \begin{methoddesc}[Document]{createTextNode}{data}
431 Create and return a text node containing the data passed as a
432 parameter. As with the other creation methods, this one does not
433 insert the node into the tree.
434 \end{methoddesc}
436 \begin{methoddesc}[Document]{createComment}{data}
437 Create and return a comment node containing the data passed as a
438 parameter. As with the other creation methods, this one does not
439 insert the node into the tree.
440 \end{methoddesc}
442 \begin{methoddesc}[Document]{createProcessingInstruction}{target, data}
443 Create and return a processing instruction node containing the
444 \var{target} and \var{data} passed as parameters. As with the other
445 creation methods, this one does not insert the node into the tree.
446 \end{methoddesc}
448 \begin{methoddesc}[Document]{createAttribute}{name}
449 Create and return an attribute node. This method does not associate
450 the attribute node with any particular element. You must use
451 \method{setAttributeNode()} on the appropriate \class{Element} object
452 to use the newly created attribute instance.
453 \end{methoddesc}
455 \begin{methoddesc}[Document]{createAttributeNS}{namespaceURI, qualifiedName}
456 Create and return an attribute node with a namespace. The
\var{qualifiedName} may have a prefix. This method does not associate the
458 attribute node with any particular element. You must use
459 \method{setAttributeNode()} on the appropriate \class{Element} object
460 to use the newly created attribute instance.
461 \end{methoddesc}
463 \begin{methoddesc}[Document]{getElementsByTagName}{tagName}
464 Search for all descendants (direct children, children's children,
465 etc.) with a particular element type name.
466 \end{methoddesc}
468 \begin{methoddesc}[Document]{getElementsByTagNameNS}{namespaceURI, localName}
469 Search for all descendants (direct children, children's children,
etc.) with a particular namespace URI and local name. The local name is
the part of the qualified name after the prefix.
472 \end{methoddesc}
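
The creation and search methods can be sketched together as follows,
assuming \module{xml.dom.minidom} and its DOM Level 2
\method{createDocument()} factory. All element names and the namespace
URI are invented.

```python
from xml.dom import getDOMImplementation

doc = getDOMImplementation("minidom").createDocument(None, "inventory", None)
root = doc.documentElement

item = doc.createElement("item")
detached = item.parentNode is None     # not yet part of the tree
root.appendChild(item)
item.appendChild(doc.createTextNode("widget"))
root.appendChild(doc.createComment(" end of list "))
root.appendChild(doc.createElementNS("urn:example", "ex:item"))

# The plain search matches on tagName, the NS search on URI + local name.
plain = doc.getElementsByTagName("item")
namespaced = doc.getElementsByTagNameNS("urn:example", "item")
```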
475 \subsubsection{Element Objects \label{dom-element-objects}}
477 \class{Element} is a subclass of \class{Node}, so inherits all the
478 attributes of that class.
480 \begin{memberdesc}[Element]{tagName}
481 The element type name. In a namespace-using document it may have
482 colons in it. The value is a string.
483 \end{memberdesc}
\begin{methoddesc}[Element]{getElementsByTagName}{tagName}
Same as the equivalent method in the \class{Document} class.
\end{methoddesc}
\begin{methoddesc}[Element]{getElementsByTagNameNS}{namespaceURI, localName}
Same as the equivalent method in the \class{Document} class.
\end{methoddesc}
493 \begin{methoddesc}[Element]{getAttribute}{attname}
494 Return an attribute value as a string.
495 \end{methoddesc}
497 \begin{methoddesc}[Element]{getAttributeNode}{attrname}
498 Return the \class{Attr} node for the attribute named by
499 \var{attrname}.
500 \end{methoddesc}
502 \begin{methoddesc}[Element]{getAttributeNS}{namespaceURI, localName}
503 Return an attribute value as a string, given a \var{namespaceURI} and
504 \var{localName}.
505 \end{methoddesc}
507 \begin{methoddesc}[Element]{getAttributeNodeNS}{namespaceURI, localName}
508 Return an attribute value as a node, given a \var{namespaceURI} and
509 \var{localName}.
510 \end{methoddesc}
512 \begin{methoddesc}[Element]{removeAttribute}{attname}
513 Remove an attribute by name. No exception is raised if there is no
514 matching attribute.
515 \end{methoddesc}
517 \begin{methoddesc}[Element]{removeAttributeNode}{oldAttr}
518 Remove and return \var{oldAttr} from the attribute list, if present.
519 If \var{oldAttr} is not present, \exception{NotFoundErr} is raised.
520 \end{methoddesc}
522 \begin{methoddesc}[Element]{removeAttributeNS}{namespaceURI, localName}
523 Remove an attribute by name. Note that it uses a localName, not a
524 qname. No exception is raised if there is no matching attribute.
525 \end{methoddesc}
527 \begin{methoddesc}[Element]{setAttribute}{attname, value}
528 Set an attribute value from a string.
529 \end{methoddesc}
531 \begin{methoddesc}[Element]{setAttributeNode}{newAttr}
Add a new attribute node to the element, replacing an existing
attribute if the \member{name} attribute matches. If a
534 replacement occurs, the old attribute node will be returned. If
535 \var{newAttr} is already in use, \exception{InuseAttributeErr} will be
536 raised.
537 \end{methoddesc}
539 \begin{methoddesc}[Element]{setAttributeNodeNS}{newAttr}
Add a new attribute node to the element, replacing an existing
attribute if the \member{namespaceURI} and
542 \member{localName} attributes match. If a replacement occurs, the old
543 attribute node will be returned. If \var{newAttr} is already in use,
544 \exception{InuseAttributeErr} will be raised.
545 \end{methoddesc}
547 \begin{methoddesc}[Element]{setAttributeNS}{namespaceURI, qname, value}
Set an attribute value from a string, given a \var{namespaceURI} and a
\var{qname}. Note that a qname is the whole attribute name, including
any prefix, unlike the \var{localName} taken by
\method{getAttributeNS()} and \method{removeAttributeNS()}.
551 \end{methoddesc}
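
A brief sketch of the attribute methods follows, assuming
\module{xml.dom.minidom}. The attribute names and the namespace URI are
made up for the example.

```python
from xml.dom import getDOMImplementation

doc = getDOMImplementation("minidom").createDocument(None, "root", None)
el = doc.documentElement

el.setAttribute("id", "42")
value = el.getAttribute("id")                  # the value as a string
node_value = el.getAttributeNode("id").value   # the same, via the Attr node

# The NS setter takes a qualified name; the NS getter takes a local name.
el.setAttributeNS("urn:example", "ex:lang", "en")
ns_value = el.getAttributeNS("urn:example", "lang")

el.removeAttribute("id")
gone = el.getAttributeNode("id") is None
```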
554 \subsubsection{Attr Objects \label{dom-attr-objects}}
556 \class{Attr} inherits from \class{Node}, so inherits all its
557 attributes.
559 \begin{memberdesc}[Attr]{name}
560 The attribute name. In a namespace-using document it may have colons
561 in it.
562 \end{memberdesc}
564 \begin{memberdesc}[Attr]{localName}
565 The part of the name following the colon if there is one, else the
566 entire name. This is a read-only attribute.
567 \end{memberdesc}
569 \begin{memberdesc}[Attr]{prefix}
570 The part of the name preceding the colon if there is one, else the
571 empty string.
572 \end{memberdesc}
575 \subsubsection{NamedNodeMap Objects \label{dom-attributelist-objects}}
577 \class{NamedNodeMap} does \emph{not} inherit from \class{Node}.
579 \begin{memberdesc}[NamedNodeMap]{length}
580 The length of the attribute list.
581 \end{memberdesc}
583 \begin{methoddesc}[NamedNodeMap]{item}{index}
584 Return an attribute with a particular index. The order you get the
585 attributes in is arbitrary but will be consistent for the life of a
586 DOM. Each item is an attribute node. Get its value with the
\member{value} attribute.
588 \end{methoddesc}
590 There are also experimental methods that give this class more mapping
591 behavior. You can use them or you can use the standardized
592 \method{getAttribute*()}-family methods on the \class{Element} objects.
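
The sketch below walks an element's \class{NamedNodeMap} and inspects
one \class{Attr} node. It assumes \module{xml.dom.minidom}; because the
order of the items is arbitrary, the example sorts them.

```python
from xml.dom import getDOMImplementation

doc = getDOMImplementation("minidom").createDocument(None, "r", None)
el = doc.documentElement
el.setAttributeNS("urn:example", "x:a", "1")   # made-up namespace URI
el.setAttribute("b", "2")

attrs = el.attributes
# item() order is arbitrary, so sort the (name, value) pairs.
pairs = sorted(
    (attrs.item(i).name, attrs.item(i).value) for i in range(attrs.length)
)

attr = el.getAttributeNodeNS("urn:example", "a")
name_parts = (attr.name, attr.prefix, attr.localName)
```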
595 \subsubsection{Comment Objects \label{dom-comment-objects}}
597 \class{Comment} represents a comment in the XML document. It is a
598 subclass of \class{Node}, but cannot have child nodes.
600 \begin{memberdesc}[Comment]{data}
601 The content of the comment as a string. The attribute contains all
602 characters between the leading \code{<!-}\code{-} and trailing
603 \code{-}\code{->}, but does not include them.
604 \end{memberdesc}
607 \subsubsection{Text and CDATASection Objects \label{dom-text-objects}}
609 The \class{Text} interface represents text in the XML document. If
610 the parser and DOM implementation support the DOM's XML extension,
611 portions of the text enclosed in CDATA marked sections are stored in
612 \class{CDATASection} objects. These two interfaces are identical, but
613 provide different values for the \member{nodeType} attribute.
615 These interfaces extend the \class{Node} interface. They cannot have
616 child nodes.
618 \begin{memberdesc}[Text]{data}
619 The content of the text node as a string.
620 \end{memberdesc}
622 \strong{Note:} The use of a \class{CDATASection} node does not
623 indicate that the node represents a complete CDATA marked section,
624 only that the content of the node was part of a CDATA section. A
625 single CDATA section may be represented by more than one node in the
626 document tree. There is no way to determine whether two adjacent
627 \class{CDATASection} nodes represent different CDATA marked sections.
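
For ordinary text and comments, the \member{data} attribute might be
read as in this sketch, which assumes \module{xml.dom.minidom} and an
illustrative input document.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

doc = parseString("<note><!-- draft -->Remember the milk</note>")

# The element has two children: a Comment followed by a Text node.
comment, text = doc.documentElement.childNodes
```

The comment's \member{data} excludes the delimiters but keeps the
surrounding spaces.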
630 \subsubsection{ProcessingInstruction Objects \label{dom-pi-objects}}
632 Represents a processing instruction in the XML document; this inherits
633 from the \class{Node} interface and cannot have child nodes.
635 \begin{memberdesc}[ProcessingInstruction]{target}
636 The content of the processing instruction up to the first whitespace
637 character. This is a read-only attribute.
638 \end{memberdesc}
640 \begin{memberdesc}[ProcessingInstruction]{data}
641 The content of the processing instruction following the first
642 whitespace character.
643 \end{memberdesc}
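
A minimal sketch, assuming \module{xml.dom.minidom}, shows how a
processing instruction is split into \member{target} and \member{data}.
The instruction itself is invented.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

doc = parseString('<?render mode="fast"?><page/>')

# The processing instruction precedes the root element in the document.
pi = doc.firstChild
```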
646 \subsubsection{Exceptions \label{dom-exceptions}}
648 \versionadded{2.1}
650 The DOM Level 2 recommendation defines a single exception,
651 \exception{DOMException}, and a number of constants that allow
652 applications to determine what sort of error occurred.
653 \exception{DOMException} instances carry a \member{code} attribute
654 that provides the appropriate value for the specific exception.
656 The Python DOM interface provides the constants, but also expands the
657 set of exceptions so that a specific exception exists for each of the
658 exception codes defined by the DOM. The implementations must raise
659 the appropriate specific exception, each of which carries the
660 appropriate value for the \member{code} attribute.
662 \begin{excdesc}{DOMException}
663 Base exception class used for all specific DOM exceptions. This
664 exception class cannot be directly instantiated.
665 \end{excdesc}
667 \begin{excdesc}{DomstringSizeErr}
668 Raised when a specified range of text does not fit into a string.
669 This is not known to be used in the Python DOM implementations, but
670 may be received from DOM implementations not written in Python.
671 \end{excdesc}
673 \begin{excdesc}{HierarchyRequestErr}
674 Raised when an attempt is made to insert a node where the node type
675 is not allowed.
676 \end{excdesc}
678 \begin{excdesc}{IndexSizeErr}
679 Raised when an index or size parameter to a method is negative or
680 exceeds the allowed values.
681 \end{excdesc}
683 \begin{excdesc}{InuseAttributeErr}
684 Raised when an attempt is made to insert an \class{Attr} node that
685 is already present elsewhere in the document.
686 \end{excdesc}
688 \begin{excdesc}{InvalidAccessErr}
689 Raised if a parameter or an operation is not supported on the
690 underlying object.
691 \end{excdesc}
693 \begin{excdesc}{InvalidCharacterErr}
This exception is raised when a string parameter contains a
character that the XML 1.0 recommendation does not permit in the
context in which it is used. For example, attempting to create an
697 \class{Element} node with a space in the element type name will
698 cause this error to be raised.
699 \end{excdesc}
701 \begin{excdesc}{InvalidModificationErr}
702 Raised when an attempt is made to modify the type of a node.
703 \end{excdesc}
705 \begin{excdesc}{InvalidStateErr}
706 Raised when an attempt is made to use an object that is not or is no
707 longer usable.
708 \end{excdesc}
710 \begin{excdesc}{NamespaceErr}
711 If an attempt is made to change any object in a way that is not
712 permitted with regard to the
713 \citetitle[http://www.w3.org/TR/REC-xml-names/]{Namespaces in XML}
714 recommendation, this exception is raised.
715 \end{excdesc}
717 \begin{excdesc}{NotFoundErr}
Raised when a node does not exist in the referenced context. For
719 example, \method{NamedNodeMap.removeNamedItem()} will raise this if
720 the node passed in does not exist in the map.
721 \end{excdesc}
723 \begin{excdesc}{NotSupportedErr}
724 Raised when the implementation does not support the requested type
725 of object or operation.
726 \end{excdesc}
728 \begin{excdesc}{NoDataAllowedErr}
729 This is raised if data is specified for a node which does not
730 support data.
731 % XXX a better explanation is needed!
732 \end{excdesc}
734 \begin{excdesc}{NoModificationAllowedErr}
735 Raised on attempts to modify an object where modifications are not
736 allowed (such as for read-only nodes).
737 \end{excdesc}
739 \begin{excdesc}{SyntaxErr}
740 Raised when an invalid or illegal string is specified.
741 % XXX how is this different from InvalidCharacterErr ???
742 \end{excdesc}
744 \begin{excdesc}{WrongDocumentErr}
745 Raised when a node is inserted in a different document than it
746 currently belongs to, and the implementation does not support
747 migrating the node from one document to the other.
748 \end{excdesc}
750 The exception codes defined in the DOM recommendation map to the
751 exceptions described above according to this table:
753 \begin{tableii}{l|l}{constant}{Constant}{Exception}
754 \lineii{DOMSTRING_SIZE_ERR}{\exception{DomstringSizeErr}}
755 \lineii{HIERARCHY_REQUEST_ERR}{\exception{HierarchyRequestErr}}
756 \lineii{INDEX_SIZE_ERR}{\exception{IndexSizeErr}}
757 \lineii{INUSE_ATTRIBUTE_ERR}{\exception{InuseAttributeErr}}
758 \lineii{INVALID_ACCESS_ERR}{\exception{InvalidAccessErr}}
759 \lineii{INVALID_CHARACTER_ERR}{\exception{InvalidCharacterErr}}
760 \lineii{INVALID_MODIFICATION_ERR}{\exception{InvalidModificationErr}}
761 \lineii{INVALID_STATE_ERR}{\exception{InvalidStateErr}}
762 \lineii{NAMESPACE_ERR}{\exception{NamespaceErr}}
763 \lineii{NOT_FOUND_ERR}{\exception{NotFoundErr}}
764 \lineii{NOT_SUPPORTED_ERR}{\exception{NotSupportedErr}}
765 \lineii{NO_DATA_ALLOWED_ERR}{\exception{NoDataAllowedErr}}
766 \lineii{NO_MODIFICATION_ALLOWED_ERR}{\exception{NoModificationAllowedErr}}
767 \lineii{SYNTAX_ERR}{\exception{SyntaxErr}}
768 \lineii{WRONG_DOCUMENT_ERR}{\exception{WrongDocumentErr}}
769 \end{tableii}
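
The relationship between the specific exceptions and the
\member{code} attribute can be sketched as follows. The example assumes
\module{xml.dom.minidom}, which refuses a second root element; other
implementations may react differently.

```python
import xml.dom
from xml.dom import getDOMImplementation

doc = getDOMImplementation("minidom").createDocument(None, "root", None)

try:
    # A document may hold only one root element, so this should fail.
    doc.appendChild(doc.createElement("second-root"))
except xml.dom.DOMException as exc:
    caught = exc     # keep a reference for inspection after the handler
else:
    caught = None
```

Catching the base class \exception{DOMException} handles every specific
DOM exception; the \member{code} attribute then identifies which one
occurred.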
772 \subsection{Conformance \label{dom-conformance}}
774 This section describes the conformance requirements and relationships
775 between the Python DOM API, the W3C DOM recommendations, and the OMG
776 IDL mapping for Python.
779 \subsubsection{Type Mapping \label{dom-type-mapping}}
781 The primitive IDL types used in the DOM specification are mapped to
782 Python types according to the following table.
784 \begin{tableii}{l|l}{code}{IDL Type}{Python Type}
785 \lineii{boolean}{\code{IntegerType} (with a value of \code{0} or \code{1})}
786 \lineii{int}{\code{IntegerType}}
787 \lineii{long int}{\code{IntegerType}}
788 \lineii{unsigned int}{\code{IntegerType}}
789 \end{tableii}
791 Additionally, the \class{DOMString} defined in the recommendation is
792 mapped to a Python string or Unicode string. Applications should
793 be able to handle Unicode whenever a string is returned from the DOM.
795 The IDL \keyword{null} value is mapped to \code{None}, which may be
796 accepted or provided by the implementation whenever \keyword{null} is
797 allowed by the API.
800 \subsubsection{Accessor Methods \label{dom-accessor-methods}}
802 The mapping from OMG IDL to Python defines accessor functions for IDL
803 \keyword{attribute} declarations in much the way the Java mapping
804 does. Mapping the IDL declarations
806 \begin{verbatim}
807 readonly attribute string someValue;
808 attribute string anotherValue;
809 \end{verbatim}
811 yields three accessor functions: a ``get'' method for
812 \member{someValue} (\method{_get_someValue()}), and ``get'' and
813 ``set'' methods for
814 \member{anotherValue} (\method{_get_anotherValue()} and
815 \method{_set_anotherValue()}). The mapping, in particular, does not
816 require that the IDL attributes are accessible as normal Python
817 attributes: \code{\var{object}.someValue} is \emph{not} required to
818 work, and may raise an \exception{AttributeError}.
820 The Python DOM API, however, \emph{does} require that normal attribute
821 access work. This means that the typical surrogates generated by
822 Python IDL compilers are not likely to work, and wrapper objects may
823 be needed on the client if the DOM objects are accessed via CORBA.
824 While this does require some additional consideration for CORBA DOM
825 clients, the implementers with experience using DOM over CORBA from
826 Python do not consider this a problem. Attributes that are declared
827 \keyword{readonly} may not restrict write access in all DOM
828 implementations.
830 Additionally, the accessor functions are not required. If provided,
831 they should take the form defined by the Python IDL mapping, but
832 these methods are considered unnecessary since the attributes are
833 accessible directly from Python. ``Set'' accessors should never be
834 provided for \keyword{readonly} attributes.