<documentation title="CXML Klacks parser">
The Klacks parser provides an alternative parsing interface,
similar in concept to Java's <a
href="http://jcp.org/en/jsr/detail?id=173">Streaming API for
XML (StAX)</a>.
It implements a streaming, "pull-based" API. This is different
from SAX, which is a "push-based" model.
Klacks is implemented using the same code base as the SAX parser
and has the same parsing characteristics (validation, namespace
support, entity resolution) while offering a more flexible interface
than SAX.
See below for <a href="#examples">examples</a>.
<h3>Parsing incrementally using sources</h3>
To parse using Klacks, create an XML <tt>source</tt> first.
<div class="def">Function CXML:MAKE-SOURCE (input &key validate
dtd root entity-resolver disallow-internal-subset buffering pathname)</div>
Create and return a source for <tt>input</tt>.
Exact behaviour depends on <tt>input</tt>, which can
be one of the following types:
<tt>pathname</tt> -- a Common Lisp pathname.
Open the file specified by the pathname and create a source for
the resulting stream. See below for information on how to
close the stream.
<li><tt>stream</tt> -- a Common Lisp stream with element-type
<tt>(unsigned-byte 8)</tt>. See below for information on how to
close the stream.
<tt>octets</tt> -- an <tt>(unsigned-byte 8)</tt> array.
The array is parsed directly, and interpreted according to the
encoding it specifies.
<tt>string</tt>/<tt>rod</tt> -- a rod (or <tt>string</tt> on
unicode-capable implementations).
Parse an XML document from the input string, which has already
undergone external-format decoding.
<b>Closing streams:</b> Sources can refer to Lisp streams that
need to be closed after parsing. This includes a stream passed
explicitly as <tt>input</tt>, a stream created implicitly for the
<tt>pathname</tt> case, as well as any streams created
automatically for external parsed entities referred to by the
document.
All these streams get closed automatically if end of file is
reached normally. Use <tt>klacks:close-source</tt> or
<tt>klacks:with-open-source</tt> to ensure that the streams get
closed otherwise.
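<p>The following is a minimal sketch of two of the input types listed above. It assumes cxml is loaded through Quicklisp, and the variable names are illustrative only. Neither source is read to end of file, so both are closed explicitly. <tt>char-code</tt> stands in for a real encoder only because the input is plain ASCII.</p>
<pre>
(ql:quickload "cxml")   ; assumption: Quicklisp is available

;; A source for a string, i.e. text that has already been decoded.
(defparameter *string-source*
  (cxml:make-source "<example>text</example>"))

;; A source for an (unsigned-byte 8) vector; the parser decodes it itself.
;; CHAR-CODE only works as an "encoder" here because the text is ASCII.
(defparameter *octet-source*
  (cxml:make-source
   (map '(vector (unsigned-byte 8)) #'char-code "<example>text</example>")))

;; Look at the first event of each source without advancing it.
(defparameter *first-events*
  (list (klacks:peek *string-source*) (klacks:peek *octet-source*)))

;; End of file was never reached, so release any streams explicitly.
(klacks:close-source *string-source*)
(klacks:close-source *octet-source*)
</pre>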
<b>Buffering:</b> By default, the Klacks parser performs buffering
of octets being read from the stream as an optimization. This can
result in unwanted blocking if the stream is a socket and the
parser tries to read more data than required to parse the current
event. Use <tt>:buffering nil</tt> to disable this optimization.
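<p>The next sketch passes <tt>:buffering nil</tt> for a binary stream. A plain file is used here only so the example is self-contained. The option matters in practice for sockets. The file name <tt>klacks-buffering-demo.xml</tt> is hypothetical.</p>
<pre>
(ql:quickload "cxml")   ; assumption: Quicklisp is available

;; Hypothetical file name, used only for this illustration.
(defparameter *path* #p"klacks-buffering-demo.xml")

(with-open-file (out *path* :direction :output :if-exists :supersede)
  (write-string "<example>text</example>" out))

;; With :BUFFERING NIL the parser reads single octets and does not read
;; ahead of the event being parsed; for a file this only costs speed.
(defparameter *root-name*
  (with-open-file (in *path* :element-type '(unsigned-byte 8))
    (klacks:with-open-source (source (cxml:make-source in :buffering nil))
      (klacks:find-element source)        ; first :start-element
      (klacks:current-lname source))))
</pre>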
<tt>buffering</tt> -- Boolean, defaults to <tt>t</tt>. If
enabled, read data several kilobytes at a time. If disabled,
read only a single byte at a time.
The following <b>keyword arguments</b> have the same meaning as
with the SAX parser; please refer to the documentation of <a
href="sax.html#parser">parse-file</a> for more information:
<tt>entity-resolver</tt>
<tt>disallow-internal-subset</tt>
In addition, the following argument is only relevant for types of <tt>input</tt>
other than <tt>pathname</tt>:
<tt>pathname</tt> -- If specified, defines the base URI of the
document based on this pathname instance.
Events are read from the source using the following functions:
<div class="def">Function KLACKS:PEEK (source)</div>
<p> => :start-document<br/>
or => :start-document, version, encoding, standalonep<br/>
or => :dtd, name, public-id, system-id<br/>
or => :start-element, uri, lname, qname<br/>
or => :end-element, uri, lname, qname<br/>
or => :characters, data<br/>
or => :processing-instruction, target, data<br/>
or => :comment, data<br/>
or => :end-document, data<br/>
<tt>peek</tt> returns the current event's key and main values.
<div class="def">Function KLACKS:PEEK-NEXT (source) => key, value*</div>
Advance the source forward to the next event and return it
like <tt>peek</tt> would.
<div class="def">Function KLACKS:PEEK-VALUE (source) => value*</div>
Like <tt>peek</tt>, but return only the values, not the key.
<div class="def">Function KLACKS:CONSUME (source) => key, value*</div>
Return the same values <tt>peek</tt> would, and in addition
advance the source forward to the next event.
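<p>The sketch below drains a source with <tt>consume</tt> and collects each event key. It assumes cxml is loaded through Quicklisp. The helper name <tt>event-keys</tt> is hypothetical. The loop stops at <tt>:end-document</tt>, so it never depends on what the source returns after that event.</p>
<pre>
(ql:quickload "cxml")   ; assumption: Quicklisp is available

(defun event-keys (xml-string)
  "Hypothetical helper: consume every event of XML-STRING, return the keys."
  (klacks:with-open-source (source (cxml:make-source xml-string))
    (loop for key = (klacks:consume source)   ; key and values, then advance
          while key
          collect key
          until (eq key :end-document))))

(event-keys "<example>text</example>")
</pre>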
<div class="def">Function KLACKS:CURRENT-URI (source) => uri</div>
<div class="def">Function KLACKS:CURRENT-LNAME (source) => string</div>
<div class="def">Function KLACKS:CURRENT-QNAME (source) => string</div>
If the current event is :start-element or :end-element, return the
corresponding value. Else, signal an error.
<div class="def">Function KLACKS:CURRENT-CHARACTERS (source) => string</div>
If the current event is :characters, return the character data
value. Else, signal an error.
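<p>Here is a minimal sketch of the accessors above, assuming cxml is loaded through Quicklisp. The helper name <tt>first-element-text</tt> is hypothetical. It checks the event key with <tt>peek</tt> before calling <tt>current-characters</tt>, because that accessor signals an error on any other event.</p>
<pre>
(ql:quickload "cxml")   ; assumption: Quicklisp is available

(defun first-element-text (xml-string)
  "Hypothetical helper: return the local name of the first element and the
character data right after its start tag (NIL if there is none)."
  (klacks:with-open-source (source (cxml:make-source xml-string))
    (klacks:find-element source)             ; stop at the first :start-element
    (let ((name (klacks:current-lname source)))
      (klacks:peek-next source)              ; advance to the next event
      (values name
              (when (eq (klacks:peek source) :characters)
                (klacks:current-characters source))))))

(first-element-text "<example>text</example>")
</pre>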
<div class="def">Function KLACKS:CURRENT-CDATA-SECTION-P (source) => boolean</div>
If the current event is :characters, determine whether the data was
specified using a CDATA section in the source document. Else,
signal an error.
<div class="def">Function KLACKS:MAP-CURRENT-NAMESPACE-DECLARATIONS (fn source) => nil</div>
For use only on :start-element and :end-element events, this
function reports every namespace declaration on the current element.
On :start-element, these correspond to the xmlns attributes of the
start tag. On :end-element, the declarations of the corresponding
start tag are reported. No inherited namespaces are
included. <tt>fn</tt> is called once for each declaration with two
arguments, the prefix and the uri.
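<p>The following is a small sketch that collects the declarations on a start tag as <tt>(prefix . uri)</tt> pairs. It assumes cxml is loaded through Quicklisp. The helper name and the <tt>urn:example:x</tt> namespace are hypothetical. A prefixed declaration is used so the example does not depend on how the default namespace's prefix is represented.</p>
<pre>
(ql:quickload "cxml")   ; assumption: Quicklisp is available

(defun declared-namespaces (xml-string)
  "Hypothetical helper: (prefix . uri) pairs declared on the first element."
  (klacks:with-open-source (source (cxml:make-source xml-string))
    (klacks:find-element source)             ; now on :start-element
    (let ((pairs '()))
      (klacks:map-current-namespace-declarations
       (lambda (prefix uri) (push (cons prefix uri) pairs))
       source)
      (nreverse pairs))))

(declared-namespaces "<x:example xmlns:x='urn:example:x'/>")
</pre>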
<div class="def">Function KLACKS:MAP-ATTRIBUTES (fn source)</div>
Call <tt>fn</tt> for each attribute of the current start tag in
turn, and pass the following values as arguments to the function:
<li>namespace uri</li>
<li>local name</li>
<li>qualified name</li>
<li>attribute value</li>
<li>a boolean indicating whether the attribute was specified
explicitly in the source document, rather than defaulted from
the DTD</li>
Only valid for :start-element.
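<p>Below is a minimal sketch that turns the attributes of a start tag into a <tt>(qname . value)</tt> alist. It assumes cxml is loaded through Quicklisp and that <tt>fn</tt> receives five arguments in the order namespace uri, local name, qualified name, value, and the specified flag. The helper name is hypothetical, and the input reuses the <tt>child2</tt> element from the examples at the end of this page.</p>
<pre>
(ql:quickload "cxml")   ; assumption: Quicklisp is available

(defun attribute-alist (xml-string)
  "Hypothetical helper: (qname . value) pairs for the first element."
  (klacks:with-open-source (source (cxml:make-source xml-string))
    (klacks:find-element source)             ; map-attributes needs :start-element
    (let ((result '()))
      (klacks:map-attributes
       (lambda (uri lname qname value specified-p)
         (declare (ignore uri lname specified-p))
         (push (cons qname value) result))
       source)
      (nreverse result))))

(attribute-alist "<child2 bar='baz'/>")
</pre>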
Return a list of SAX attribute structures for the current start tag.
Only valid for :start-element.
<div class="def">Function KLACKS:CLOSE-SOURCE (source)</div>
Close all streams referred to by <tt>source</tt>.
<div class="def">Macro KLACKS:WITH-OPEN-SOURCE ((var source) &body body)</div>
Evaluate <tt>source</tt> to create a source object, bind it to
symbol <tt>var</tt> and evaluate <tt>body</tt> as an implicit progn.
Call <tt>klacks:close-source</tt> to close the source after
exiting <tt>body</tt>, whether normally or abnormally.
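<p>The sketch below parses a file given as a pathname and stops long before end of file. <tt>with-open-source</tt> still closes the implicitly opened stream. It assumes cxml is loaded through Quicklisp. The file name <tt>klacks-demo.xml</tt> and the helper name are hypothetical.</p>
<pre>
(ql:quickload "cxml")   ; assumption: Quicklisp is available

;; Hypothetical file name, used only for this illustration.
(defparameter *path* #p"klacks-demo.xml")

(with-open-file (out *path* :direction :output :if-exists :supersede)
  (write-string "<example><child1/><child2/></example>" out))

(defun first-child-name (path)
  "Hypothetical helper: local name of the first child of <example>."
  (klacks:with-open-source (source (cxml:make-source path))
    (klacks:find-element source "example")
    (klacks:consume source)                  ; step past <example>
    (klacks:find-element source)             ; next :start-element
    (klacks:current-lname source)))          ; stream closed on exit

(first-child-name *path*)
</pre>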
<a name="convenience"/>
<h3>Convenience functions</h3>
<div class="def">Function KLACKS:FIND-EVENT (source key)</div>
Read events from <tt>source</tt> and discard them until an event
of type <i>key</i> is found. Return values like <tt>peek</tt>, or
NIL if no such event was found.
<div class="def">Function KLACKS:FIND-ELEMENT (source &optional
lname uri)</div>
Read events from <tt>source</tt> and discard them until an event
of type :start-element with matching local name and
namespace uri is found. If <tt>lname</tt> is <tt>nil</tt>, any
tag name matches. If <tt>uri</tt> is <tt>nil</tt>, any
namespace matches. Return values like <tt>peek</tt>, or NIL if no
such event was found.
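<p>Because <tt>find-element</tt> returns NIL once no further match exists, it can drive a loop. Here is a minimal sketch, assuming cxml is loaded through Quicklisp and that the local name is the first optional argument, as in the <tt>find-element</tt> call in the examples at the end of this page. The helper name is hypothetical.</p>
<pre>
(ql:quickload "cxml")   ; assumption: Quicklisp is available

(defun element-texts (xml-string lname)
  "Hypothetical helper: character data directly inside each LNAME element."
  (klacks:with-open-source (source (cxml:make-source xml-string))
    (loop while (klacks:find-element source lname)   ; NIL when none are left
          collect (progn
                    (klacks:consume source)          ; step past :start-element
                    (if (eq (klacks:peek source) :characters)
                        (klacks:current-characters source)
                        "")))))

(element-texts "<example><item>a</item><item>b</item></example>" "item")
</pre>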
<div class="def">Condition KLACKS:KLACKS-ERROR (xml-parse-error)</div>
The condition class signalled by <tt>expect</tt>.
<div class="def">Function KLACKS:EXPECT (source key &optional
value1 value2 value3)</div>
Assert that the current event is equal to (key value1 value2
value3). (Ignore <i>value</i> arguments that are NIL.) If so,
return it as multiple values. Otherwise signal a
<tt>klacks-error</tt>.
<div class="def">Function KLACKS:SKIP (source key &optional
value1 value2 value3)</div>
<tt>expect</tt> the specific event, then <tt>consume</tt> it.
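<p>The following is a minimal sketch of a strict reader built from <tt>skip</tt> and <tt>expect</tt>, assuming cxml is loaded through Quicklisp. For <tt>:start-element</tt>, <i>value1</i> is the namespace uri and <i>value2</i> the local name, matching the order <tt>peek</tt> returns. The helper name is hypothetical. A document of any other shape signals <tt>klacks-error</tt>.</p>
<pre>
(ql:quickload "cxml")   ; assumption: Quicklisp is available

(defun strict-example-text (xml-string)
  "Hypothetical helper: text of a document that must be <example>TEXT</example>."
  (klacks:with-open-source (source (cxml:make-source xml-string))
    (klacks:skip source :start-document)
    (klacks:skip source :start-element nil "example")  ; NIL uri is ignored
    ;; EXPECT returns the event as multiple values: key, then data.
    (let ((text (nth-value 1 (klacks:expect source :characters))))
      (klacks:skip source :characters)
      (klacks:skip source :end-element nil "example")
      (klacks:expect source :end-document)
      text)))

(handler-case (strict-example-text "<other>text</other>")
  (klacks:klacks-error () :mismatch))
</pre>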
<div class="def">Macro KLACKS:EXPECTING-ELEMENT ((fn source
&optional lname uri) &body body)</div>
Assert that the current event matches (:start-element uri lname).
(Ignore <tt>lname</tt> and <tt>uri</tt> if they are NIL.) Otherwise, signal a
<tt>klacks-error</tt>.
Evaluate <tt>body</tt> as an implicit progn. Finally, assert that
the event reached after <tt>body</tt> matches (:end-element uri lname).
<h3>Bridging Klacks and SAX</h3>
<div class="def">Function KLACKS:SERIALIZE-EVENT (source handler)</div>
Send the current klacks event from <tt>source</tt> as a SAX
event to the SAX <tt>handler</tt> and consume it.
<div class="def">Function KLACKS:SERIALIZE-ELEMENT (source handler
&key document-events)</div>
Read all klacks events from the current <tt>:start-element</tt> to
its matching <tt>:end-element</tt> and send them as SAX events
to <tt>handler</tt>. When this function is called, the current
event must be <tt>:start-element</tt>, else an error is
signalled. With <tt>document-events</tt> true (the default),
<tt>sax:start-document</tt> and <tt>sax:end-document</tt> events
are sent around the element.
<div class="def">Function KLACKS:SERIALIZE-SOURCE (source handler)</div>
Read all klacks events from <tt>source</tt> and send them as SAX
events to the SAX <tt>handler</tt>.
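<p>The next sketch replays a whole source into a small custom SAX handler that records element names. It assumes cxml is loaded through Quicklisp and that <tt>sax:default-handler</tt> provides do-nothing methods for the events not overridden here. The class and helper names are hypothetical. The handler returns its collected names from <tt>sax:end-document</tt>, much as the xmls builder returns its result. The helper itself reads the names from the handler's slot.</p>
<pre>
(ql:quickload "cxml")   ; assumption: Quicklisp is available

(defclass name-recorder (sax:default-handler)
  ((names :initform '() :accessor recorded-names))
  (:documentation "Hypothetical SAX handler that records element names."))

(defmethod sax:start-element ((handler name-recorder) uri lname qname attributes)
  (declare (ignore uri qname attributes))
  (push lname (recorded-names handler)))

(defmethod sax:end-document ((handler name-recorder))
  (reverse (recorded-names handler)))        ; names in document order

(defun element-names (xml-string)
  (let ((recorder (make-instance 'name-recorder)))
    (klacks:with-open-source (source (cxml:make-source xml-string))
      (klacks:serialize-source source recorder))
    (reverse (recorded-names recorder))))

(element-names "<example><child1/><child2/></example>")
</pre>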
<div class="def">Class KLACKS:TAPPING-SOURCE (source)</div>
A klacks source that relays events from an upstream klacks source
unchanged, while also emitting them as SAX events to a
user-specified handler.
<div class="def">Function KLACKS:MAKE-TAPPING-SOURCE
(upstream-source &optional sax-handler)</div>
Create a tapping source relaying events
for <tt>upstream-source</tt>, and sending SAX events
to <tt>sax-handler</tt>.
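<p>Here is a minimal sketch of a tapping source. Events are pulled from the tap with <tt>consume</tt> while a recording SAX handler sees the same events. It assumes cxml is loaded through Quicklisp and that the tap forwards each event to the handler exactly once. The class and helper names are hypothetical. The upstream source is the one closed, via <tt>with-open-source</tt>.</p>
<pre>
(ql:quickload "cxml")   ; assumption: Quicklisp is available

(defclass name-recorder (sax:default-handler)
  ((names :initform '() :accessor recorded-names))
  (:documentation "Hypothetical SAX handler that records element names."))

(defmethod sax:start-element ((handler name-recorder) uri lname qname attributes)
  (declare (ignore uri qname attributes))
  (push lname (recorded-names handler)))

(defmethod sax:end-document ((handler name-recorder))
  (reverse (recorded-names handler)))

(defun tapped-names (xml-string)
  (let ((recorder (make-instance 'name-recorder)))
    (klacks:with-open-source (upstream (cxml:make-source xml-string))
      (let ((tap (klacks:make-tapping-source upstream recorder)))
        ;; Pull events through the tap; the recorder sees them as SAX events.
        (loop for key = (klacks:consume tap)
              until (or (null key) (eq key :end-document)))))
    (reverse (recorded-names recorder))))

(tapped-names "<example><child1/></example>")
</pre>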
<h3>Location information</h3>
<div class="def">Function KLACKS:CURRENT-LINE-NUMBER (source)</div>
Return an approximation of the current line number, or NIL.
<div class="def">Function KLACKS:CURRENT-COLUMN-NUMBER (source)</div>
Return an approximation of the current column number, or NIL.
<div class="def">Function KLACKS:CURRENT-SYSTEM-ID (source)</div>
Return the URI of the document being parsed. This is either the
main document, or the entity's system ID while contents of a parsed
general external entity are being processed.
<div class="def">Function KLACKS:CURRENT-XML-BASE (source)</div>
Return the [Base URI] of the current element. This URI can differ from
the value returned by <tt>current-system-id</tt> if xml:base
attributes are present.
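<p>The following is a minimal sketch that pairs each start tag with the parser's line number at that point. It assumes cxml is loaded through Quicklisp, and the helper name is hypothetical. Since the functions above return only an approximation or NIL, the code keeps whatever value comes back and does not rely on an exact line.</p>
<pre>
(ql:quickload "cxml")   ; assumption: Quicklisp is available

(defun element-lines (xml-string)
  "Hypothetical helper: (lname . approximate-line-or-NIL) for each start tag."
  (klacks:with-open-source (source (cxml:make-source xml-string))
    (loop while (klacks:find-element source)
          collect (cons (klacks:current-lname source)
                        (klacks:current-line-number source))
          do (klacks:consume source))))      ; move past this start tag

(element-lines (format nil "<example>~%  <child1/>~%</example>"))
</pre>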
The following example illustrates the creation of a klacks <tt>source</tt>
and the use of the <tt>peek-next</tt> function to read individual events,
and shows some of the most common event types.
<pre>* <b>(defparameter *source* (cxml:make-source "<example>text</example>"))</b>
* <b>(klacks:peek-next *source*)</b>
* <b>(klacks:peek-next *source*)</b>
"example" ;local name
"example" ;qualified name
* <b>(klacks:peek-next *source*)</b>
* <b>(klacks:peek-next *source*)</b>
* <b>(klacks:peek-next *source*)</b>
* <b>(klacks:peek-next *source*)</b>
In this example, <tt>find-element</tt> is used to skip over the
uninteresting events until the opening <tt>child1</tt> tag is
found. Then <tt>serialize-element</tt> is used to generate SAX
events for that element, including its children, and an
xmls-compatible list structure is built from those
events. <tt>find-element</tt> skips over whitespace,
and <tt>find-event</tt> is used to parse up
to <tt>:end-document</tt>, ensuring that the source has been
closed.
<pre>* <b>(defparameter *source*
(cxml:make-source "<example>
<child1><p>foo</p></child1>
<child2 bar='baz'/>
* <b>(klacks:find-element *source* "child1")</b>
* <b>(klacks:serialize-element *source* (cxml-xmls:make-xmls-builder))</b>
("child1" NIL ("p" NIL "foo"))
* <b>(klacks:find-element *source*)</b>
* <b>(klacks:serialize-element *source* (cxml-xmls:make-xmls-builder))</b>
("child2" (("bar" "baz")))
* <b>(klacks:find-event *source* :end-document)</b>