<documentation title="CXML Klacks parser">
<p>
The Klacks parser provides an alternative parsing interface,
similar in concept to Java's <a
href="http://jcp.org/en/jsr/detail?id=173">Streaming API for
XML</a> (StAX).
</p>
<p>
It implements a streaming, "pull-based" API. This is different
from SAX, which is a "push-based" model.
</p>
<p>
Klacks is implemented using the same code base as the SAX parser
and has the same parsing characteristics (validation, namespace
support, entity resolution) while offering a more flexible interface
than SAX.
</p>
<p>
See below for <a href="#examples">examples</a>.
</p>
<h3>Parsing incrementally using sources</h3>
<p>
To parse using Klacks, create an XML <tt>source</tt> first.
</p>
<div class="def">Function CXML:MAKE-SOURCE (input &key validate
dtd root entity-resolver disallow-internal-subset buffering pathname)</div>
<p>
Create and return a source for <tt>input</tt>.
</p>
<p>
Exact behaviour depends on <tt>input</tt>, which can
be one of the following types:
</p>
<ul>
<li>
<tt>pathname</tt> -- a Common Lisp pathname.
Open the file specified by the pathname and create a source for
the resulting stream. See below for information on how to
close the stream.
</li>
<li><tt>stream</tt> -- a Common Lisp stream with element-type
<tt>(unsigned-byte 8)</tt>. See below for information on how to
close the stream.
</li>
<li>
<tt>octets</tt> -- an <tt>(unsigned-byte 8)</tt> array.
The array is parsed directly, and interpreted according to the
encoding it specifies.
</li>
<li>
<tt>string</tt>/<tt>rod</tt> -- a rod (or <tt>string</tt> on
unicode-capable implementations).
Parse an XML document from the input string that has already
undergone external-format decoding.
</li>
</ul>
<p>
<b>Closing streams:</b> Sources can refer to Lisp streams that
need to be closed after parsing. This includes a stream passed
explicitly as <tt>input</tt>, a stream created implicitly for the
<tt>pathname</tt> case, as well as any streams created
automatically for external parsed entities referred to by the
document.
</p>
<p>
All these streams get closed automatically if end of file is
reached normally. Use <tt>klacks:close-source</tt> or
<tt>klacks:with-open-source</tt> to ensure that the streams get
closed otherwise.
</p>
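<p>
As a minimal sketch of the cleanup pattern described above, the
following wraps a source in <tt>klacks:with-open-source</tt> so that
its streams are closed even if parsing stops before end of file is
reached. The helper name <tt>first-element-name</tt> is hypothetical
and not part of CXML, and the sketch assumes that the <tt>cxml</tt>
system is installed where ASDF can find it.
</p>
<pre>
;; Assumption: CXML is installed and visible to ASDF.
(asdf:load-system :cxml)

;; FIRST-ELEMENT-NAME is a hypothetical helper for illustration only.
(defun first-element-name (input)
  "Return the local name of the first element in INPUT, or NIL if none."
  (klacks:with-open-source (source (cxml:make-source input))
    ;; Stop at the first start tag; the source is closed on exit
    ;; although end of file has not been reached.
    (when (klacks:find-event source :start-element)
      (klacks:current-lname source))))

(first-element-name "<example>text</example>")
</pre>
<p>
The same helper should accept a pathname or an octet stream as
<tt>input</tt>, since <tt>make-source</tt> dispatches on the input type.
</p>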
<p>
<b>Buffering:</b> By default, the Klacks parser performs buffering
of octets being read from the stream as an optimization. This can
result in unwanted blocking if the stream is a socket and the
parser tries to read more data than required to parse the current
event. Use <tt>:buffering nil</tt> to disable this optimization.
</p>
<ul>
<li>
<tt>buffering</tt> -- Boolean, defaults to <tt>t</tt>. If
enabled, read data several kilobytes at a time. If disabled,
read only single bytes at a time.
</li>
</ul>
<p>
The following <b>keyword arguments</b> have the same meaning as
with the SAX parser; please refer to the documentation of <a
href="sax.html#parser">parse-file</a> for more information:
</p>
<ul>
<li><tt>validate</tt></li>
<li><tt>dtd</tt></li>
<li><tt>root</tt></li>
<li><tt>entity-resolver</tt></li>
<li><tt>disallow-internal-subset</tt></li>
</ul>
<p>
In addition, the following argument is for types of <tt>input</tt>
other than <tt>pathname</tt>:
</p>
<ul>
<li>
<tt>pathname</tt> -- If specified, defines the base URI of the
document based on this pathname instance.
</li>
</ul>
<p>
Events are read from the source using the following functions:
</p>
<div class="def">Function KLACKS:PEEK (source)</div>
<p> => :start-document<br/>
or => :start-document, version, encoding, standalonep<br/>
or => :dtd, name, public-id, system-id<br/>
or => :start-element, uri, lname, qname<br/>
or => :end-element, uri, lname, qname<br/>
or => :characters, data<br/>
or => :processing-instruction, target, data<br/>
or => :comment, data<br/>
or => :end-document<br/>
or => nil
</p>
<p>
<tt>peek</tt> returns the current event's key and main values.
</p>
<div class="def">Function KLACKS:PEEK-NEXT (source) => key, value*</div>
<p>
Advance the source forward to the next event and return it
like <tt>peek</tt> would.
</p>
<div class="def">Function KLACKS:PEEK-VALUE (source) => value*</div>
<p>
Like <tt>peek</tt>, but return only the values, not the key.
</p>
<div class="def">Function KLACKS:CONSUME (source) => key, value*</div>
<p>
Return the same values <tt>peek</tt> would, and in addition
advance the source forward to the next event.
</p>
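<p>
To illustrate the pull model, here is a small sketch of a loop that
calls <tt>klacks:consume</tt> until no event is left and collects
the key of each event. The helper name <tt>event-keys</tt> is
hypothetical, and the sketch assumes that the <tt>cxml</tt> system
is installed where ASDF can find it.
</p>
<pre>
;; Assumption: CXML is installed and visible to ASDF.
(asdf:load-system :cxml)

;; EVENT-KEYS is a hypothetical helper for illustration only.
(defun event-keys (input)
  "Collect the key of every event in INPUT, in document order."
  (klacks:with-open-source (source (cxml:make-source input))
    ;; CONSUME returns the current event and advances the source;
    ;; the loop ends once no event is left and the key is NIL.
    (loop for key = (klacks:consume source)
          while key
          collect key)))

(event-keys "<example>text</example>")
</pre>
<p>
Because the caller drives the loop, it can stop early or hand the
source to another function at any point, which is the main
difference from a SAX handler.
</p>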
<div class="def">Function KLACKS:CURRENT-URI (source) => uri</div>
<div class="def">Function KLACKS:CURRENT-LNAME (source) => string</div>
<div class="def">Function KLACKS:CURRENT-QNAME (source) => string</div>
<p>
If the current event is :start-element or :end-element, return the
corresponding value. Else, signal an error.
</p>
<div class="def">Function KLACKS:CURRENT-CHARACTERS (source) => string</div>
<p>
If the current event is :characters, return the character data
value. Else, signal an error.
</p>
<div class="def">Function KLACKS:CURRENT-CDATA-SECTION-P (source) => boolean</div>
<p>
If the current event is :characters, determine whether the data was
specified using a CDATA section in the source document. Else,
signal an error.
</p>
<div class="def">Function KLACKS:MAP-CURRENT-NAMESPACE-DECLARATIONS (fn source) => nil</div>
<p>
For use only on :start-element and :end-element events, this
function reports every namespace declaration on the current element.
On :start-element, these correspond to the xmlns attributes of the
start tag. On :end-element, the declarations of the corresponding
start tag are reported. No inherited namespaces are
included. <tt>fn</tt> is called once for each declaration with two
arguments, the prefix and uri.
</p>
<div class="def">Function KLACKS:MAP-ATTRIBUTES (fn source)</div>
<p>
Call <tt>fn</tt> for each attribute of the current start tag in
turn, and pass the following values as arguments to the function:
</p>
<ol>
<li>namespace uri</li>
<li>local name</li>
<li>qualified name</li>
<li>attribute value</li>
<li>a boolean indicating whether the attribute was specified
explicitly in the source document, rather than defaulted from
the DTD</li>
</ol>
<p>
Only valid for :start-element.
</p>
<div class="def">Function KLACKS:LIST-ATTRIBUTES (source)</div>
<p>
Return a list of SAX attribute structures for the current start tag.
Only valid for :start-element.
</p>
<div class="def">Function KLACKS:GET-ATTRIBUTE (source lname
&optional uri)</div>
<p>
Return a SAX attribute structure for the current start tag.
Only valid for :start-element.
</p>
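<p>
The argument order used by <tt>map-attributes</tt> can be seen in
the following sketch, which collects the qualified name and value of
each attribute on a start tag. The helper name
<tt>element-attributes</tt> is hypothetical, and the sketch assumes
that the <tt>cxml</tt> system is installed where ASDF can find it.
</p>
<pre>
;; Assumption: CXML is installed and visible to ASDF.
(asdf:load-system :cxml)

;; ELEMENT-ATTRIBUTES is a hypothetical helper for illustration only.
(defun element-attributes (input lname)
  "Return an alist of (qname . value) for the first element named LNAME."
  (klacks:with-open-source (source (cxml:make-source input))
    ;; MAP-ATTRIBUTES is only valid while positioned on :start-element.
    (when (klacks:find-element source lname)
      (let ((result '()))
        (klacks:map-attributes
         (lambda (uri local-name qname value explicitp)
           (declare (ignore uri local-name explicitp))
           (push (cons qname value) result))
         source)
        (nreverse result)))))

(element-attributes "<example><child2 bar='baz'/></example>" "child2")
</pre>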
<div class="def">Function KLACKS:CLOSE-SOURCE (source)</div>
<p>
Close all streams referred to by <tt>source</tt>.
</p>
<div class="def">Macro KLACKS:WITH-OPEN-SOURCE ((var source) &body body)</div>
<p>
Evaluate <tt>source</tt> to create a source object, bind it to
symbol <tt>var</tt> and evaluate <tt>body</tt> as an implicit progn.
Call <tt>klacks:close-source</tt> to close the source after
exiting <tt>body</tt>, whether normally or abnormally.
</p>
<a name="convenience"/>
<h3>Convenience functions</h3>
<div class="def">Function KLACKS:FIND-EVENT (source key)</div>
<p>
Read events from <tt>source</tt> and discard them until an event
of type <i>key</i> is found. Return values like <tt>peek</tt>, or
NIL if no such event was found.
</p>
<div class="def">Function KLACKS:FIND-ELEMENT (source &optional
lname uri)</div>
<p>
Read events from <tt>source</tt> and discard them until an event
of type :start-element with matching local name and
namespace uri is found. If <tt>lname</tt> is <tt>nil</tt>, any
tag name matches. If <tt>uri</tt> is <tt>nil</tt>, any
namespace matches. Return values like <tt>peek</tt>, or NIL if no
such event was found.
</p>
<div class="def">Condition KLACKS:KLACKS-ERROR (xml-parse-error)</div>
<p>
The condition class signalled by <tt>expect</tt>.
</p>
<div class="def">Function KLACKS:EXPECT (source key &optional
value1 value2 value3)</div>
<p>
Assert that the current event is equal to (key value1 value2
value3). (Ignore <i>value</i> arguments that are NIL.) If so,
return it as multiple values. Otherwise signal a
<tt>klacks-error</tt>.
</p>
<div class="def">Function KLACKS:SKIP (source key &optional
value1 value2 value3)</div>
<p>
<tt>expect</tt> the specific event, then <tt>consume</tt> it.
</p>
<div class="def">Macro KLACKS:EXPECTING-ELEMENT ((fn source
&optional lname uri) &body body)</div>
<p>
Assert that the current event matches (:start-element uri lname),
ignoring <tt>lname</tt> and <tt>uri</tt> arguments that are NIL.
Otherwise signal a <tt>klacks-error</tt>.
Evaluate <tt>body</tt> as an implicit progn. Finally, assert that
the event remaining after <tt>body</tt> matches (:end-element uri lname).
</p>
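<p>
As a sketch of how <tt>expect</tt> and <tt>klacks-error</tt> fit
together, the following checks the name of the root element and
turns a failed assertion into a boolean result. The helper name
<tt>root-element-p</tt> is hypothetical, and the sketch assumes that
the <tt>cxml</tt> system is installed where ASDF can find it.
</p>
<pre>
;; Assumption: CXML is installed and visible to ASDF.
(asdf:load-system :cxml)

;; ROOT-ELEMENT-P is a hypothetical helper for illustration only.
(defun root-element-p (input lname)
  "Return true if the root element of INPUT has local name LNAME."
  (klacks:with-open-source (source (cxml:make-source input))
    ;; Skip the prolog; the first start tag is the root element.
    (klacks:find-event source :start-element)
    (handler-case
        (progn
          ;; NIL for the uri argument means any namespace is accepted.
          (klacks:expect source :start-element nil lname)
          t)
      (klacks:klacks-error () nil))))

(root-element-p "<example>text</example>" "example")
</pre>
<p>
Code that wants to move past the checked event can call
<tt>skip</tt> with the same arguments instead of <tt>expect</tt>.
</p>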
<h3>Bridging Klacks and SAX</h3>
<div class="def">Function KLACKS:SERIALIZE-EVENT (source handler)</div>
<p>
Send the current klacks event from <tt>source</tt> as a SAX
event to the SAX <tt>handler</tt> and consume it.
</p>
<div class="def">Function KLACKS:SERIALIZE-ELEMENT (source handler
&key document-events)</div>
<p>
Read all klacks events from the following <tt>:start-element</tt> to
its <tt>:end-element</tt> and send them as SAX events
to <tt>handler</tt>. When this function is called, the current
event must be <tt>:start-element</tt>, else an error is
signalled. With <tt>document-events</tt> (the default),
<tt>sax:start-document</tt> and <tt>sax:end-document</tt> events
are sent around the element.
</p>
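<p>
Any SAX handler can receive the events of a subtree. The following
sketch sends one element to a string sink created with
<tt>cxml:make-string-sink</tt> and returns whatever the sink
produces at <tt>sax:end-document</tt>, which is expected to be the
serialized text of the element. The helper name
<tt>element-as-string</tt> is hypothetical, and the sketch assumes
that the <tt>cxml</tt> system is installed where ASDF can find it.
</p>
<pre>
;; Assumption: CXML is installed and visible to ASDF.
(asdf:load-system :cxml)

;; ELEMENT-AS-STRING is a hypothetical helper for illustration only.
(defun element-as-string (input lname)
  "Serialize the first element named LNAME in INPUT, or return NIL."
  (klacks:with-open-source (source (cxml:make-source input))
    ;; SERIALIZE-ELEMENT requires the current event to be :start-element.
    (when (klacks:find-element source lname)
      (klacks:serialize-element source (cxml:make-string-sink)))))

(element-as-string "<example><child1><p>foo</p></child1></example>" "child1")
</pre>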
<div class="def">Function KLACKS:SERIALIZE-SOURCE (source handler)</div>
<p>
Read all klacks events from <tt>source</tt> and send them as SAX
events to the SAX <tt>handler</tt>.
</p>
<div class="def">Class KLACKS:TAPPING-SOURCE (source)</div>
<p>
A klacks source that relays events from an upstream klacks source
unchanged, while also emitting them as SAX events to a
user-specified handler at the same time.
</p>
<div class="def">Function KLACKS:MAKE-TAPPING-SOURCE
(upstream-source &optional sax-handler)</div>
<p>
Create a tapping source relaying events
from <tt>upstream-source</tt>, and sending SAX events
to <tt>sax-handler</tt>.
</p>
<h3>Location information</h3>
<div class="def">Function KLACKS:CURRENT-LINE-NUMBER (source)</div>
<p>
Return an approximation of the current line number, or NIL.
</p>
<div class="def">Function KLACKS:CURRENT-COLUMN-NUMBER (source)</div>
<p>
Return an approximation of the current column number, or NIL.
</p>
<div class="def">Function KLACKS:CURRENT-SYSTEM-ID (source)</div>
<p>
Return the URI of the document being parsed. This is either the
main document, or the entity's system ID while contents of a parsed
general external entity are being processed.
</p>
<div class="def">Function KLACKS:CURRENT-XML-BASE (source)</div>
<p>
Return the [Base URI] of the current element. This URI can differ from
the value returned by <tt>current-system-id</tt> if xml:base
attributes are present.
</p>
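<p>
Location information is typically useful for diagnostics. The
following sketch reports the approximate position at which an
element was found. The helper name <tt>element-position</tt> is
hypothetical, and the sketch assumes that the <tt>cxml</tt> system
is installed where ASDF can find it. Either value may be NIL, and
both are approximations, so no particular numbers are shown.
</p>
<pre>
;; Assumption: CXML is installed and visible to ASDF.
(asdf:load-system :cxml)

;; ELEMENT-POSITION is a hypothetical helper for illustration only.
(defun element-position (input lname)
  "Return (line column) for the first element named LNAME, or NIL."
  (klacks:with-open-source (source (cxml:make-source input))
    (when (klacks:find-element source lname)
      ;; Both accessors return an approximation, or NIL if unknown.
      (list (klacks:current-line-number source)
            (klacks:current-column-number source)))))

(element-position "<example><child1/></example>" "child1")
</pre>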
<a name="examples"/>
<h3>Examples</h3>
<p>
The following example illustrates creation of a klacks <tt>source</tt>
and use of the <tt>peek-next</tt> function to read individual events,
and shows some of the most common event types.
</p>
<pre>* <b>(defparameter *source* (cxml:make-source "<example>text</example>"))</b>
*SOURCE*

* <b>(klacks:peek-next *source*)</b>
:START-DOCUMENT

* <b>(klacks:peek-next *source*)</b>
:START-ELEMENT
NIL                      ;namespace URI
"example"                ;local name
"example"                ;qualified name

* <b>(klacks:peek-next *source*)</b>
:CHARACTERS
"text"

* <b>(klacks:peek-next *source*)</b>
:END-ELEMENT
NIL
"example"
"example"

* <b>(klacks:peek-next *source*)</b>
:END-DOCUMENT

* <b>(klacks:peek-next *source*)</b>
NIL
</pre>
<p>
In this example, <tt>find-element</tt> is used to skip over the
uninteresting events until the opening <tt>child1</tt> tag is
found. Then <tt>serialize-element</tt> is used to generate SAX
events for the following element, including its children, and an
xmls-compatible list structure is built from those
events. <tt>find-element</tt> skips over whitespace,
and <tt>find-event</tt> is used to parse up
to <tt>:end-document</tt>, ensuring that the source has been
closed.
</p>
<pre>* <b>(defparameter *source*
     (cxml:make-source "<example>
                          <child1><p>foo</p></child1>
                          <child2 bar='baz'/>
                        </example>"))</b>
*SOURCE*

* <b>(klacks:find-element *source* "child1")</b>
:START-ELEMENT
NIL
"child1"
"child1"

* <b>(klacks:serialize-element *source* (cxml-xmls:make-xmls-builder))</b>
("child1" NIL ("p" NIL "foo"))

* <b>(klacks:find-element *source*)</b>
:START-ELEMENT
NIL
"child2"
"child2"

* <b>(klacks:serialize-element *source* (cxml-xmls:make-xmls-builder))</b>
("child2" (("bar" "baz")))

* <b>(klacks:find-event *source* :end-document)</b>
:END-DOCUMENT
</pre>