<?xml version="1.0" encoding="UTF-8"?>
<sect1 id="zend.dom.query">
<title>Zend_Dom_Query</title>
<classname>Zend_Dom_Query</classname> provides mechanisms for querying
<acronym>XML</acronym> and (X)<acronym>HTML</acronym> documents using either XPath or
<acronym>CSS</acronym> selectors. It was developed to aid with functional testing of
<acronym>MVC</acronym> applications, but can also be used for rapid development of screen
scrapers.
<acronym>CSS</acronym> selector notation is provided as a simpler and more familiar
notation for web developers to use when querying documents with <acronym>XML</acronym>
structures. The notation should be familiar to anybody who has written
Cascading Style Sheets or who uses JavaScript toolkits that provide
functionality for selecting nodes with <acronym>CSS</acronym> selectors
(<ulink url="http://prototypejs.org/api/utility/dollar-dollar">Prototype's
selector utility</ulink> and
<ulink url="http://api.dojotoolkit.org/jsdoc/dojo/HEAD/dojo.query">Dojo's
dojo.query</ulink> were both inspirations for the component).
<sect2 id="zend.dom.query.operation">
<title>Theory of Operation</title>
To use <classname>Zend_Dom_Query</classname>, instantiate a
<classname>Zend_Dom_Query</classname> object, optionally passing it a document to
query (as a string). Once a document is set, you can use either the
<methodname>query()</methodname> or <methodname>queryXpath()</methodname> method; each
method returns a <classname>Zend_Dom_Query_Result</classname> object containing
any matching nodes.
The primary difference between <classname>Zend_Dom_Query</classname> and using
DOMDocument + DOMXPath directly is the ability to select with <acronym>CSS</acronym>
selectors. You can use any of the following, in any combination:
<emphasis>element types</emphasis>: provide an element type to
match: 'div', 'a', 'span', 'h2', etc.
<emphasis>style attributes</emphasis>: <acronym>CSS</acronym> class names
to match: '<command>.error</command>', '<command>div.error</command>',
'<command>label.required</command>', etc. If an
element has more than one class, this will match as long as
the named class is present anywhere in its class attribute.
<emphasis>id attributes</emphasis>: element ID attributes to
match: '#content', 'div#nav', etc.
<emphasis>arbitrary attributes</emphasis>: arbitrary element
attributes to match. Three different types of matching are provided:
<emphasis>exact match</emphasis>: the attribute exactly
matches the string: 'div[bar="baz"]' would match a div
element with a "bar" attribute whose value is exactly "baz".
<emphasis>word match</emphasis>: the attribute contains
a word matching the string: 'div[bar~="baz"]' would match a div
element with a "bar" attribute that contains the
word "baz". '&lt;div bar="foo baz"&gt;' would match, but
'&lt;div bar="foo bazbat"&gt;' would not.
<emphasis>substring match</emphasis>: the attribute contains
the string: 'div[bar*="baz"]' would match a div
element with a "bar" attribute that contains the
string "baz" anywhere within it.
<emphasis>direct descendants</emphasis>: use '>' between
selectors to denote direct descendants. 'div > span' would
select only 'span' elements that are direct children of a
'div'. This can be combined with any of the selectors above.
<emphasis>descendants</emphasis>: string together
multiple selectors to indicate a hierarchy along which
to search. '<command>div .foo span #one</command>' would select an element
with id 'one' that is a descendant of arbitrary depth
beneath a 'span' element, which is in turn a descendant
of arbitrary depth beneath an element with a class of
'foo', which is itself a descendant of arbitrary depth beneath
a 'div' element. For example, it would match the link to
the word 'One' in the listing below:
<programlisting language="html"><![CDATA[
Lorem ipsum <span class="bar">
    <a href="/foo/bar" id="one">One</a>
    <a href="/foo/baz" id="two">Two</a>
    <a href="/foo/bat" id="three">Three</a>
    <a href="/foo/bla" id="four">Four</a>
Once you've performed your query, you can work with the result
object to determine information about the nodes, as well as to pull
them and/or their content directly for examination and manipulation.
<classname>Zend_Dom_Query_Result</classname> implements <classname>Countable</classname>
and <classname>Iterator</classname>, and stores the results internally as
DOMNode and DOMElement objects. As an example, consider the following call,
which selects against the <acronym>HTML</acronym> above:
<programlisting language="php"><![CDATA[
$dom = new Zend_Dom_Query($html);
$results = $dom->query('.foo .bar a');
$count = count($results); // get number of matches: 4
foreach ($results as $result) {
    // $result is a DOMElement
<classname>Zend_Dom_Query</classname> also allows raw XPath queries
via the <methodname>queryXpath()</methodname> method; you can pass any
valid XPath query to this method, and it will return a
<classname>Zend_Dom_Query_Result</classname> object.
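Because <classname>Zend_Dom_Query</classname> converts <acronym>CSS</acronym> selectors to XPath, it can help to see what such queries look like when written by hand for plain DOMDocument + DOMXPath. The sketch below pairs each selector form from the list above with an XPath expression that should select the same nodes, and runs them against a small stand-in document modeled on the earlier <acronym>HTML</acronym> listing (plus three divs for the attribute examples). The expressions and the hypothetical <methodname>hasClass()</methodname> helper are illustrative assumptions, not necessarily the exact XPath that <classname>Zend_Dom_Query</classname> generates; a result's <methodname>getXpathQuery()</methodname> method reports the translation the component actually used.
<programlisting language="php"><![CDATA[
<?php
// Sketch: hand-written XPath equivalents of the CSS selector forms,
// evaluated with PHP's DOM extension (DOMDocument + DOMXPath).
// The XPath strings are illustrative assumptions, not necessarily what
// Zend_Dom_Query generates internally.

// Stand-in document modeled on the HTML listing, plus three divs
// for the attribute-matching examples.
$html = <<<'HTML'
<html><body>
<div>
  <table><tr><td class="foo">
    <div>
      Lorem ipsum <span class="bar">
        <a href="/foo/bar" id="one">One</a>
        <a href="/foo/baz" id="two">Two</a>
        <a href="/foo/bat" id="three">Three</a>
        <a href="/foo/bla" id="four">Four</a>
      </span>
    </div>
  </td></tr></table>
</div>
<div bar="baz">exact</div>
<div bar="foo baz">word</div>
<div bar="foo bazbat">substring only</div>
</body></html>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect HTML parser warnings quietly
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Hypothetical helper: XPath test for "the class list contains $name".
function hasClass(string $name): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $name ')";
}

$selectors = [
    'a'                  => '//a',
    '.bar'               => '//*[' . hasClass('bar') . ']',
    '#one'               => "//*[@id='one']",
    'div[bar="baz"]'     => "//div[@bar='baz']",
    'div[bar~="baz"]'    => "//div[contains(concat(' ', normalize-space(@bar), ' '), ' baz ')]",
    'div[bar*="baz"]'    => "//div[contains(@bar, 'baz')]",
    'div > span'         => '//div/span',
    '.foo .bar a'        => '//*[' . hasClass('foo') . ']//*[' . hasClass('bar') . ']//a',
    'div .foo span #one' => '//div//*[' . hasClass('foo') . "]//span//*[@id='one']",
];

// Run every expression and record how many nodes it selects.
$counts = [];
foreach ($selectors as $css => $expr) {
    $counts[$css] = $xpath->query($expr)->length;
    printf("%-20s %d match(es)\n    %s\n", $css, $counts[$css], $expr);
}
]]></programlisting>
Running the script prints each selector, the number of matches, and its XPath form; '.foo .bar a' selects four links, the same count as the <classname>Zend_Dom_Query</classname> example above. Padding the class and word-match tests with spaces on both sides is what stops "baz" from matching inside "bazbat".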
<sect2 id="zend.dom.query.methods">
<title>Methods Available</title>
The <classname>Zend_Dom_Query</classname> family of classes has the following
methods available.
<sect3 id="zend.dom.query.methods.zenddomquery">
<title>Zend_Dom_Query</title>
The following methods are available to
<classname>Zend_Dom_Query</classname>:
<methodname>setDocumentXml($document)</methodname>: specify an
<acronym>XML</acronym> string to query against.
<methodname>setDocumentXhtml($document)</methodname>: specify an
<acronym>XHTML</acronym> string to query against.
<methodname>setDocumentHtml($document)</methodname>: specify an
<acronym>HTML</acronym> string to query against.
<methodname>setDocument($document)</methodname>: specify a
string to query against; <classname>Zend_Dom_Query</classname> will
then attempt to autodetect the document type.
<methodname>getDocument()</methodname>: retrieve the original document
string provided to the object.
<methodname>getDocumentType()</methodname>: retrieve the document
type of the document provided to the object; will be one of
the <constant>DOC_XML</constant>, <constant>DOC_XHTML</constant>, or
<constant>DOC_HTML</constant> class constants.
<methodname>query($query)</methodname>: query the document using
<acronym>CSS</acronym> selector notation.
<methodname>queryXpath($xPathQuery)</methodname>: query the document
using XPath notation.
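<methodname>setDocument()</methodname> has to guess which of the three document types it was given. The following sketch shows one plausible heuristic and how the guess can choose between DOMDocument's XML and HTML parsers. The constant values, <methodname>detectDocumentType()</methodname>, and <methodname>loadDocument()</methodname> are hypothetical and chosen for illustration only; this is not <classname>Zend_Dom_Query</classname>'s actual detection code.
<programlisting language="php"><![CDATA[
<?php
// Hypothetical sketch of document-type autodetection in the spirit of
// setDocument(). The constant values and heuristics are assumptions for
// illustration, not Zend_Dom_Query's actual implementation.
const DOC_XML   = 'docXml';
const DOC_XHTML = 'docXhtml';
const DOC_HTML  = 'docHtml';

function detectDocumentType(string $document): string
{
    $trimmed = ltrim($document);
    // Leading XML declaration: XHTML if an html element appears, else XML.
    if (strncmp($trimmed, '<?xml', 5) === 0) {
        return stripos($trimmed, '<html') !== false ? DOC_XHTML : DOC_XML;
    }
    // No XML declaration, but an XHTML DOCTYPE.
    if (strpos($trimmed, 'DTD XHTML') !== false) {
        return DOC_XHTML;
    }
    return DOC_HTML; // fall back to the forgiving HTML parser
}

function loadDocument(string $document): DOMDocument
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parser warnings quietly
    if (detectDocumentType($document) === DOC_HTML) {
        $dom->loadHTML($document);
    } else {
        // XML and XHTML both go through the strict XML parser.
        $dom->loadXML($document);
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $dom;
}

$samples = [
    'xml'   => '<?xml version="1.0"?><feed><entry/></feed>',
    'xhtml' => '<?xml version="1.0"?><html xmlns="http://www.w3.org/1999/xhtml"><body/></html>',
    'html'  => '<p>Hello</p>',
];

foreach ($samples as $label => $source) {
    $root = loadDocument($source)->documentElement->nodeName;
    printf("%-5s -> %-8s root element: %s\n", $label, detectDocumentType($source), $root);
}
]]></programlisting>
Keeping XHTML on the XML parser preserves strict well-formedness checks, but XHTML elements live in the http://www.w3.org/1999/xhtml namespace, so an unprefixed raw DOMXPath query such as //body will not match them unless that namespace is registered with DOMXPath::registerNamespace() and used as a prefix.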
<sect3 id="zend.dom.query.methods.zenddomqueryresult">
<title>Zend_Dom_Query_Result</title>
As mentioned previously, <classname>Zend_Dom_Query_Result</classname>
implements both <classname>Iterator</classname> and
<classname>Countable</classname>, and as such can be used in a
<methodname>foreach()</methodname> loop as well as with the
<methodname>count()</methodname> function. Additionally, it exposes the
following methods:
<methodname>getCssQuery()</methodname>: return the <acronym>CSS</acronym>
selector query used to produce the result (if any).
<methodname>getXpathQuery()</methodname>: return the XPath query
used to produce the result. Internally,
<classname>Zend_Dom_Query</classname> converts <acronym>CSS</acronym>
selector queries to XPath, so this value will always be populated.
<methodname>getDocument()</methodname>: retrieve the DOMDocument the
selection was made against.
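To show how a single object can support both <methodname>foreach()</methodname> and <methodname>count()</methodname>, here is a minimal sketch of a result-like class that wraps the DOMNodeList returned by DOMXPath::query(). The class name <classname>QueryResultSketch</classname> and its constructor are hypothetical; only the accessor names mirror the methods listed above, and the internals are an assumption rather than <classname>Zend_Dom_Query_Result</classname>'s actual implementation.
<programlisting language="php"><![CDATA[
<?php
// Hypothetical result wrapper: keeps the matched nodes (a DOMNodeList),
// the queries that produced them, and the source document. The class name
// and constructor are assumptions for illustration only.
class QueryResultSketch implements Iterator, Countable
{
    /** @var string|null */
    private $cssQuery;
    /** @var string */
    private $xpathQuery;
    /** @var DOMDocument */
    private $document;
    /** @var DOMNodeList */
    private $nodeList;
    /** @var int */
    private $position = 0;

    public function __construct(string $xpathQuery, DOMDocument $document, DOMNodeList $nodeList, ?string $cssQuery = null)
    {
        $this->xpathQuery = $xpathQuery;
        $this->document   = $document;
        $this->nodeList   = $nodeList;
        $this->cssQuery   = $cssQuery;
    }

    public function getCssQuery(): ?string { return $this->cssQuery; }
    public function getXpathQuery(): string { return $this->xpathQuery; }
    public function getDocument(): DOMDocument { return $this->document; }

    // Countable: lets count($result) report the number of matches.
    public function count(): int
    {
        return $this->nodeList->length;
    }

    // Iterator: an integer cursor over the underlying DOMNodeList.
    public function rewind(): void { $this->position = 0; }
    public function valid(): bool { return $this->position < $this->nodeList->length; }
    public function next(): void { $this->position++; }
    public function key(): int { return $this->position; }

    #[\ReturnTypeWillChange]
    public function current()
    {
        return $this->nodeList->item($this->position);
    }
}

$doc = new DOMDocument();
$doc->loadXML('<list><item>a</item><item>b</item><item>c</item></list>');
$xpath = new DOMXPath($doc);

$result = new QueryResultSketch('//item', $doc, $xpath->query('//item'), 'item');

$texts = [];
foreach ($result as $node) {
    $texts[] = $node->textContent;
}
echo count($result), ': ', implode(',', $texts), PHP_EOL; // prints "3: a,b,c"
]]></programlisting>
An alternative design is to implement IteratorAggregate and return the DOMNodeList itself from getIterator(), since DOMNodeList is already Traversable; the explicit cursor above simply makes the Iterator contract visible.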