<?xml version="1.0" encoding="UTF-8"?>
<!-- Reviewed: no -->
<sect1 id="zend.dom.query">
    <title>Zend_Dom_Query</title>

    <para>
        <classname>Zend_Dom_Query</classname> provides mechanisms for querying
        <acronym>XML</acronym> and (X)<acronym>HTML</acronym> documents utilizing either XPath or
        <acronym>CSS</acronym> selectors. It was developed to aid with functional testing of
        <acronym>MVC</acronym> applications, but could also be used for rapid development of screen
        scrapers.
    </para>

    <para>
        <acronym>CSS</acronym> selector notation is provided as a simpler and more familiar
        notation for web developers to utilize when querying documents with <acronym>XML</acronym>
        structures. The notation should be familiar to anybody who has developed
        Cascading Style Sheets or who utilizes JavaScript toolkits that provide
        functionality for selecting nodes utilizing <acronym>CSS</acronym> selectors
        (<ulink url="http://prototypejs.org/api/utility/dollar-dollar">Prototype's
            $$()</ulink> and
        <ulink url="http://api.dojotoolkit.org/jsdoc/dojo/HEAD/dojo.query">Dojo's
            dojo.query</ulink> were both inspirations for the component).
    </para>

    <sect2 id="zend.dom.query.operation">
        <title>Theory of Operation</title>

        <para>
            To use <classname>Zend_Dom_Query</classname>, you instantiate a
            <classname>Zend_Dom_Query</classname> object, optionally passing a document to
            query (a string). Once you have a document, you can use either the
            <methodname>query()</methodname> or <methodname>queryXpath()</methodname> methods; each
            method will return a <classname>Zend_Dom_Query_Result</classname> object with
            any matching nodes.
        </para>
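        <para>
            As a minimal sketch of that workflow, the listing below loads a small document and
            runs the same selection once as a <acronym>CSS</acronym> selector and once as XPath.
            The markup in <varname>$html</varname> is purely illustrative, and the
            <methodname>require_once</methodname> line assumes that Zend Framework is on your
            <varname>include_path</varname>.
        </para>

        <programlisting language="php"><![CDATA[
require_once 'Zend/Dom/Query.php'; // assumes Zend Framework is on the include_path

// Illustrative markup; any (X)HTML or XML string can be used
$html = '<html><body><div id="nav"><a href="/home">Home</a></div></body></html>';

$dom = new Zend_Dom_Query($html);

// CSS selector notation
$byCss = $dom->query('div#nav a');

// The same selection written as XPath
$byXpath = $dom->queryXpath('//div[@id="nav"]//a');

// Both calls return a Zend_Dom_Query_Result, which is countable
printf("%d / %d\n", count($byCss), count($byXpath));
]]></programlisting>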

        <para>
            The primary difference between <classname>Zend_Dom_Query</classname> and using
            DOMDocument + DOMXPath is the ability to select against <acronym>CSS</acronym>
            selectors. You can utilize any of the following, in any combination:
        </para>

        <itemizedlist>
            <listitem>
                <para>
                    <emphasis>element types</emphasis>: provide an element type to
                    match: 'div', 'a', 'span', 'h2', etc.
                </para>
            </listitem>

            <listitem>
                <para>
                    <emphasis>style attributes</emphasis>: <acronym>CSS</acronym> class names
                    to match: '<command>.error</command>', '<command>div.error</command>',
                    '<command>label.required</command>', etc. If an
                    element declares more than one class, this will match as long as
                    the named class is present anywhere in the class attribute.
                </para>
            </listitem>

            <listitem>
                <para>
                    <emphasis>id attributes</emphasis>: element ID attributes to
                    match: '#content', 'div#nav', etc.
                </para>
            </listitem>

            <listitem>
                <para>
                    <emphasis>arbitrary attributes</emphasis>: arbitrary element
                    attributes to match. Three different types of matching are
                    provided:
                </para>

                <itemizedlist>
                    <listitem>
                        <para>
                            <emphasis>exact match</emphasis>: the attribute exactly
                            matches the string: 'div[bar="baz"]' would match a div
                            element with a "bar" attribute that exactly matches the
                            value "baz".
                        </para>
                    </listitem>

                    <listitem>
                        <para>
                            <emphasis>word match</emphasis>: the attribute contains
                            a word matching the string: 'div[bar~="baz"]' would match a div
                            element with a "bar" attribute that contains the
                            word "baz". '&lt;div bar="foo baz"&gt;' would match, but '&lt;div
                            bar="foo bazbat"&gt;' would not.
                        </para>
                    </listitem>

                    <listitem>
                        <para>
                            <emphasis>substring match</emphasis>: the attribute contains
                            the string: 'div[bar*="baz"]' would match a div
                            element with a "bar" attribute that contains the
                            string "baz" anywhere within it.
                        </para>
                    </listitem>
                </itemizedlist>
            </listitem>

            <listitem>
                <para>
                    <emphasis>direct descendants</emphasis>: utilize '&gt;' between
                    selectors to denote direct descendants. 'div &gt; span' would
                    select only 'span' elements that are direct descendants of a
                    'div'. This can also be combined with any of the selectors above.
                </para>
            </listitem>

            <listitem>
                <para>
                    <emphasis>descendants</emphasis>: string together
                    multiple selectors to indicate a hierarchy along which
                    to search. '<command>div .foo span #one</command>' would select an element
                    with the id 'one' that is a descendant of arbitrary depth
                    beneath a 'span' element, which is in turn a descendant
                    of arbitrary depth beneath an element with a class of
                    'foo', which is a descendant of arbitrary depth beneath
                    a 'div' element. For example, it would match the link
                    containing the word 'One' in the listing below:
                </para>

                <programlisting language="html"><![CDATA[
<div>
<table>
    <tr>
        <td class="foo">
            <div>
                Lorem ipsum <span class="bar">
                    <a href="/foo/bar" id="one">One</a>
                    <a href="/foo/baz" id="two">Two</a>
                    <a href="/foo/bat" id="three">Three</a>
                    <a href="/foo/bla" id="four">Four</a>
                </span>
            </div>
        </td>
    </tr>
</table>
</div>
]]></programlisting>
            </listitem>
        </itemizedlist>
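        <para>
            The three attribute match types correspond to familiar XPath predicates. The sketch
            below uses PHP's <classname>DOMDocument</classname> and
            <classname>DOMXPath</classname> directly with hand-written expressions. These are
            illustrative equivalents only and are not necessarily the exact XPath that
            <classname>Zend_Dom_Query</classname> generates; the sample markup is an assumption
            made for the example.
        </para>

        <programlisting language="php"><![CDATA[
// Illustrative document with three values for the "bar" attribute
$doc = new DOMDocument();
$doc->loadXML(
    '<root>'
    . '<div bar="baz"/>'
    . '<div bar="foo baz"/>'
    . '<div bar="foo bazbat"/>'
    . '</root>'
);
$xpath = new DOMXPath($doc);

// Exact match, as in div[bar="baz"]
$exact = $xpath->query('//div[@bar="baz"]');

// Word match, as in div[bar~="baz"]: pad the value with spaces
// and look for the word surrounded by spaces
$word = $xpath->query(
    '//div[contains(concat(" ", normalize-space(@bar), " "), " baz ")]'
);

// Substring match, as in div[bar*="baz"]
$substring = $xpath->query('//div[contains(@bar, "baz")]');

printf("%d %d %d\n", $exact->length, $word->length, $substring->length); // prints "1 2 3"
]]></programlisting>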

        <para>
            Once you've performed your query, you can then work with the result
            object to determine information about the nodes, as well as to pull
            them and/or their content directly for examination and manipulation.
            <classname>Zend_Dom_Query_Result</classname> implements <classname>Countable</classname>
            and <classname>Iterator</classname>, and stores the results internally as
            DOMNodes and DOMElements. As an example, consider the following call,
            which selects against the <acronym>HTML</acronym> above:
        </para>

        <programlisting language="php"><![CDATA[
$dom = new Zend_Dom_Query($html);
$results = $dom->query('.foo .bar a');

$count = count($results); // get number of matches: 4
foreach ($results as $result) {
    // $result is a DOMElement
}
]]></programlisting>

        <para>
            <classname>Zend_Dom_Query</classname> also allows straight XPath queries
            utilizing the <methodname>queryXpath()</methodname> method; you can pass any
            valid XPath query to this method, and it will return a
            <classname>Zend_Dom_Query_Result</classname> object.
        </para>
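        <para>
            A short sketch of an XPath query follows. Because each result is a
            <classname>DOMElement</classname>, the usual <acronym>DOM</acronym> accessors can be
            used on it. The markup and the XPath expression are illustrative assumptions, and
            Zend Framework is assumed to be on your <varname>include_path</varname>.
        </para>

        <programlisting language="php"><![CDATA[
require_once 'Zend/Dom/Query.php'; // assumes Zend Framework is on the include_path

// Illustrative markup, reduced from the HTML listing above
$html = '<div class="foo"><span class="bar">'
      . '<a href="/foo/bar" id="one">One</a>'
      . '<a href="/foo/baz" id="two">Two</a>'
      . '</span></div>';

$dom     = new Zend_Dom_Query($html);
$results = $dom->queryXpath('//span[@class="bar"]/a');

foreach ($results as $link) {
    // Each $link is a DOMElement, so attributes and text are available directly
    echo $link->getAttribute('id'), ': ', $link->nodeValue, "\n";
}
]]></programlisting>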
    </sect2>

    <sect2 id="zend.dom.query.methods">
        <title>Methods Available</title>

        <para>
            The <classname>Zend_Dom_Query</classname> family of classes has the following
            methods available.
        </para>

        <sect3 id="zend.dom.query.methods.zenddomquery">
            <title>Zend_Dom_Query</title>

            <para>
                The following methods are available to
                <classname>Zend_Dom_Query</classname>:
            </para>

            <itemizedlist>
                <listitem>
                    <para>
                        <methodname>setDocumentXml($document)</methodname>: specify an
                        <acronym>XML</acronym> string to query against.
                    </para>
                </listitem>

                <listitem>
                    <para>
                        <methodname>setDocumentXhtml($document)</methodname>: specify an
                        <acronym>XHTML</acronym> string to query against.
                    </para>
                </listitem>

                <listitem>
                    <para>
                        <methodname>setDocumentHtml($document)</methodname>: specify an
                        <acronym>HTML</acronym> string to query against.
                    </para>
                </listitem>

                <listitem>
                    <para>
                        <methodname>setDocument($document)</methodname>: specify a
                        string to query against; <classname>Zend_Dom_Query</classname> will
                        then attempt to autodetect the document type.
                    </para>
                </listitem>

                <listitem>
                    <para>
                        <methodname>getDocument()</methodname>: retrieve the original document
                        string provided to the object.
                    </para>
                </listitem>

                <listitem>
                    <para>
                        <methodname>getDocumentType()</methodname>: retrieve the document
                        type of the document provided to the object; will be one of
                        the <constant>DOC_XML</constant>, <constant>DOC_XHTML</constant>, or
                        <constant>DOC_HTML</constant> class constants.
                    </para>
                </listitem>

                <listitem>
                    <para>
                        <methodname>query($query)</methodname>: query the document using
                        <acronym>CSS</acronym> selector notation.
                    </para>
                </listitem>

                <listitem>
                    <para>
                        <methodname>queryXpath($xPathQuery)</methodname>: query the document
                        using XPath notation.
                    </para>
                </listitem>
            </itemizedlist>
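            <para>
                The setters and getters above might be combined as in the following sketch,
                which sets one document with an explicit type and then lets
                <methodname>setDocument()</methodname> detect the type of another. The sample
                documents are illustrative, and Zend Framework is assumed to be on your
                <varname>include_path</varname>.
            </para>

            <programlisting language="php"><![CDATA[
require_once 'Zend/Dom/Query.php'; // assumes Zend Framework is on the include_path

$dom = new Zend_Dom_Query();

// Declare the document type explicitly
$dom->setDocumentXml('<?xml version="1.0"?><feed><entry id="e1"/></feed>');
$isXml = ($dom->getDocumentType() === Zend_Dom_Query::DOC_XML);

// Or let setDocument() attempt to detect the type
$dom->setDocument('<html><body><p class="intro">Hello</p></body></html>');
$detectedType = $dom->getDocumentType(); // one of the DOC_* class constants

// getDocument() returns the string that was supplied
$source = $dom->getDocument();
]]></programlisting>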
        </sect3>

        <sect3 id="zend.dom.query.methods.zenddomqueryresult">
            <title>Zend_Dom_Query_Result</title>

            <para>
                As mentioned previously, <classname>Zend_Dom_Query_Result</classname>
                implements both <classname>Iterator</classname> and
                <classname>Countable</classname>, and as such can be used in a
                <methodname>foreach()</methodname> loop as well as with the
                <methodname>count()</methodname> function. Additionally, it exposes the
                following methods:
            </para>

            <itemizedlist>
                <listitem>
                    <para>
                        <methodname>getCssQuery()</methodname>: return the <acronym>CSS</acronym>
                        selector query used to produce the result (if any).
                    </para>
                </listitem>

                <listitem>
                    <para>
                        <methodname>getXpathQuery()</methodname>: return the XPath query
                        used to produce the result. Internally,
                        <classname>Zend_Dom_Query</classname> converts <acronym>CSS</acronym>
                        selector queries to XPath, so this value will always be populated.
                    </para>
                </listitem>

                <listitem>
                    <para>
                        <methodname>getDocument()</methodname>: retrieve the DOMDocument the
                        selection was made against.
                    </para>
                </listitem>
            </itemizedlist>
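            <para>
                As a brief sketch, these accessors can be used to inspect how a result was
                produced, which may help when debugging a selector. The markup and selector
                are illustrative, and Zend Framework is assumed to be on your
                <varname>include_path</varname>.
            </para>

            <programlisting language="php"><![CDATA[
require_once 'Zend/Dom/Query.php'; // assumes Zend Framework is on the include_path

// Illustrative markup and selector
$dom     = new Zend_Dom_Query('<div id="content"><p class="error">Oops</p></div>');
$results = $dom->query('#content .error');

echo $results->getCssQuery(), "\n";   // the selector passed to query()
echo $results->getXpathQuery(), "\n"; // the XPath it was converted to

// getDocument() returns the DOMDocument, which can be serialized or queried further
echo $results->getDocument()->saveXML(), "\n";
]]></programlisting>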
        </sect3>
    </sect2>
</sect1>
<!--
vim:se ts=4 sw=4 et:
-->