Repair memory leaks in plpython.
[pgsql.git] / doc / src / sgml / xml2.sgml
blob9fd613f9675f36669744f1da34456ff351d5cfe1
1 <!-- doc/src/sgml/xml2.sgml -->
3 <sect1 id="xml2" xreflabel="xml2">
4 <title>xml2 &mdash; XPath querying and XSLT functionality</title>
6 <indexterm zone="xml2">
7 <primary>xml2</primary>
8 </indexterm>
10 <para>
11 The <filename>xml2</filename> module provides XPath querying and
12 XSLT functionality.
13 </para>
15 <sect2 id="xml2-deprecation">
16 <title>Deprecation Notice</title>
18 <para>
19 From <productname>PostgreSQL</productname> 8.3 on, there is XML-related
20 functionality based on the SQL/XML standard in the core server.
21 That functionality covers XML syntax checking and XPath queries,
22 which is what this module does, and more, but the API is
23 not at all compatible. It is planned that this module will be
24 removed in a future version of PostgreSQL in favor of the newer standard API, so
25 you are encouraged to try converting your applications. If you
26 find that some of the functionality of this module is not
27 available in an adequate form with the newer API, please explain
28 your issue to <email>pgsql-hackers@lists.postgresql.org</email> so that the deficiency
29 can be addressed.
30 </para>
31 </sect2>
33 <sect2 id="xml2-functions">
34 <title>Description of Functions</title>
36 <para>
37 <xref linkend="xml2-functions-table"/> shows the functions provided by this module.
38 These functions provide straightforward XML parsing and XPath queries.
39 </para>
41 <table id="xml2-functions-table">
42 <title><filename>xml2</filename> Functions</title>
43 <tgroup cols="1">
44 <thead>
45 <row>
46 <entry role="func_table_entry"><para role="func_signature">
47 Function
48 </para>
49 <para>
50 Description
51 </para></entry>
52 </row>
53 </thead>
55 <tbody>
56 <row>
57 <entry role="func_table_entry"><para role="func_signature">
58 <function>xml_valid</function> ( <parameter>document</parameter> <type>text</type> )
59 <returnvalue>boolean</returnvalue>
60 </para>
61 <para>
62 Parses the given document and returns true if the
63 document is well-formed XML. (Note: this is an alias for the standard
64 PostgreSQL function <function>xml_is_well_formed()</function>. The
65 name <function>xml_valid()</function> is technically incorrect since validity
66 and well-formedness have different meanings in XML.)
67 </para></entry>
68 </row>
70 <row>
71 <entry role="func_table_entry"><para role="func_signature">
72 <function>xpath_string</function> ( <parameter>document</parameter> <type>text</type>, <parameter>query</parameter> <type>text</type> )
73 <returnvalue>text</returnvalue>
74 </para>
75 <para>
76 Evaluates the XPath query on the supplied document, and
77 casts the result to <type>text</type>.
78 </para></entry>
79 </row>
81 <row>
82 <entry role="func_table_entry"><para role="func_signature">
83 <function>xpath_number</function> ( <parameter>document</parameter> <type>text</type>, <parameter>query</parameter> <type>text</type> )
84 <returnvalue>real</returnvalue>
85 </para>
86 <para>
87 Evaluates the XPath query on the supplied document, and
88 casts the result to <type>real</type>.
89 </para></entry>
90 </row>
92 <row>
93 <entry role="func_table_entry"><para role="func_signature">
94 <function>xpath_bool</function> ( <parameter>document</parameter> <type>text</type>, <parameter>query</parameter> <type>text</type> )
95 <returnvalue>boolean</returnvalue>
96 </para>
97 <para>
98 Evaluates the XPath query on the supplied document, and
99 casts the result to <type>boolean</type>.
100 </para></entry>
101 </row>
103 <row>
104 <entry role="func_table_entry"><para role="func_signature">
105 <function>xpath_nodeset</function> ( <parameter>document</parameter> <type>text</type>, <parameter>query</parameter> <type>text</type>, <parameter>toptag</parameter> <type>text</type>, <parameter>itemtag</parameter> <type>text</type> )
106 <returnvalue>text</returnvalue>
107 </para>
108 <para>
109 Evaluates the query on the document and wraps the result in XML
110 tags. If the result is multivalued, the output will look like:
111 <synopsis>
112 &lt;toptag&gt;
113 &lt;itemtag&gt;Value 1 which could be an XML fragment&lt;/itemtag&gt;
114 &lt;itemtag&gt;Value 2....&lt;/itemtag&gt;
115 &lt;/toptag&gt;
116 </synopsis>
117 If either <parameter>toptag</parameter>
118 or <parameter>itemtag</parameter> is an empty string, the relevant tag
119 is omitted.
120 </para></entry>
121 </row>
123 <row>
124 <entry role="func_table_entry"><para role="func_signature">
125 <function>xpath_nodeset</function> ( <parameter>document</parameter> <type>text</type>, <parameter>query</parameter> <type>text</type>, <parameter>itemtag</parameter> <type>text</type> )
126 <returnvalue>text</returnvalue>
127 </para>
128 <para>
129 Like <function>xpath_nodeset(document, query, toptag, itemtag)</function> but result omits <parameter>toptag</parameter>.
130 </para></entry>
131 </row>
133 <row>
134 <entry role="func_table_entry"><para role="func_signature">
135 <function>xpath_nodeset</function> ( <parameter>document</parameter> <type>text</type>, <parameter>query</parameter> <type>text</type> )
136 <returnvalue>text</returnvalue>
137 </para>
138 <para>
139 Like <function>xpath_nodeset(document, query, toptag, itemtag)</function> but result omits both tags.
140 </para></entry>
141 </row>
143 <row>
144 <entry role="func_table_entry"><para role="func_signature">
145 <function>xpath_list</function> ( <parameter>document</parameter> <type>text</type>, <parameter>query</parameter> <type>text</type>, <parameter>separator</parameter> <type>text</type> )
146 <returnvalue>text</returnvalue>
147 </para>
148 <para>
149 Evaluates the query on the document and returns multiple values
150 separated by the specified separator, for example <literal>Value
151 1,Value 2,Value 3</literal> if <parameter>separator</parameter>
152 is <literal>,</literal>.
153 </para></entry>
154 </row>
156 <row>
157 <entry role="func_table_entry"><para role="func_signature">
158 <function>xpath_list</function> ( <parameter>document</parameter> <type>text</type>, <parameter>query</parameter> <type>text</type> )
159 <returnvalue>text</returnvalue>
160 </para>
161 <para>
162 This is a wrapper for the above function that uses <literal>,</literal>
163 as the separator.
164 </para></entry>
165 </row>
166 </tbody>
167 </tgroup>
168 </table>
169 </sect2>
171 <sect2 id="xml2-xpath-table">
172 <title><literal>xpath_table</literal></title>
174 <indexterm>
175 <primary>xpath_table</primary>
176 </indexterm>
178 <synopsis>
179 xpath_table(text key, text document, text relation, text xpaths, text criteria) returns setof record
180 </synopsis>
182 <para>
183 <function>xpath_table</function> is a table function that evaluates a set of XPath
184 queries on each of a set of documents and returns the results as a
185 table. The primary key field from the original document table is returned
186 as the first column of the result so that the result set
187 can readily be used in joins. The parameters are described in
188 <xref linkend="xml2-xpath-table-parameters"/>.
189 </para>
191 <table id="xml2-xpath-table-parameters">
192 <title><function>xpath_table</function> Parameters</title>
193 <tgroup cols="2">
194 <colspec colname="col1" colwidth="1*"/>
195 <colspec colname="col2" colwidth="2*"/>
196 <thead>
197 <row>
198 <entry>Parameter</entry>
199 <entry>Description</entry>
200 </row>
201 </thead>
202 <tbody>
203 <row>
204 <entry><parameter>key</parameter></entry>
205 <entry>
206 <para>
207 the name of the <quote>key</quote> field &mdash; this is just a field to be used as
208 the first column of the output table, i.e., it identifies the record from
209 which each output row came (see note below about multiple values)
210 </para>
211 </entry>
212 </row>
213 <row>
214 <entry><parameter>document</parameter></entry>
215 <entry>
216 <para>
217 the name of the field containing the XML document
218 </para>
219 </entry>
220 </row>
221 <row>
222 <entry><parameter>relation</parameter></entry>
223 <entry>
224 <para>
225 the name of the table or view containing the documents
226 </para>
227 </entry>
228 </row>
229 <row>
230 <entry><parameter>xpaths</parameter></entry>
231 <entry>
232 <para>
233 one or more XPath expressions, separated by <literal>|</literal>
234 </para>
235 </entry>
236 </row>
237 <row>
238 <entry><parameter>criteria</parameter></entry>
239 <entry>
240 <para>
241 the contents of the WHERE clause. This cannot be omitted, so use
242 <literal>true</literal> or <literal>1=1</literal> if you want to
243 process all the rows in the relation
244 </para>
245 </entry>
246 </row>
247 </tbody>
248 </tgroup>
249 </table>
251 <para>
252 These parameters (except the XPath strings) are just substituted
253 into a plain SQL SELECT statement, so you have some flexibility &mdash; the
254 statement is
255 </para>
257 <para>
258 <literal>
259 SELECT &lt;key&gt;, &lt;document&gt; FROM &lt;relation&gt; WHERE &lt;criteria&gt;
260 </literal>
261 </para>
263 <para>
264 so those parameters can be <emphasis>anything</emphasis> valid in those particular
265 locations. The result from this SELECT needs to return exactly two
266 columns (which it will unless you try to list multiple fields for key
267 or document). Beware that this simplistic approach requires that you
268 validate any user-supplied values to avoid SQL injection attacks.
269 </para>
271 <para>
272 The function has to be used in a <literal>FROM</literal> expression, with an
273 <literal>AS</literal> clause to specify the output columns; for example
274 <programlisting>
275 SELECT * FROM
276 xpath_table('article_id',
277 'article_xml',
278 'articles',
279 '/article/author|/article/pages|/article/title',
280 'date_entered > ''2003-01-01'' ')
281 AS t(article_id integer, author text, page_count integer, title text);
282 </programlisting>
283 The <literal>AS</literal> clause defines the names and types of the columns in the
284 output table. The first is the <quote>key</quote> field and the rest correspond
285 to the XPath queries.
286 If there are more XPath queries than result columns,
287 the extra queries will be ignored. If there are more result columns
288 than XPath queries, the extra columns will be NULL.
289 </para>
291 <para>
292 Notice that this example defines the <structname>page_count</structname> result
293 column as an integer. The function deals internally with string
294 representations, so when you say you want an integer in the output, it will
295 take the string representation of the XPath result and use PostgreSQL input
296 functions to transform it into an integer (or whatever type the <type>AS</type>
297 clause requests). An error will result if it can't do this &mdash; for
298 example if the result is empty &mdash; so you may wish to just stick to
299 <type>text</type> as the column type if you think your data has any problems.
300 </para>
302 <para>
303 The calling <command>SELECT</command> statement doesn't necessarily have to be
304 just <literal>SELECT *</literal> &mdash; it can reference the output
305 columns by name or join them to other tables. The function produces a
306 virtual table with which you can perform any operation you wish (e.g.,
307 aggregation, joining, sorting etc.). So we could also have:
308 <programlisting>
309 SELECT t.title, p.fullname, p.email
310 FROM xpath_table('article_id', 'article_xml', 'articles',
311 '/article/title|/article/author/@id',
312 'xpath_string(article_xml,''/article/@date'') > ''2003-03-20'' ')
313 AS t(article_id integer, title text, author_id integer),
314 tblPeopleInfo AS p
315 WHERE t.author_id = p.person_id;
316 </programlisting>
317 as a more complicated example. Of course, you could wrap all
318 of this in a view for convenience.
319 </para>
321 <sect3 id="xml2-xpath-table-multivalued-results">
322 <title>Multivalued Results</title>
324 <para>
325 The <function>xpath_table</function> function assumes that the results of each XPath query
326 might be multivalued, so the number of rows returned by the function
327 may not be the same as the number of input documents. The first row
328 returned contains the first result from each query, the second row the
329 second result from each query. If one of the queries has fewer values
330 than the others, null values will be returned instead.
331 </para>
333 <para>
334 In some cases, a user will know that a given XPath query will return
335 only a single result (perhaps a unique document identifier) &mdash; if used
336 alongside an XPath query returning multiple results, the single-valued
337 result will appear only on the first row of the result. The solution
338 to this is to use the key field as part of a join against a simpler
339 XPath query. As an example:
341 <programlisting>
342 CREATE TABLE test (
343 id int PRIMARY KEY,
344 xml text
347 INSERT INTO test VALUES (1, '&lt;doc num="C1"&gt;
348 &lt;line num="L1"&gt;&lt;a&gt;1&lt;/a&gt;&lt;b&gt;2&lt;/b&gt;&lt;c&gt;3&lt;/c&gt;&lt;/line&gt;
349 &lt;line num="L2"&gt;&lt;a&gt;11&lt;/a&gt;&lt;b&gt;22&lt;/b&gt;&lt;c&gt;33&lt;/c&gt;&lt;/line&gt;
350 &lt;/doc&gt;');
352 INSERT INTO test VALUES (2, '&lt;doc num="C2"&gt;
353 &lt;line num="L1"&gt;&lt;a&gt;111&lt;/a&gt;&lt;b&gt;222&lt;/b&gt;&lt;c&gt;333&lt;/c&gt;&lt;/line&gt;
354 &lt;line num="L2"&gt;&lt;a&gt;111&lt;/a&gt;&lt;b&gt;222&lt;/b&gt;&lt;c&gt;333&lt;/c&gt;&lt;/line&gt;
355 &lt;/doc&gt;');
357 SELECT * FROM
358 xpath_table('id','xml','test',
359 '/doc/@num|/doc/line/@num|/doc/line/a|/doc/line/b|/doc/line/c',
360 'true')
361 AS t(id int, doc_num varchar(10), line_num varchar(10), val1 int, val2 int, val3 int)
362 WHERE id = 1 ORDER BY doc_num, line_num
364 id | doc_num | line_num | val1 | val2 | val3
365 ----+---------+----------+------+------+------
366 1 | C1 | L1 | 1 | 2 | 3
367 1 | | L2 | 11 | 22 | 33
368 </programlisting>
369 </para>
371 <para>
372 To get <literal>doc_num</literal> on every line, the solution is to use two invocations
373 of <function>xpath_table</function> and join the results:
375 <programlisting>
376 SELECT t.*,i.doc_num FROM
377 xpath_table('id', 'xml', 'test',
378 '/doc/line/@num|/doc/line/a|/doc/line/b|/doc/line/c',
379 'true')
380 AS t(id int, line_num varchar(10), val1 int, val2 int, val3 int),
381 xpath_table('id', 'xml', 'test', '/doc/@num', 'true')
382 AS i(id int, doc_num varchar(10))
383 WHERE i.id=t.id AND i.id=1
384 ORDER BY doc_num, line_num;
386 id | line_num | val1 | val2 | val3 | doc_num
387 ----+----------+------+------+------+---------
388 1 | L1 | 1 | 2 | 3 | C1
389 1 | L2 | 11 | 22 | 33 | C1
390 (2 rows)
391 </programlisting>
392 </para>
393 </sect3>
394 </sect2>
396 <sect2 id="xml2-xslt">
397 <title>XSLT Functions</title>
399 <para>
400 The following functions are available if libxslt is installed:
401 </para>
403 <sect3 id="xml2-xslt-xslt-process">
404 <title><literal>xslt_process</literal></title>
406 <indexterm>
407 <primary>xslt_process</primary>
408 </indexterm>
410 <synopsis>
411 xslt_process(text document, text stylesheet, text paramlist) returns text
412 </synopsis>
414 <para>
415 This function applies the XSL stylesheet to the document and returns
416 the transformed result. The <literal>paramlist</literal> is a list of parameter
417 assignments to be used in the transformation, specified in the form
418 <literal>a=1,b=2</literal>. Note that the
419 parameter parsing is very simple-minded: parameter values cannot
420 contain commas!
421 </para>
423 <para>
424 There is also a two-parameter version of <function>xslt_process</function> which
425 does not pass any parameters to the transformation.
426 </para>
427 </sect3>
428 </sect2>
430 <sect2 id="xml2-author">
431 <title>Author</title>
433 <para>
434 John Gray <email>jgray@azuli.co.uk</email>
435 </para>
437 <para>
438 Development of this module was sponsored by Torchbox Ltd. (www.torchbox.com).
439 It has the same BSD license as PostgreSQL.
440 </para>
441 </sect2>
443 </sect1>