The code to unlink dropped relations in FinishPreparedTransaction() was
[PostgreSQL.git] / doc / src / sgml / xml2.sgml
blob35076847a3045fce58e36927b3343f03c7cdca31
1 <!-- $PostgreSQL$ -->
3 <sect1 id="xml2">
4 <title>xml2</title>
6 <indexterm zone="xml2">
7 <primary>xml2</primary>
8 </indexterm>
10 <para>
11 The <filename>xml2</> module provides XPath querying and
12 XSLT functionality.
13 </para>
15 <sect2>
16 <title>Deprecation notice</title>
18 <para>
19 From <productname>PostgreSQL</> 8.3 on, there is XML-related
20 functionality based on the SQL/XML standard in the core server.
21 That functionality covers XML syntax checking and XPath queries,
22 which is what this module does, and more, but the API is
23 not at all compatible. It is planned that this module will be
24 removed in PostgreSQL 8.4 in favor of the newer standard API, so
25 you are encouraged to try converting your applications. If you
26 find that some of the functionality of this module is not
27 available in an adequate form with the newer API, please explain
28 your issue to pgsql-hackers@postgresql.org so that the deficiency
29 can be addressed.
30 </para>
31 </sect2>
33 <sect2>
34 <title>Description of functions</title>
36 <para>
37 <xref linkend="xml2-functions-table"> shows the functions provided by this module.
38 These functions provide straightforward XML parsing and XPath queries.
39 All arguments are of type <type>text</>, so for brevity that is not shown.
40 </para>
42 <table id="xml2-functions-table">
43 <title>Functions</title>
44 <tgroup cols="2">
45 <tbody>
46 <row>
47 <entry>
48 <synopsis>
49 xml_is_well_formed(document) returns bool
50 </synopsis>
51 </entry>
52 <entry>
53 <para>
54 This parses the document text in its parameter and returns true if the
55 document is well-formed XML. (Note: before PostgreSQL 8.2, this
56 function was called <function>xml_valid()</>. That is the wrong name
57 since validity and well-formedness have different meanings in XML.
58 The old name is still available, but is deprecated.)
59 </para>
60 </entry>
61 </row>
62 <row>
63 <entry>
64 <synopsis>
65 xpath_string(document,query) returns text
66 xpath_number(document,query) returns float4
67 xpath_bool(document,query) returns bool
68 </synopsis>
69 </entry>
70 <entry>
71 <para>
72 These functions evaluate the XPath query on the supplied document, and
73 cast the result to the specified type.
74 </para>
75 </entry>
76 </row>
77 <row>
78 <entry>
79 <synopsis>
80 xpath_nodeset(document,query,toptag,itemtag) returns text
81 </synopsis>
82 </entry>
83 <entry>
84 <para>
85 This evaluates query on document and wraps the result in XML tags. If
86 the result is multivalued, the output will look like:
87 </para>
88 <literal>
89 &lt;toptag&gt;
90 &lt;itemtag&gt;Value 1 which could be an XML fragment&lt;/itemtag&gt;
91 &lt;itemtag&gt;Value 2....&lt;/itemtag&gt;
92 &lt;/toptag&gt;
93 </literal>
94 <para>
95 If either toptag or itemtag is an empty string, the relevant tag is omitted.
96 </para>
97 </entry>
98 </row>
99 <row>
100 <entry>
101 <synopsis>
102 xpath_nodeset(document,query) returns text
103 </synopsis>
104 </entry>
105 <entry>
106 <para>
107 Like xpath_nodeset(document,query,toptag,itemtag) but result omits both tags.
108 </para>
109 </entry>
110 </row>
111 <row>
112 <entry>
113 <synopsis>
114 xpath_nodeset(document,query,itemtag) returns text
115 </synopsis>
116 </entry>
117 <entry>
118 <para>
119 Like xpath_nodeset(document,query,toptag,itemtag) but result omits toptag.
120 </para>
121 </entry>
122 </row>
123 <row>
124 <entry>
125 <synopsis>
126 xpath_list(document,query,separator) returns text
127 </synopsis>
128 </entry>
129 <entry>
130 <para>
131 This function returns multiple values separated by the specified
132 separator, for example <literal>Value 1,Value 2,Value 3</> if
133 separator is <literal>,</>.
134 </para>
135 </entry>
136 </row>
137 <row>
138 <entry>
139 <synopsis>
140 xpath_list(document,query) returns text
141 </synopsis>
142 </entry>
143 <entry>
144 This is a wrapper for the above function that uses <literal>,</>
145 as the separator.
146 </entry>
147 </row>
148 </tbody>
149 </tgroup>
150 </table>
151 </sect2>
153 <sect2>
154 <title><literal>xpath_table</literal></title>
156 <synopsis>
157 xpath_table(text key, text document, text relation, text xpaths, text criteria) returns setof record
158 </synopsis>
160 <para>
161 <function>xpath_table</> is a table function that evaluates a set of XPath
162 queries on each of a set of documents and returns the results as a
163 table. The primary key field from the original document table is returned
164 as the first column of the result so that the result set
165 can readily be used in joins.
166 </para>
168 <table>
169 <title>Parameters</title>
170 <tgroup cols="2">
171 <tbody>
172 <row>
173 <entry><parameter>key</parameter></entry>
174 <entry>
175 <para>
176 the name of the <quote>key</> field &mdash; this is just a field to be used as
177 the first column of the output table, i.e., it identifies the record from
178 which each output row came (see note below about multiple values)
179 </para>
180 </entry>
181 </row>
182 <row>
183 <entry><parameter>document</parameter></entry>
184 <entry>
185 <para>
186 the name of the field containing the XML document
187 </para>
188 </entry>
189 </row>
190 <row>
191 <entry><parameter>relation</parameter></entry>
192 <entry>
193 <para>
194 the name of the table or view containing the documents
195 </para>
196 </entry>
197 </row>
198 <row>
199 <entry><parameter>xpaths</parameter></entry>
200 <entry>
201 <para>
202 one or more XPath expressions, separated by <literal>|</literal>
203 </para>
204 </entry>
205 </row>
206 <row>
207 <entry><parameter>criteria</parameter></entry>
208 <entry>
209 <para>
210 the contents of the WHERE clause. This cannot be omitted, so use
211 <literal>true</literal> or <literal>1=1</literal> if you want to
212 process all the rows in the relation
213 </para>
214 </entry>
215 </row>
216 </tbody>
217 </tgroup>
218 </table>
220 <para>
221 These parameters (except the XPath strings) are just substituted
222 into a plain SQL SELECT statement, so you have some flexibility &mdash; the
223 statement is
224 </para>
226 <para>
227 <literal>
228 SELECT &lt;key&gt;, &lt;document&gt; FROM &lt;relation&gt; WHERE &lt;criteria&gt;
229 </literal>
230 </para>
232 <para>
233 so those parameters can be <emphasis>anything</> valid in those particular
234 locations. The result from this SELECT needs to return exactly two
235 columns (which it will unless you try to list multiple fields for key
236 or document). Beware that this simplistic approach requires that you
237 validate any user-supplied values to avoid SQL injection attacks.
238 </para>
240 <para>
241 The function has to be used in a <literal>FROM</> expression, with an
242 <literal>AS</> clause to specify the output columns; for example
243 </para>
245 <programlisting>
246 SELECT * FROM
247 xpath_table('article_id',
248 'article_xml',
249 'articles',
250 '/article/author|/article/pages|/article/title',
251 'date_entered > ''2003-01-01'' ')
252 AS t(article_id integer, author text, page_count integer, title text);
253 </programlisting>
255 <para>
256 The <literal>AS</> clause defines the names and types of the columns in the
257 output table. The first is the <quote>key</> field and the rest correspond
258 to the XPath queries.
259 If there are more XPath queries than result columns,
260 the extra queries will be ignored. If there are more result columns
261 than XPath queries, the extra columns will be NULL.
262 </para>
264 <para>
265 Notice that this example defines the <structname>page_count</> result
266 column as an integer. The function deals internally with string
267 representations, so when you say you want an integer in the output, it will
268 take the string representation of the XPath result and use PostgreSQL input
269 functions to transform it into an integer (or whatever type the <type>AS</>
270 clause requests). An error will result if it can't do this &mdash; for
271 example if the result is empty &mdash; so you may wish to just stick to
272 <type>text</> as the column type if you think your data has any problems.
273 </para>
275 <para>
276 The calling <command>SELECT</> statement doesn't necessarily have be
277 be just <literal>SELECT *</> &mdash; it can reference the output
278 columns by name or join them to other tables. The function produces a
279 virtual table with which you can perform any operation you wish (e.g.
280 aggregation, joining, sorting etc). So we could also have:
281 </para>
283 <programlisting>
284 SELECT t.title, p.fullname, p.email
285 FROM xpath_table('article_id', 'article_xml', 'articles',
286 '/article/title|/article/author/@id',
287 'xpath_string(article_xml,''/article/@date'') > ''2003-03-20'' ')
288 AS t(article_id integer, title text, author_id integer),
289 tblPeopleInfo AS p
290 WHERE t.author_id = p.person_id;
291 </programlisting>
293 <para>
294 as a more complicated example. Of course, you could wrap all
295 of this in a view for convenience.
296 </para>
298 <sect3>
299 <title>Multivalued results</title>
301 <para>
302 The <function>xpath_table</> function assumes that the results of each XPath query
303 might be multi-valued, so the number of rows returned by the function
304 may not be the same as the number of input documents. The first row
305 returned contains the first result from each query, the second row the
306 second result from each query. If one of the queries has fewer values
307 than the others, NULLs will be returned instead.
308 </para>
310 <para>
311 In some cases, a user will know that a given XPath query will return
312 only a single result (perhaps a unique document identifier) &mdash; if used
313 alongside an XPath query returning multiple results, the single-valued
314 result will appear only on the first row of the result. The solution
315 to this is to use the key field as part of a join against a simpler
316 XPath query. As an example:
317 </para>
319 <programlisting>
320 CREATE TABLE test (
321 id int4 NOT NULL,
322 xml text,
323 CONSTRAINT pk PRIMARY KEY (id)
326 INSERT INTO test VALUES (1, '&lt;doc num="C1"&gt;
327 &lt;line num="L1"&gt;&lt;a&gt;1&lt;/a&gt;&lt;b&gt;2&lt;/b&gt;&lt;c&gt;3&lt;/c&gt;&lt;/line&gt;
328 &lt;line num="L2"&gt;&lt;a&gt;11&lt;/a&gt;&lt;b&gt;22&lt;/b&gt;&lt;c&gt;33&lt;/c&gt;&lt;/line&gt;
329 &lt;/doc&gt;');
331 INSERT INTO test VALUES (2, '&lt;doc num="C2"&gt;
332 &lt;line num="L1"&gt;&lt;a&gt;111&lt;/a&gt;&lt;b&gt;222&lt;/b&gt;&lt;c&gt;333&lt;/c&gt;&lt;/line&gt;
333 &lt;line num="L2"&gt;&lt;a&gt;111&lt;/a&gt;&lt;b&gt;222&lt;/b&gt;&lt;c&gt;333&lt;/c&gt;&lt;/line&gt;
334 &lt;/doc&gt;');
336 SELECT * FROM
337 xpath_table('id','xml','test',
338 '/doc/@num|/doc/line/@num|/doc/line/a|/doc/line/b|/doc/line/c',
339 'true')
340 AS t(id int4, doc_num varchar(10), line_num varchar(10), val1 int4, val2 int4, val3 int4)
341 WHERE id = 1 ORDER BY doc_num, line_num
343 id | doc_num | line_num | val1 | val2 | val3
344 ----+---------+----------+------+------+------
345 1 | C1 | L1 | 1 | 2 | 3
346 1 | | L2 | 11 | 22 | 33
347 </programlisting>
349 <para>
350 To get doc_num on every line, the solution is to use two invocations
351 of xpath_table and join the results:
352 </para>
354 <programlisting>
355 SELECT t.*,i.doc_num FROM
356 xpath_table('id', 'xml', 'test',
357 '/doc/line/@num|/doc/line/a|/doc/line/b|/doc/line/c',
358 'true')
359 AS t(id int4, line_num varchar(10), val1 int4, val2 int4, val3 int4),
360 xpath_table('id', 'xml', 'test', '/doc/@num', 'true')
361 AS i(id int4, doc_num varchar(10))
362 WHERE i.id=t.id AND i.id=1
363 ORDER BY doc_num, line_num;
365 id | line_num | val1 | val2 | val3 | doc_num
366 ----+----------+------+------+------+---------
367 1 | L1 | 1 | 2 | 3 | C1
368 1 | L2 | 11 | 22 | 33 | C1
369 (2 rows)
370 </programlisting>
371 </sect3>
372 </sect2>
374 <sect2>
375 <title>XSLT functions</title>
377 <para>
378 The following functions are available if libxslt is installed:
379 </para>
381 <sect3>
382 <title><literal>xslt_process</literal></title>
384 <synopsis>
385 xslt_process(text document, text stylesheet, text paramlist) returns text
386 </synopsis>
388 <para>
389 This function appplies the XSL stylesheet to the document and returns
390 the transformed result. The paramlist is a list of parameter
391 assignments to be used in the transformation, specified in the form
392 <literal>a=1,b=2</>. Note that the
393 parameter parsing is very simple-minded: parameter values cannot
394 contain commas!
395 </para>
397 <para>
398 Also note that if either the document or stylesheet values do not
399 begin with a &lt; then they will be treated as URLs and libxslt will
400 fetch them. It follows that you can use <function>xslt_process</> as a
401 means to fetch the contents of URLs &mdash; you should be aware of the
402 security implications of this.
403 </para>
405 <para>
406 There is also a two-parameter version of <function>xslt_process</> which
407 does not pass any parameters to the transformation.
408 </para>
409 </sect3>
410 </sect2>
412 <sect2>
413 <title>Author</title>
415 <para>
416 John Gray <email>jgray@azuli.co.uk</email>
417 </para>
419 <para>
420 Development of this module was sponsored by Torchbox Ltd. (www.torchbox.com).
421 It has the same BSD licence as PostgreSQL.
422 </para>
423 </sect2>
425 </sect1>