<!-- doc/src/sgml/seg.sgml -->

<sect1 id="seg" xreflabel="seg">
<title>seg &mdash; a datatype for line segments or floating point intervals</title>

<indexterm zone="seg">
<primary>seg</primary>
</indexterm>

<para>
This module implements a data type <type>seg</type> for
representing line segments, or floating point intervals.
<type>seg</type> can represent uncertainty in the interval endpoints,
making it especially useful for representing laboratory measurements.
</para>

<para>
This module is considered <quote>trusted</quote>, that is, it can be
installed by non-superusers who have <literal>CREATE</literal> privilege
on the current database.
</para>
<sect2 id="seg-rationale">
<title>Rationale</title>

<para>
The geometry of measurements is usually more complex than that of a
point in a numeric continuum. A measurement is usually a segment of
that continuum with somewhat fuzzy limits. The measurements come out
as intervals because of uncertainty and randomness, as well as because
the value being measured may naturally be an interval indicating some
condition, such as the temperature range of stability of a protein.
</para>

<para>
Using just common sense, it appears more convenient to store such data
as intervals, rather than pairs of numbers. In practice, it even turns
out to be more efficient in most applications.
</para>

<para>
Further along the line of common sense, the fuzziness of the limits
suggests that the use of traditional numeric data types leads to a
certain loss of information. Consider this: your instrument reads
6.50, and you input this reading into the database. What do you get
when you fetch it? Watch:

<screen>
test=&gt; select 6.50 :: float8 as "pH";
 pH
---
6.5
(1 row)
</screen>

In the world of measurements, 6.50 is not the same as 6.5. It may
sometimes be critically different. The experimenters usually write
down (and publish) the digits they trust. 6.50 is actually a fuzzy
interval contained within a bigger and even fuzzier interval, 6.5,
with their center points being (probably) the only common feature they
share. We definitely do not want such different data items to appear the
same.
</para>

<para>
Conclusion? It is nice to have a special data type that can record the
limits of an interval with arbitrarily variable precision. Variable in
the sense that each data element records its own precision.
</para>

<para>
Check this out:

<screen>
test=&gt; select '6.25 .. 6.50'::seg as "pH";
     pH
------------
6.25 .. 6.50
(1 row)
</screen>
</para>
</sect2>
<sect2 id="seg-syntax">
<title>Syntax</title>

<para>
The external representation of an interval is formed using one or two
floating-point numbers joined by the range operator (<literal>..</literal>
or <literal>...</literal>). Alternatively, it can be specified as a
center point plus or minus a deviation.
Optional certainty indicators (<literal>&lt;</literal>,
<literal>&gt;</literal> or <literal>~</literal>) can be stored as well.
(Certainty indicators are ignored by all the built-in operators, however.)
<xref linkend="seg-repr-table"/> gives an overview of allowed
representations; <xref linkend="seg-input-examples"/> shows some
examples.
</para>

<para>
In <xref linkend="seg-repr-table"/>, <replaceable>x</replaceable>, <replaceable>y</replaceable>, and
<replaceable>delta</replaceable> denote
floating-point numbers. <replaceable>x</replaceable> and <replaceable>y</replaceable>, but
not <replaceable>delta</replaceable>, can be preceded by a certainty indicator.
</para>
<table id="seg-repr-table">
<title><type>seg</type> External Representations</title>
<tgroup cols="2">
<tbody>
<row>
<entry><literal><replaceable>x</replaceable></literal></entry>
<entry>Single value (zero-length interval)
</entry>
</row>
<row>
<entry><literal><replaceable>x</replaceable> .. <replaceable>y</replaceable></literal></entry>
<entry>Interval from <replaceable>x</replaceable> to <replaceable>y</replaceable>
</entry>
</row>
<row>
<entry><literal><replaceable>x</replaceable> (+-) <replaceable>delta</replaceable></literal></entry>
<entry>Interval from <replaceable>x</replaceable> - <replaceable>delta</replaceable> to
<replaceable>x</replaceable> + <replaceable>delta</replaceable>
</entry>
</row>
<row>
<entry><literal><replaceable>x</replaceable> ..</literal></entry>
<entry>Open interval with lower bound <replaceable>x</replaceable>
</entry>
</row>
<row>
<entry><literal>.. <replaceable>x</replaceable></literal></entry>
<entry>Open interval with upper bound <replaceable>x</replaceable>
</entry>
</row>
</tbody>
</tgroup>
</table>
<table id="seg-input-examples">
<title>Examples of Valid <type>seg</type> Input</title>
<tgroup cols="2">
<colspec colname="col1" colwidth="1*"/>
<colspec colname="col2" colwidth="2*"/>
<tbody>
<row>
<entry><literal>5.0</literal></entry>
<entry>
Creates a zero-length segment (a point, if you will)
</entry>
</row>
<row>
<entry><literal>~5.0</literal></entry>
<entry>
Creates a zero-length segment and records
<literal>~</literal> in the data. <literal>~</literal> is ignored
by <type>seg</type> operations, but
is preserved as a comment.
</entry>
</row>
<row>
<entry><literal>&lt;5.0</literal></entry>
<entry>
Creates a point at 5.0. <literal>&lt;</literal> is ignored but
is preserved as a comment.
</entry>
</row>
<row>
<entry><literal>&gt;5.0</literal></entry>
<entry>
Creates a point at 5.0. <literal>&gt;</literal> is ignored but
is preserved as a comment.
</entry>
</row>
<row>
<entry><literal>5(+-)0.3</literal></entry>
<entry>
Creates an interval <literal>4.7 .. 5.3</literal>.
Note that the <literal>(+-)</literal> notation isn't preserved.
</entry>
</row>
<row>
<entry><literal>50 .. </literal></entry>
<entry>Everything that is greater than or equal to 50</entry>
</row>
<row>
<entry><literal>.. 0</literal></entry>
<entry>Everything that is less than or equal to 0</entry>
</row>
<row>
<entry><literal>1.5e-2 .. 2E-2 </literal></entry>
<entry>Creates an interval <literal>0.015 .. 0.02</literal></entry>
</row>
<row>
<entry><literal>1 ... 2</literal></entry>
<entry>
The same as <literal>1...2</literal>, or <literal>1 .. 2</literal>,
or <literal>1..2</literal>
(spaces around the range operator are ignored)
</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
Because the <literal>...</literal> operator is widely used in data sources, it is allowed
as an alternative spelling of the <literal>..</literal> operator. Unfortunately, this
creates a parsing ambiguity: it is not clear whether the upper bound
in <literal>0...23</literal> is meant to be <literal>23</literal> or <literal>0.23</literal>.
This is resolved by requiring at least one digit before the decimal
point in all numbers in <type>seg</type> input.
</para>

<para>
As a sanity check, <type>seg</type> rejects intervals with the lower bound
greater than the upper, for example <literal>5 .. 2</literal>.
</para>
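<para>
As an illustration of the input forms described in this section, the
following statements are a minimal sketch. They assume that the
<filename>seg</filename> extension is already installed in the current
database; the column aliases are arbitrary names chosen for this example.
</para>

<programlisting>
-- Single value, explicit range, and center point plus or minus a deviation.
SELECT '5.0'::seg      AS single_value,
       '1 .. 2'::seg   AS from_to,
       '5(+-)0.3'::seg AS center_delta;

-- Open intervals: only one bound is given.
SELECT '50 ..'::seg AS at_least_50,
       '.. 0'::seg  AS at_most_0;

-- Certainty indicators are stored with the value but ignored by operators.
SELECT '~5.0'::seg    AS approximate,
       '&lt;5.0'::seg AS below,
       '&gt;5.0'::seg AS above;

-- Per the sanity check described above, this input should be rejected
-- with an error, because the lower bound exceeds the upper bound.
SELECT '5 .. 2'::seg;
</programlisting>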
</sect2>

<sect2 id="seg-precision">
<title>Precision</title>

<para>
<type>seg</type> values are stored internally as pairs of 32-bit floating point
numbers. This means that numbers with more than 7 significant digits
will be truncated.
</para>

<para>
Numbers with 7 or fewer significant digits retain their
original precision. That is, if your query returns 0.00, you will be
sure that the trailing zeroes are not the artifacts of formatting: they
reflect the precision of the original data. The number of leading
zeroes does not affect precision: the value 0.0067 is considered to
have just 2 significant digits.
</para>
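<para>
The following sketch, which assumes that the <filename>seg</filename>
extension is installed, puts the same reading side by side as
<type>float8</type> and as <type>seg</type>. According to the description
above, the <type>seg</type> column should be displayed with its trailing
zero, whereas the <type>float8</type> column drops it. The column aliases
are arbitrary.
</para>

<programlisting>
SELECT 6.50::float8 AS as_float8,
       '6.50'::seg  AS as_seg;
</programlisting>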
</sect2>

<sect2 id="seg-usage">
<title>Usage</title>

<para>
The <filename>seg</filename> module includes a GiST index operator class for
<type>seg</type> values.
The operators supported by the GiST operator class are shown in <xref linkend="seg-gist-operators"/>.
</para>
<table id="seg-gist-operators">
<title>Seg GiST Operators</title>
<tgroup cols="1">
<thead>
<row>
<entry role="func_table_entry"><para role="func_signature">
Operator
</para>
<para>
Description
</para></entry>
</row>
</thead>

<tbody>
<row>
<entry role="func_table_entry"><para role="func_signature">
<type>seg</type> <literal>&lt;&lt;</literal> <type>seg</type>
<returnvalue>boolean</returnvalue>
</para>
<para>
Is the first <type>seg</type> entirely to the left of the second?
[a, b] &lt;&lt; [c, d] is true if b &lt; c.
</para></entry>
</row>

<row>
<entry role="func_table_entry"><para role="func_signature">
<type>seg</type> <literal>&gt;&gt;</literal> <type>seg</type>
<returnvalue>boolean</returnvalue>
</para>
<para>
Is the first <type>seg</type> entirely to the right of the second?
[a, b] &gt;&gt; [c, d] is true if a &gt; d.
</para></entry>
</row>

<row>
<entry role="func_table_entry"><para role="func_signature">
<type>seg</type> <literal>&amp;&lt;</literal> <type>seg</type>
<returnvalue>boolean</returnvalue>
</para>
<para>
Does the first <type>seg</type> not extend to the right of the
second?
[a, b] &amp;&lt; [c, d] is true if b &lt;= d.
</para></entry>
</row>

<row>
<entry role="func_table_entry"><para role="func_signature">
<type>seg</type> <literal>&amp;&gt;</literal> <type>seg</type>
<returnvalue>boolean</returnvalue>
</para>
<para>
Does the first <type>seg</type> not extend to the left of the
second?
[a, b] &amp;&gt; [c, d] is true if a &gt;= c.
</para></entry>
</row>

<row>
<entry role="func_table_entry"><para role="func_signature">
<type>seg</type> <literal>=</literal> <type>seg</type>
<returnvalue>boolean</returnvalue>
</para>
<para>
Are the two <type>seg</type>s equal?
</para></entry>
</row>

<row>
<entry role="func_table_entry"><para role="func_signature">
<type>seg</type> <literal>&amp;&amp;</literal> <type>seg</type>
<returnvalue>boolean</returnvalue>
</para>
<para>
Do the two <type>seg</type>s overlap?
</para></entry>
</row>

<row>
<entry role="func_table_entry"><para role="func_signature">
<type>seg</type> <literal>@&gt;</literal> <type>seg</type>
<returnvalue>boolean</returnvalue>
</para>
<para>
Does the first <type>seg</type> contain the second?
</para></entry>
</row>

<row>
<entry role="func_table_entry"><para role="func_signature">
<type>seg</type> <literal>&lt;@</literal> <type>seg</type>
<returnvalue>boolean</returnvalue>
</para>
<para>
Is the first <type>seg</type> contained in the second?
</para></entry>
</row>
</tbody>
</tgroup>
</table>
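<para>
The definitions in <xref linkend="seg-gist-operators"/> can be tried out
directly. In the following sketch, which assumes that the
<filename>seg</filename> extension is installed, each query is expected to
return true by the definition noted in its comment. The literal values are
chosen only for illustration.
</para>

<programlisting>
SELECT '1 .. 2'::seg &lt;&lt; '3 .. 4'::seg;   -- b &lt; c: 2 &lt; 3
SELECT '3 .. 4'::seg &gt;&gt; '1 .. 2'::seg;   -- a &gt; d: 3 &gt; 2
SELECT '1 .. 3'::seg &amp;&lt; '2 .. 3'::seg;  -- b &lt;= d: 3 &lt;= 3
SELECT '2 .. 5'::seg &amp;&gt; '2 .. 3'::seg;  -- a &gt;= c: 2 &gt;= 2
SELECT '1 .. 3'::seg &amp;&amp; '2 .. 5'::seg; -- both include 2 .. 3
SELECT '0 .. 10'::seg @&gt; '2 .. 3'::seg;    -- first contains second
SELECT '2 .. 3'::seg &lt;@ '0 .. 10'::seg;    -- first is contained in second
</programlisting>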
<para>
In addition to the above operators, the usual comparison
operators shown in <xref linkend="functions-comparison-op-table"/> are
available for type <type>seg</type>. For two segments [a, b] and [c, d],
these operators first compare a to c, and if these are equal, compare
b to d. That results in reasonably good sorting in most cases, which
is useful if you want to use <literal>ORDER BY</literal> with this type.
</para>
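<para>
A minimal sketch of how indexing and sorting might be combined follows.
The table <literal>measurements</literal>, its columns, and the index name
are hypothetical and chosen only for illustration. The sketch also assumes
that the module's GiST operator class is the default for type
<type>seg</type>, so that no operator class needs to be named in
<command>CREATE INDEX</command>.
</para>

<programlisting>
CREATE EXTENSION IF NOT EXISTS seg;

-- Hypothetical table holding one seg value per row.
CREATE TABLE measurements (
    id      serial PRIMARY KEY,
    reading seg
);

INSERT INTO measurements (reading)
VALUES ('6.25 .. 6.50'), ('5(+-)0.3'), ('7.0'), ('.. 0');

-- GiST index on the seg column.
CREATE INDEX measurements_reading_idx
    ON measurements USING gist (reading);

-- Overlap search, using one of the operators the operator class supports.
SELECT id, reading
FROM measurements
WHERE reading &amp;&amp; '6 .. 6.4'::seg;

-- Sort by lower bound, then by upper bound, via the comparison operators.
SELECT id, reading
FROM measurements
ORDER BY reading;
</programlisting>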
</sect2>

<sect2 id="seg-notes">
<title>Notes</title>

<para>
For examples of usage, see the regression test <filename>sql/seg.sql</filename>.
</para>

<para>
The mechanism that converts <literal>(+-)</literal> to regular ranges
isn't completely accurate in determining the number of significant digits
for the boundaries. For example, it adds an extra digit to the lower
boundary if the resulting interval includes a power of ten:

<screen>
postgres=&gt; select '10(+-)1'::seg as seg;
   seg
---------
9.0 .. 11 -- should be: 9 .. 11
</screen>
</para>

<para>
The performance of an R-tree index can largely depend on the initial
order of input values. It may be very helpful to sort the input table
on the <type>seg</type> column; see the script <filename>sort-segments.pl</filename>
for an example.
</para>
</sect2>
<sect2 id="seg-credits">
<title>Credits</title>

<para>
Original author: Gene Selkov, Jr. <email>selkovjr@mcs.anl.gov</email>,
Mathematics and Computer Science Division, Argonne National Laboratory.
</para>

<para>
My thanks are primarily to Prof. Joe Hellerstein
(<ulink url="https://dsf.berkeley.edu/jmh/"></ulink>) for elucidating the
gist of the GiST (<ulink url="http://gist.cs.berkeley.edu/"></ulink>). I am
also grateful to all Postgres developers, present and past, for enabling
me to create my own world and live undisturbed in it. And I would like
to acknowledge my gratitude to Argonne Lab and to the U.S. Department of
Energy for the years of faithful support of my database research.
</para>

</sect2>

</sect1>