Fix xslt_process() to ensure that it inserts a NULL terminator after the
[PostgreSQL.git] / doc / src / sgml / citext.sgml
blob0b60089acf96cdf89acb75d8ac5bc0cfe9c627c2
1 <!-- $PostgreSQL$ -->
3 <sect1 id="citext">
4 <title>citext</title>
6 <indexterm zone="citext">
7 <primary>citext</primary>
8 </indexterm>
10 <para>
11 The <filename>citext</> module provides a case-insensitive
12 character string type, <type>citext</>. Essentially, it internally calls
13 <function>lower</> when comparing values. Otherwise, it behaves almost
14 exactly like <type>text</>.
15 </para>
17 <sect2>
18 <title>Rationale</title>
20 <para>
21 The standard approach to doing case-insensitive matches
22 in <productname>PostgreSQL</> has been to use the <function>lower</>
23 function when comparing values, for example
25 <programlisting>
26 SELECT * FROM tab WHERE lower(col) = LOWER(?);
27 </programlisting>
28 </para>
30 <para>
31 This works reasonably well, but has a number of drawbacks:
32 </para>
34 <itemizedlist>
35 <listitem>
36 <para>
37 It makes your SQL statements verbose, and you always have to remember to
38 use <function>lower</> on both the column and the query value.
39 </para>
40 </listitem>
41 <listitem>
42 <para>
43 It won't use an index, unless you create a functional index using
44 <function>lower</>.
45 </para>
46 </listitem>
47 <listitem>
48 <para>
49 If you declare a column as <literal>UNIQUE</> or <literal>PRIMARY
50 KEY</>, the implicitly generated index is case-sensitive. So it's
51 useless for case-insensitive searches, and it won't enforce
52 uniqueness case-insensitively.
53 </para>
54 </listitem>
55 </itemizedlist>
57 <para>
58 The <type>citext</> data type allows you to eliminate calls
59 to <function>lower</> in SQL queries, and allows a primary key to
60 be case-insensitive. <type>citext</> is locale-aware, just
61 like <type>text</>, which means that the comparison of uppercase and
62 lowercase characters is dependent on the rules of
63 the <literal>LC_CTYPE</> locale setting. Again, this behavior is
64 identical to the use of <function>lower</> in queries. But because it's
65 done transparently by the datatype, you don't have to remember to do
66 anything special in your queries.
67 </para>
69 </sect2>
71 <sect2>
72 <title>How to Use It</title>
74 <para>
75 Here's a simple example of usage:
77 <programlisting>
78 CREATE TABLE users (
79 nick CITEXT PRIMARY KEY,
80 pass TEXT NOT NULL
83 INSERT INTO users VALUES ( 'larry', md5(random()::text) );
84 INSERT INTO users VALUES ( 'Tom', md5(random()::text) );
85 INSERT INTO users VALUES ( 'Damian', md5(random()::text) );
86 INSERT INTO users VALUES ( 'NEAL', md5(random()::text) );
87 INSERT INTO users VALUES ( 'Bj&oslash;rn', md5(random()::text) );
89 SELECT * FROM users WHERE nick = 'Larry';
90 </programlisting>
92 The <command>SELECT</> statement will return one tuple, even though
93 the <structfield>nick</> column was set to <quote>larry</> and the query
94 was for <quote>Larry</>.
95 </para>
96 </sect2>
98 <sect2>
99 <title>String Comparison Behavior</title>
100 <para>
101 In order to emulate a case-insensitive collation as closely as possible,
102 there are <type>citext</>-specific versions of a number of the comparison
103 operators and functions. So, for example, the regular expression
104 operators <literal>~</> and <literal>~*</> exhibit the same behavior when
105 applied to <type>citext</>: they both compare case-insensitively.
106 The same is true
107 for <literal>!~</> and <literal>!~*</>, as well as for the
108 <literal>LIKE</> operators <literal>~~</> and <literal>~~*</>, and
109 <literal>!~~</> and <literal>!~~*</>. If you'd like to match
110 case-sensitively, you can always cast to <type>text</> before comparing.
111 </para>
113 <para>
114 Similarly, all of the following functions perform matching
115 case-insensitively if their arguments are <type>citext</>:
116 </para>
118 <itemizedlist>
119 <listitem>
120 <para>
121 <function>regexp_replace()</>
122 </para>
123 </listitem>
124 <listitem>
125 <para>
126 <function>regexp_split_to_array()</>
127 </para>
128 </listitem>
129 <listitem>
130 <para>
131 <function>regexp_split_to_table()</>
132 </para>
133 </listitem>
134 <listitem>
135 <para>
136 <function>replace()</>
137 </para>
138 </listitem>
139 <listitem>
140 <para>
141 <function>split_part()</>
142 </para>
143 </listitem>
144 <listitem>
145 <para>
146 <function>strpos()</>
147 </para>
148 </listitem>
149 <listitem>
150 <para>
151 <function>translate()</>
152 </para>
153 </listitem>
154 </itemizedlist>
156 <para>
157 For the regexp functions, if you want to match case-sensitively, you can
158 specify the <quote>c</> flag to force a case-sensitive match. Otherwise,
159 you must cast to <type>text</> before using one of these functions if
160 you want case-sensitive behavior.
161 </para>
163 </sect2>
165 <sect2>
166 <title>Limitations</title>
168 <itemizedlist>
169 <listitem>
170 <para>
171 <type>citext</>'s behavior depends on
172 the <literal>LC_CTYPE</> setting of your database. How it compares
173 values is therefore determined when
174 <application>initdb</> is run to create the cluster. It is not truly
175 case-insensitive in the terms defined by the Unicode standard.
176 Effectively, what this means is that, as long as you're happy with your
177 collation, you should be happy with <type>citext</>'s comparisons. But
178 if you have data in different languages stored in your database, users
179 of one language may find their query results are not as expected if the
180 collation is for another language.
181 </para>
182 </listitem>
184 <listitem>
185 <para>
186 <type>citext</> is not as efficient as <type>text</> because the
187 operator functions and the btree comparison functions must make copies
188 of the data and convert it to lower case for comparisons. It is,
189 however, slightly more efficient than using <function>lower</> to get
190 case-insensitive matching.
191 </para>
192 </listitem>
194 <listitem>
195 <para>
196 <type>citext</> doesn't help much if you need data to compare
197 case-sensitively in some contexts and case-insensitively in other
198 contexts. The standard answer is to use the <type>text</> type and
199 manually use the <function>lower</> function when you need to compare
200 case-insensitively; this works all right if case-insensitive comparison
201 is needed only infrequently. If you need case-insensitive most of
202 the time and case-sensitive infrequently, consider storing the data
203 as <type>citext</> and explicitly casting the column to <type>text</>
204 when you want case-sensitive comparison. In either situation, you
205 will need two indexes if you want both types of searches to be fast.
206 </para>
207 </listitem>
208 </itemizedlist>
209 </sect2>
211 <sect2>
212 <title>Author</title>
214 <para>
215 David E. Wheeler <email>david@kineticode.com</email>
216 </para>
218 <para>
219 Inspired by the original <type>citext</> module by Donald Fraser.
220 </para>
222 </sect2>
224 </sect1>