1 <!-- doc/src/sgml/citext.sgml -->
3 <sect1 id=
"citext" xreflabel=
"citext">
4 <title>citext
— a case-insensitive character string type
</title>
6 <indexterm zone=
"citext">
7 <primary>citext
</primary>
11 The
<filename>citext
</filename> module provides a case-insensitive
12 character string type,
<type>citext
</type>. Essentially, it internally calls
13 <function>lower
</function> when comparing values. Otherwise, it behaves almost
14 exactly like
<type>text
</type>.
19 Consider using
<firstterm>nondeterministic collations
</firstterm> (see
20 <xref linkend=
"collation-nondeterministic"/>) instead of this module. They
21 can be used for case-insensitive comparisons, accent-insensitive
22 comparisons, and other combinations, and they handle more Unicode special
28 This module is considered
<quote>trusted
</quote>, that is, it can be
29 installed by non-superusers who have
<literal>CREATE
</literal> privilege
30 on the current database.
33 <sect2 id=
"citext-rationale">
34 <title>Rationale
</title>
37 The standard approach to doing case-insensitive matches
38 in
<productname>PostgreSQL
</productname> has been to use the
<function>lower
</function>
39 function when comparing values, for example
42 SELECT * FROM tab WHERE lower(col) = LOWER(?);
47 This works reasonably well, but has a number of drawbacks:
53 It makes your SQL statements verbose, and you always have to remember to
54 use
<function>lower
</function> on both the column and the query value.
59 It won't use an index, unless you create a functional index using
60 <function>lower
</function>.
65 If you declare a column as
<literal>UNIQUE
</literal> or
<literal>PRIMARY
66 KEY
</literal>, the implicitly generated index is case-sensitive. So it's
67 useless for case-insensitive searches, and it won't enforce
68 uniqueness case-insensitively.
74 The
<type>citext
</type> data type allows you to eliminate calls
75 to
<function>lower
</function> in SQL queries, and allows a primary key to
76 be case-insensitive.
<type>citext
</type> is locale-aware, just
77 like
<type>text
</type>, which means that the matching of upper case and
78 lower case characters is dependent on the rules of
79 the database's
<literal>LC_CTYPE
</literal> setting. Again, this behavior is
80 identical to the use of
<function>lower
</function> in queries. But because it's
81 done transparently by the data type, you don't have to remember to do
82 anything special in your queries.
87 <sect2 id=
"citext-how-to-use-it">
88 <title>How to Use It
</title>
91 Here's a simple example of usage:
95 nick CITEXT PRIMARY KEY,
99 INSERT INTO users VALUES ( 'larry', sha256(random()::text::bytea) );
100 INSERT INTO users VALUES ( 'Tom', sha256(random()::text::bytea) );
101 INSERT INTO users VALUES ( 'Damian', sha256(random()::text::bytea) );
102 INSERT INTO users VALUES ( 'NEAL', sha256(random()::text::bytea) );
103 INSERT INTO users VALUES ( 'Bj
ørn', sha256(random()::text::bytea) );
105 SELECT * FROM users WHERE nick = 'Larry';
108 The
<command>SELECT
</command> statement will return one tuple, even though
109 the
<structfield>nick
</structfield> column was set to
<literal>larry
</literal> and the query
110 was for
<literal>Larry
</literal>.
114 <sect2 id=
"citext-string-comparison-behavior">
115 <title>String Comparison Behavior
</title>
118 <type>citext
</type> performs comparisons by converting each string to lower
119 case (as though
<function>lower
</function> were called) and then comparing the
120 results normally. Thus, for example, two strings are considered equal
121 if
<function>lower
</function> would produce identical results for them.
125 In order to emulate a case-insensitive collation as closely as possible,
126 there are
<type>citext
</type>-specific versions of a number of string-processing
127 operators and functions. So, for example, the regular expression
128 operators
<literal>~
</literal> and
<literal>~*
</literal> exhibit the same behavior when
129 applied to
<type>citext
</type>: they both match case-insensitively.
131 for
<literal>!~
</literal> and
<literal>!~*
</literal>, as well as for the
132 <literal>LIKE
</literal> operators
<literal>~~
</literal> and
<literal>~~*
</literal>, and
133 <literal>!~~
</literal> and
<literal>!~~*
</literal>. If you'd like to match
134 case-sensitively, you can cast the operator's arguments to
<type>text
</type>.
138 Similarly, all of the following functions perform matching
139 case-insensitively if their arguments are
<type>citext
</type>:
145 <function>regexp_match()
</function>
150 <function>regexp_matches()
</function>
155 <function>regexp_replace()
</function>
160 <function>regexp_split_to_array()
</function>
165 <function>regexp_split_to_table()
</function>
170 <function>replace()
</function>
175 <function>split_part()
</function>
180 <function>strpos()
</function>
185 <function>translate()
</function>
191 For the regexp functions, if you want to match case-sensitively, you can
192 specify the
<quote>c
</quote> flag to force a case-sensitive match. Otherwise,
193 you must cast to
<type>text
</type> before using one of these functions if
194 you want case-sensitive behavior.
199 <sect2 id=
"citext-limitations">
200 <title>Limitations
</title>
205 <type>citext
</type>'s case-folding behavior depends on
206 the
<literal>LC_CTYPE
</literal> setting of your database. How it compares
207 values is therefore determined when the database is created.
209 case-insensitive in the terms defined by the Unicode standard.
210 Effectively, what this means is that, as long as you're happy with your
211 collation, you should be happy with
<type>citext
</type>'s comparisons. But
212 if you have data in different languages stored in your database, users
213 of one language may find their query results are not as expected if the
214 collation is for another language.
220 As of
<productname>PostgreSQL
</productname> 9.1, you can attach a
221 <literal>COLLATE
</literal> specification to
<type>citext
</type> columns or data
222 values. Currently,
<type>citext
</type> operators will honor a non-default
223 <literal>COLLATE
</literal> specification while comparing case-folded strings,
224 but the initial folding to lower case is always done according to the
225 database's
<literal>LC_CTYPE
</literal> setting (that is, as though
226 <literal>COLLATE
"default"</literal> were given). This may be changed in a
227 future release so that both steps follow the input
<literal>COLLATE
</literal>
234 <type>citext
</type> is not as efficient as
<type>text
</type> because the
235 operator functions and the B-tree comparison functions must make copies
236 of the data and convert it to lower case for comparisons. Also, only
237 <type>text
</type> can support B-Tree deduplication. However,
238 <type>citext
</type> is slightly more efficient than using
239 <function>lower
</function> to get case-insensitive matching.
245 <type>citext
</type> doesn't help much if you need data to compare
246 case-sensitively in some contexts and case-insensitively in other
247 contexts. The standard answer is to use the
<type>text
</type> type and
248 manually use the
<function>lower
</function> function when you need to compare
249 case-insensitively; this works all right if case-insensitive comparison
250 is needed only infrequently. If you need case-insensitive behavior most
251 of the time and case-sensitive infrequently, consider storing the data
252 as
<type>citext
</type> and explicitly casting the column to
<type>text
</type>
253 when you want case-sensitive comparison. In either situation, you will
254 need two indexes if you want both types of searches to be fast.
260 The schema containing the
<type>citext
</type> operators must be
261 in the current
<varname>search_path
</varname> (typically
<literal>public
</literal>);
262 if it is not, the normal case-sensitive
<type>text
</type> operators
263 will be invoked instead.
269 The approach of lower-casing strings for comparison does not handle some
270 Unicode special cases correctly, for example when one upper-case letter
271 has two lower-case letter equivalents. Unicode distinguishes between
272 <firstterm>case mapping
</firstterm> and
<firstterm>case
273 folding
</firstterm> for this reason. Use nondeterministic collations
274 instead of
<type>citext
</type> to handle that correctly.
280 <sect2 id=
"citext-author">
281 <title>Author
</title>
284 David E. Wheeler
<email>david@kineticode.com
</email>
288 Inspired by the original
<type>citext
</type> module by Donald Fraser.