4 <title id=
"indexes-title">Indexes
</title>
6 <indexterm zone=
"indexes">
7 <primary>index
</primary>
11 Indexes are a common way to enhance database performance. An index
12 allows the database server to find and retrieve specific rows much
13 faster than it could do without an index. But indexes also add
14 overhead to the database system as a whole, so they should be used
19 <sect1 id=
"indexes-intro">
20 <title>Introduction
</title>
23 Suppose we have a table similar to this:
30 and the application issues many queries of the form:
32 SELECT content FROM test1 WHERE id =
<replaceable>constant
</replaceable>;
34 With no advance preparation, the system would have to scan the entire
35 <structname>test1
</structname> table, row by row, to find all
36 matching entries. If there are many rows in
37 <structname>test1
</structname> and only a few rows (perhaps zero
38 or one) that would be returned by such a query, this is clearly an
39 inefficient method. But if the system has been instructed to maintain an
40 index on the
<structfield>id
</structfield> column, it can use a more
41 efficient method for locating matching rows. For instance, it
42 might only have to walk a few levels deep into a search tree.
46 A similar approach is used in most non-fiction books: terms and
47 concepts that are frequently looked up by readers are collected in
48 an alphabetic index at the end of the book. The interested reader
49 can scan the index relatively quickly and flip to the appropriate
50 page(s), rather than having to read the entire book to find the
51 material of interest. Just as it is the task of the author to
52 anticipate the items that readers are likely to look up,
53 it is the task of the database programmer to foresee which indexes
58 The following command can be used to create an index on the
59 <structfield>id
</structfield> column, as discussed:
61 CREATE INDEX test1_id_index ON test1 (id);
63 The name
<structname>test1_id_index
</structname> can be chosen
64 freely, but you should pick something that enables you to remember
65 later what the index was for.
69 To remove an index, use the
<command>DROP INDEX
</command> command.
70 Indexes can be added to and removed from tables at any time.
74 Once an index is created, no further intervention is required: the
75 system will update the index when the table is modified, and it will
76 use the index in queries when it thinks doing so would be more efficient
77 than a sequential table scan. But you might have to run the
78 <command>ANALYZE
</command> command regularly to update
79 statistics to allow the query planner to make educated decisions.
80 See
<xref linkend=
"performance-tips"> for information about
81 how to find out whether an index is used and when and why the
82 planner might choose
<emphasis>not
</emphasis> to use an index.
86 Indexes can also benefit
<command>UPDATE
</command> and
87 <command>DELETE
</command> commands with search conditions.
88 Indexes can moreover be used in join searches. Thus,
89 an index defined on a column that is part of a join condition can
90 also significantly speed up queries with joins.
94 Creating an index on a large table can take a long time. By default,
95 <productname>PostgreSQL
</productname> allows reads (selects) to occur
96 on the table in parallel with index creation, but writes (INSERTs,
97 UPDATEs, DELETEs) are blocked until the index build is finished.
98 In production environments this is often unacceptable.
99 It is possible to allow writes to occur in parallel with index
100 creation, but there are several caveats to be aware of
—
101 for more information see
<xref linkend=
"SQL-CREATEINDEX-CONCURRENTLY"
102 endterm=
"SQL-CREATEINDEX-CONCURRENTLY-title">.
106 After an index is created, the system has to keep it synchronized with the
107 table. This adds overhead to data manipulation operations.
108 Therefore indexes that are seldom or never used in queries
114 <sect1 id=
"indexes-types">
115 <title>Index Types
</title>
118 <productname>PostgreSQL
</productname> provides several index types:
119 B-tree, Hash, GiST and GIN. Each index type uses a different
120 algorithm that is best suited to different types of queries.
121 By default, the
<command>CREATE INDEX
</command> command creates
122 B-tree indexes, which fit the most common situations.
127 <primary>index
</primary>
128 <secondary>B-tree
</secondary>
131 <primary>B-tree
</primary>
134 B-trees can handle equality and range queries on data that can be sorted
136 In particular, the
<productname>PostgreSQL
</productname> query planner
137 will consider using a B-tree index whenever an indexed column is
138 involved in a comparison using one of these operators:
141 <member><literal><</literal></member>
142 <member><literal><=
</literal></member>
143 <member><literal>=
</literal></member>
144 <member><literal>>=
</literal></member>
145 <member><literal>></literal></member>
148 Constructs equivalent to combinations of these operators, such as
149 <literal>BETWEEN<
/> and
<literal>IN<
/>, can also be implemented with
150 a B-tree index search. Also, an
<literal>IS NULL<
/> condition on
151 an index column can be used with a B-tree index.
155 The optimizer can also use a B-tree index for queries involving the
156 pattern matching operators
<literal>LIKE<
/> and
<literal>~
</literal>
157 <emphasis>if
</emphasis> the pattern is a constant and is anchored to
158 the beginning of the string
— for example,
<literal>col LIKE
159 'foo%'
</literal> or
<literal>col ~ '^foo'
</literal>, but not
160 <literal>col LIKE '%bar'
</literal>. However, if your database does not
161 use the C locale you will need to create the index with a special
162 operator class to support indexing of pattern-matching queries; see
163 <xref linkend=
"indexes-opclass"> below. It is also possible to use
164 B-tree indexes for
<literal>ILIKE
</literal> and
165 <literal>~*
</literal>, but only if the pattern starts with
166 non-alphabetic characters, i.e., characters that are not affected by
167 upper/lower case conversion.
173 <primary>index
</primary>
174 <secondary>hash
</secondary>
177 <primary>hash
</primary>
180 Hash indexes can only handle simple equality comparisons.
181 The query planner will consider using a hash index whenever an
182 indexed column is involved in a comparison using the
183 <literal>=
</literal> operator. (Hash indexes do not support
184 <literal>IS NULL<
/> searches.)
185 The following command is used to create a hash index:
187 CREATE INDEX
<replaceable>name
</replaceable> ON
<replaceable>table
</replaceable> USING hash (
<replaceable>column
</replaceable>);
193 Hash index operations are not presently WAL-logged,
194 so hash indexes might need to be rebuilt with
<command>REINDEX<
/>
195 after a database crash.
196 For this reason, hash index use is presently discouraged.
202 <primary>index
</primary>
203 <secondary>GiST
</secondary>
206 <primary>GiST
</primary>
209 GiST indexes are not a single kind of index, but rather an infrastructure
210 within which many different indexing strategies can be implemented.
211 Accordingly, the particular operators with which a GiST index can be
212 used vary depending on the indexing strategy (the
<firstterm>operator
213 class<
/>). As an example, the standard distribution of
214 <productname>PostgreSQL
</productname> includes GiST operator classes
215 for several two-dimensional geometric data types, which support indexed
216 queries using these operators:
219 <member><literal><<</literal></member>
220 <member><literal>&<</literal></member>
221 <member><literal>&></literal></member>
222 <member><literal>>></literal></member>
223 <member><literal><<|
</literal></member>
224 <member><literal>&<|
</literal></member>
225 <member><literal>|
&></literal></member>
226 <member><literal>|
>></literal></member>
227 <member><literal>@
></literal></member>
228 <member><literal><@
</literal></member>
229 <member><literal>~=
</literal></member>
230 <member><literal>&&</literal></member>
233 (See
<xref linkend=
"functions-geometry"> for the meaning of
235 Many other GiST operator
236 classes are available in the
<literal>contrib<
/> collection or as separate
237 projects. For more information see
<xref linkend=
"GiST">.
241 <primary>index
</primary>
242 <secondary>GIN
</secondary>
245 <primary>GIN
</primary>
248 GIN indexes are inverted indexes which can handle values that contain more
249 than one key, arrays for example. Like GiST, GIN can support
250 many different user-defined indexing strategies and the particular
251 operators with which a GIN index can be used vary depending on the
253 As an example, the standard distribution of
254 <productname>PostgreSQL
</productname> includes GIN operator classes
255 for one-dimensional arrays, which support indexed
256 queries using these operators:
259 <member><literal><@
</literal></member>
260 <member><literal>@
></literal></member>
261 <member><literal>=
</literal></member>
262 <member><literal>&&</literal></member>
265 (See
<xref linkend=
"functions-array"> for the meaning of
267 Many other GIN operator
268 classes are available in the
<literal>contrib<
/> collection or as separate
269 projects. For more information see
<xref linkend=
"GIN">.
274 <sect1 id=
"indexes-multicolumn">
275 <title>Multicolumn Indexes
</title>
277 <indexterm zone=
"indexes-multicolumn">
278 <primary>index
</primary>
279 <secondary>multicolumn
</secondary>
283 An index can be defined on more than one column of a table. For example, if
284 you have a table of this form:
292 (say, you keep your
<filename class=
"directory">/dev
</filename>
293 directory in a database...) and you frequently issue queries like:
295 SELECT name FROM test2 WHERE major =
<replaceable>constant
</replaceable> AND minor =
<replaceable>constant
</replaceable>;
297 then it might be appropriate to define an index on the columns
298 <structfield>major
</structfield> and
299 <structfield>minor
</structfield> together, e.g.:
301 CREATE INDEX test2_mm_idx ON test2 (major, minor);
306 Currently, only the B-tree, GiST and GIN index types support multicolumn
307 indexes. Up to
32 columns can be specified. (This limit can be
308 altered when building
<productname>PostgreSQL
</productname>; see the
309 file
<filename>pg_config_manual.h
</filename>.)
313 A multicolumn B-tree index can be used with query conditions that
314 involve any subset of the index's columns, but the index is most
315 efficient when there are constraints on the leading (leftmost) columns.
316 The exact rule is that equality constraints on leading columns, plus
317 any inequality constraints on the first column that does not have an
318 equality constraint, will be used to limit the portion of the index
319 that is scanned. Constraints on columns to the right of these columns
320 are checked in the index, so they save visits to the table proper, but
321 they do not reduce the portion of the index that has to be scanned.
322 For example, given an index on
<literal>(a, b, c)
</literal> and a
323 query condition
<literal>WHERE a =
5 AND b
>=
42 AND c
< 77<
/>,
324 the index would have to be scanned from the first entry with
325 <literal>a<
/> =
5 and
<literal>b<
/> =
42 up through the last entry with
326 <literal>a<
/> =
5. Index entries with
<literal>c<
/> >=
77 would be
327 skipped, but they'd still have to be scanned through.
328 This index could in principle be used for queries that have constraints
329 on
<literal>b<
/> and/or
<literal>c<
/> with no constraint on
<literal>a<
/>
330 — but the entire index would have to be scanned, so in most cases
331 the planner would prefer a sequential table scan over using the index.
335 A multicolumn GiST index can be used with query conditions that
336 involve any subset of the index's columns. Conditions on additional
337 columns restrict the entries returned by the index, but the condition on
338 the first column is the most important one for determining how much of
339 the index needs to be scanned. A GiST index will be relatively
340 ineffective if its first column has only a few distinct values, even if
341 there are many distinct values in additional columns.
345 A multicolumn GIN index can be used with query conditions that
346 involve any subset of the index's columns. Unlike B-tree or GiST,
347 index search effectiveness is the same regardless of which index column(s)
348 the query conditions use.
352 Of course, each column must be used with operators appropriate to the index
353 type; clauses that involve other operators will not be considered.
357 Multicolumn indexes should be used sparingly. In most situations,
358 an index on a single column is sufficient and saves space and time.
359 Indexes with more than three columns are unlikely to be helpful
360 unless the usage of the table is extremely stylized. See also
361 <xref linkend=
"indexes-bitmap-scans"> for some discussion of the
362 merits of different index configurations.
367 <sect1 id=
"indexes-ordering">
368 <title>Indexes and
<literal>ORDER BY<
/></title>
370 <indexterm zone=
"indexes-ordering">
371 <primary>index
</primary>
372 <secondary>and
<literal>ORDER BY<
/></secondary>
376 In addition to simply finding the rows to be returned by a query,
377 an index may be able to deliver them in a specific sorted order.
378 This allows a query's
<literal>ORDER BY<
/> specification to be honored
379 without a separate sorting step. Of the index types currently
380 supported by
<productname>PostgreSQL
</productname>, only B-tree
381 can produce sorted output
— the other index types return
382 matching rows in an unspecified, implementation-dependent order.
386 The planner will consider satisfying an
<literal>ORDER BY<
/> specification
387 either by scanning an available index that matches the specification,
388 or by scanning the table in physical order and doing an explicit
389 sort. For a query that requires scanning a large fraction of the
390 table, an explicit sort is likely to be faster than using an index
392 less disk I/O due to following a sequential access pattern. Indexes are
393 more useful when only a few rows need be fetched. An important
394 special case is
<literal>ORDER BY<
/> in combination with
395 <literal>LIMIT<
/> <replaceable>n<
/>: an explicit sort will have to process
396 all the data to identify the first
<replaceable>n<
/> rows, but if there is
397 an index matching the
<literal>ORDER BY<
/>, the first
<replaceable>n<
/>
398 rows can be retrieved directly, without scanning the remainder at all.
402 By default, B-tree indexes store their entries in ascending order
403 with nulls last. This means that a forward scan of an index on
404 column
<literal>x<
/> produces output satisfying
<literal>ORDER BY x<
/>
405 (or more verbosely,
<literal>ORDER BY x ASC NULLS LAST<
/>). The
406 index can also be scanned backward, producing output satisfying
407 <literal>ORDER BY x DESC<
/>
408 (or more verbosely,
<literal>ORDER BY x DESC NULLS FIRST<
/>, since
409 <literal>NULLS FIRST<
/> is the default for
<literal>ORDER BY DESC<
/>).
413 You can adjust the ordering of a B-tree index by including the
414 options
<literal>ASC<
/>,
<literal>DESC<
/>,
<literal>NULLS FIRST<
/>,
415 and/or
<literal>NULLS LAST<
/> when creating the index; for example:
417 CREATE INDEX test2_info_nulls_low ON test2 (info NULLS FIRST);
418 CREATE INDEX test3_desc_index ON test3 (id DESC NULLS LAST);
420 An index stored in ascending order with nulls first can satisfy
421 either
<literal>ORDER BY x ASC NULLS FIRST<
/> or
422 <literal>ORDER BY x DESC NULLS LAST<
/> depending on which direction
427 You might wonder why bother providing all four options, when two
428 options together with the possibility of backward scan would cover
429 all the variants of
<literal>ORDER BY<
/>. In single-column indexes
430 the options are indeed redundant, but in multicolumn indexes they can be
431 useful. Consider a two-column index on
<literal>(x, y)<
/>: this can
432 satisfy
<literal>ORDER BY x, y<
/> if we scan forward, or
433 <literal>ORDER BY x DESC, y DESC<
/> if we scan backward.
434 But it might be that the application frequently needs to use
435 <literal>ORDER BY x ASC, y DESC<
/>. There is no way to get that
436 ordering from a plain index, but it is possible if the index is defined
437 as
<literal>(x ASC, y DESC)<
/> or
<literal>(x DESC, y ASC)<
/>.
441 Obviously, indexes with non-default sort orderings are a fairly
442 specialized feature, but sometimes they can produce tremendous
443 speedups for certain queries. Whether it's worth maintaining such an
444 index depends on how often you use queries that require a special
450 <sect1 id=
"indexes-bitmap-scans">
451 <title>Combining Multiple Indexes
</title>
453 <indexterm zone=
"indexes-bitmap-scans">
454 <primary>index
</primary>
455 <secondary>combining multiple indexes
</secondary>
458 <indexterm zone=
"indexes-bitmap-scans">
459 <primary>bitmap scan
</primary>
463 A single index scan can only use query clauses that use the index's
464 columns with operators of its operator class and are joined with
465 <literal>AND<
/>. For example, given an index on
<literal>(a, b)
</literal>
466 a query condition like
<literal>WHERE a =
5 AND b =
6<
/> could
467 use the index, but a query like
<literal>WHERE a =
5 OR b =
6<
/> could not
468 directly use the index.
473 <productname>PostgreSQL<
/> has the ability to combine multiple indexes
474 (including multiple uses of the same index) to handle cases that cannot
475 be implemented by single index scans. The system can form
<literal>AND<
/>
476 and
<literal>OR<
/> conditions across several index scans. For example,
477 a query like
<literal>WHERE x =
42 OR x =
47 OR x =
53 OR x =
99<
/>
478 could be broken down into four separate scans of an index on
<literal>x<
/>,
479 each scan using one of the query clauses. The results of these scans are
480 then ORed together to produce the result. Another example is that if we
481 have separate indexes on
<literal>x<
/> and
<literal>y<
/>, one possible
482 implementation of a query like
<literal>WHERE x =
5 AND y =
6<
/> is to
483 use each index with the appropriate query clause and then AND together
484 the index results to identify the result rows.
488 To combine multiple indexes, the system scans each needed index and
489 prepares a
<firstterm>bitmap<
/> in memory giving the locations of
490 table rows that are reported as matching that index's conditions.
491 The bitmaps are then ANDed and ORed together as needed by the query.
492 Finally, the actual table rows are visited and returned. The table rows
493 are visited in physical order, because that is how the bitmap is laid
494 out; this means that any ordering of the original indexes is lost, and
495 so a separate sort step will be needed if the query has an
<literal>ORDER
496 BY<
/> clause. For this reason, and because each additional index scan
497 adds extra time, the planner will sometimes choose to use a simple index
498 scan even though additional indexes are available that could have been
503 In all but the simplest applications, there are various combinations of
504 indexes that might be useful, and the database developer must make
505 trade-offs to decide which indexes to provide. Sometimes multicolumn
506 indexes are best, but sometimes it's better to create separate indexes
507 and rely on the index-combination feature. For example, if your
508 workload includes a mix of queries that sometimes involve only column
509 <literal>x<
/>, sometimes only column
<literal>y<
/>, and sometimes both
510 columns, you might choose to create two separate indexes on
511 <literal>x<
/> and
<literal>y<
/>, relying on index combination to
512 process the queries that use both columns. You could also create a
513 multicolumn index on
<literal>(x, y)<
/>. This index would typically be
514 more efficient than index combination for queries involving both
515 columns, but as discussed in
<xref linkend=
"indexes-multicolumn">, it
516 would be almost useless for queries involving only
<literal>y<
/>, so it
517 should not be the only index. A combination of the multicolumn index
518 and a separate index on
<literal>y<
/> would serve reasonably well. For
519 queries involving only
<literal>x<
/>, the multicolumn index could be
520 used, though it would be larger and hence slower than an index on
521 <literal>x<
/> alone. The last alternative is to create all three
522 indexes, but this is probably only reasonable if the table is searched
523 much more often than it is updated and all three types of query are
524 common. If one of the types of query is much less common than the
525 others, you'd probably settle for creating just the two indexes that
526 best match the common types.
532 <sect1 id=
"indexes-unique">
533 <title>Unique Indexes
</title>
535 <indexterm zone=
"indexes-unique">
536 <primary>index
</primary>
537 <secondary>unique
</secondary>
541 Indexes can also be used to enforce uniqueness of a column's value,
542 or the uniqueness of the combined values of more than one column.
544 CREATE UNIQUE INDEX
<replaceable>name
</replaceable> ON
<replaceable>table
</replaceable> (
<replaceable>column
</replaceable> <optional>, ...
</optional>);
546 Currently, only B-tree indexes can be declared unique.
550 When an index is declared unique, multiple table rows with equal
551 indexed values are not allowed. Null values are not considered
552 equal. A multicolumn unique index will only reject cases where all
553 indexed columns are equal in multiple rows.
557 <productname>PostgreSQL
</productname> automatically creates a unique
558 index when a unique constraint or primary key is defined for a table.
559 The index covers the columns that make up the primary key or unique
560 constraint (a multicolumn index, if appropriate), and is the mechanism
561 that enforces the constraint.
566 The preferred way to add a unique constraint to a table is
567 <literal>ALTER TABLE ... ADD CONSTRAINT
</literal>. The use of
568 indexes to enforce unique constraints could be considered an
569 implementation detail that should not be accessed directly.
570 One should, however, be aware that there's no need to manually
571 create indexes on unique columns; doing so would just duplicate
572 the automatically-created index.
578 <sect1 id=
"indexes-expressional">
579 <title>Indexes on Expressions
</title>
581 <indexterm zone=
"indexes-expressional">
582 <primary>index
</primary>
583 <secondary sortas=
"expressions">on expressions
</secondary>
587 An index column need not be just a column of the underlying table,
588 but can be a function or scalar expression computed from one or
589 more columns of the table. This feature is useful to obtain fast
590 access to tables based on the results of computations.
594 For example, a common way to do case-insensitive comparisons is to
595 use the
<function>lower
</function> function:
597 SELECT * FROM test1 WHERE lower(col1) = 'value';
599 This query can use an index if one has been
600 defined on the result of the
<literal>lower(col1)
</literal>
603 CREATE INDEX test1_lower_col1_idx ON test1 (lower(col1));
608 If we were to declare this index
<literal>UNIQUE<
/>, it would prevent
609 creation of rows whose
<literal>col1<
/> values differ only in case,
610 as well as rows whose
<literal>col1<
/> values are actually identical.
611 Thus, indexes on expressions can be used to enforce constraints that
612 are not definable as simple unique constraints.
616 As another example, if one often does queries like:
618 SELECT * FROM people WHERE (first_name || ' ' || last_name) = 'John Smith';
620 then it might be worth creating an index like this:
622 CREATE INDEX people_names ON people ((first_name || ' ' || last_name));
627 The syntax of the
<command>CREATE INDEX<
/> command normally requires
628 writing parentheses around index expressions, as shown in the second
629 example. The parentheses can be omitted when the expression is just
630 a function call, as in the first example.
634 Index expressions are relatively expensive to maintain, because the
635 derived expression(s) must be computed for each row upon insertion
636 and whenever it is updated. However, the index expressions are
637 <emphasis>not<
/> recomputed during an indexed search, since they are
638 already stored in the index. In both examples above, the system
639 sees the query as just
<literal>WHERE indexedcolumn = 'constant'<
/>
640 and so the speed of the search is equivalent to any other simple index
641 query. Thus, indexes on expressions are useful when retrieval speed
642 is more important than insertion and update speed.
647 <sect1 id=
"indexes-partial">
648 <title>Partial Indexes
</title>
650 <indexterm zone=
"indexes-partial">
651 <primary>index
</primary>
652 <secondary>partial
</secondary>
656 A
<firstterm>partial index
</firstterm> is an index built over a
657 subset of a table; the subset is defined by a conditional
658 expression (called the
<firstterm>predicate
</firstterm> of the
659 partial index). The index contains entries only for those table
660 rows that satisfy the predicate. Partial indexes are a specialized
661 feature, but there are several situations in which they are useful.
665 One major reason for using a partial index is to avoid indexing common
666 values. Since a query searching for a common value (one that
667 accounts for more than a few percent of all the table rows) will not
668 use the index anyway, there is no point in keeping those rows in the
669 index at all. This reduces the size of the index, which will speed
670 up those queries that do use the index. It will also speed up many table
671 update operations because the index does not need to be
672 updated in all cases.
<xref linkend=
"indexes-partial-ex1"> shows a
673 possible application of this idea.
676 <example id=
"indexes-partial-ex1">
677 <title>Setting up a Partial Index to Exclude Common Values
</title>
680 Suppose you are storing web server access logs in a database.
681 Most accesses originate from the IP address range of your organization but
682 some are from elsewhere (say, employees on dial-up connections).
683 If your searches by IP are primarily for outside accesses,
684 you probably do not need to index the IP range that corresponds to your
685 organization's subnet.
689 Assume a table like this:
691 CREATE TABLE access_log (
700 To create a partial index that suits our example, use a command
703 CREATE INDEX access_log_client_ip_ix ON access_log (client_ip)
704 WHERE NOT (client_ip
> inet '
192.168.100.0' AND
705 client_ip
< inet '
192.168.100.255');
710 A typical query that can use this index would be:
714 WHERE url = '/index.html' AND client_ip = inet '
212.78.10.32';
716 A query that cannot use this index is:
720 WHERE client_ip = inet '
192.168.100.23';
725 Observe that this kind of partial index requires that the common
726 values be predetermined, so such partial indexes are best used for
727 data distributions that do not change. The indexes can be recreated
728 occasionally to adjust for new data distributions, but this adds
734 Another possible use for a partial index is to exclude values from the
736 typical query workload is not interested in; this is shown in
<xref
737 linkend=
"indexes-partial-ex2">. This results in the same
738 advantages as listed above, but it prevents the
739 <quote>uninteresting
</quote> values from being accessed via that
740 index, even if an index scan might be profitable in that
741 case. Obviously, setting up partial indexes for this kind of
742 scenario will require a lot of care and experimentation.
745 <example id=
"indexes-partial-ex2">
746 <title>Setting up a Partial Index to Exclude Uninteresting Values
</title>
749 If you have a table that contains both billed and unbilled orders,
750 where the unbilled orders take up a small fraction of the total
751 table and yet those are the most-accessed rows, you can improve
752 performance by creating an index on just the unbilled rows. The
753 command to create the index would look like this:
755 CREATE INDEX orders_unbilled_index ON orders (order_nr)
756 WHERE billed is not true;
761 A possible query to use this index would be:
763 SELECT * FROM orders WHERE billed is not true AND order_nr
< 10000;
765 However, the index can also be used in queries that do not involve
766 <structfield>order_nr<
/> at all, e.g.:
768 SELECT * FROM orders WHERE billed is not true AND amount
> 5000.00;
770 This is not as efficient as a partial index on the
771 <structfield>amount<
/> column would be, since the system has to
772 scan the entire index. Yet, if there are relatively few unbilled
773 orders, using this partial index just to find the unbilled orders
778 Note that this query cannot use this index:
780 SELECT * FROM orders WHERE order_nr =
3501;
782 The order
3501 might be among the billed or unbilled
788 <xref linkend=
"indexes-partial-ex2"> also illustrates that the
789 indexed column and the column used in the predicate do not need to
790 match.
<productname>PostgreSQL
</productname> supports partial
791 indexes with arbitrary predicates, so long as only columns of the
792 table being indexed are involved. However, keep in mind that the
793 predicate must match the conditions used in the queries that
794 are supposed to benefit from the index. To be precise, a partial
795 index can be used in a query only if the system can recognize that
796 the
<literal>WHERE<
/> condition of the query mathematically implies
797 the predicate of the index.
798 <productname>PostgreSQL
</productname> does not have a sophisticated
799 theorem prover that can recognize mathematically equivalent
800 expressions that are written in different forms. (Not
801 only is such a general theorem prover extremely difficult to
802 create, it would probably be too slow to be of any real use.)
803 The system can recognize simple inequality implications, for example
804 <quote>x
< 1</quote> implies
<quote>x
< 2</quote>; otherwise
805 the predicate condition must exactly match part of the query's
806 <literal>WHERE<
/> condition
807 or the index will not be recognized as usable. Matching takes
808 place at query planning time, not at run time. As a result,
809 parameterized query clauses do not work with a partial index. For
810 example a prepared query with a parameter might specify
811 <quote>x
< ?
</quote> which will never imply
812 <quote>x
< 2</quote> for all possible values of the parameter.
816 A third possible use for partial indexes does not require the
817 index to be used in queries at all. The idea here is to create
818 a unique index over a subset of a table, as in
<xref
819 linkend=
"indexes-partial-ex3">. This enforces uniqueness
820 among the rows that satisfy the index predicate, without constraining
824 <example id=
"indexes-partial-ex3">
825 <title>Setting up a Partial Unique Index
</title>
828 Suppose that we have a table describing test outcomes. We wish
829 to ensure that there is only one
<quote>successful<
/> entry for
830 a given subject and target combination, but there might be any number of
831 <quote>unsuccessful<
/> entries. Here is one way to do it:
840 CREATE UNIQUE INDEX tests_success_constraint ON tests (subject, target)
843 This is a particularly efficient approach when there are few
844 successful tests and many unsuccessful ones.
849 Finally, a partial index can also be used to override the system's
850 query plan choices. Also, data sets with peculiar
851 distributions might cause the system to use an index when it really
852 should not. In that case the index can be set up so that it is not
853 available for the offending query. Normally,
854 <productname>PostgreSQL<
/> makes reasonable choices about index
855 usage (e.g., it avoids them when retrieving common values, so the
856 earlier example really only saves index size, it is not required to
857 avoid index usage), and grossly incorrect plan choices are cause
862 Keep in mind that setting up a partial index indicates that you
863 know at least as much as the query planner knows, in particular you
864 know when an index might be profitable. Forming this knowledge
865 requires experience and understanding of how indexes in
866 <productname>PostgreSQL<
/> work. In most cases, the advantage of a
867 partial index over a regular index will be minimal.
871 More information about partial indexes can be found in
<xref
872 linkend=
"STON89b">,
<xref linkend=
"OLSON93">, and
<xref
873 linkend=
"SESHADRI95">.
878 <sect1 id=
"indexes-opclass">
879 <title>Operator Classes and Operator Families
</title>
881 <indexterm zone=
"indexes-opclass">
882 <primary>operator class
</primary>
885 <indexterm zone=
"indexes-opclass">
886 <primary>operator family
</primary>
890 An index definition can specify an
<firstterm>operator
891 class
</firstterm> for each column of an index.
893 CREATE INDEX
<replaceable>name
</replaceable> ON
<replaceable>table
</replaceable> (
<replaceable>column
</replaceable> <replaceable>opclass
</replaceable> <optional><replaceable>sort options
</replaceable></optional> <optional>, ...
</optional>);
895 The operator class identifies the operators to be used by the index
896 for that column. For example, a B-tree index on the type
<type>int4
</type>
897 would use the
<literal>int4_ops
</literal> class; this operator
898 class includes comparison functions for values of type
<type>int4
</type>.
899 In practice the default operator class for the column's data type is
900 usually sufficient. The main reason for having operator classes is
901 that for some data types, there could be more than one meaningful
902 index behavior. For example, we might want to sort a complex-number data
903 type either by absolute value or by real part. We could do this by
904 defining two operator classes for the data type and then selecting
905 the proper class when making an index. The operator class determines
906 the basic sort ordering (which can then be modified by adding sort options
907 <literal>ASC<
/>/
<literal>DESC<
/> and/or
908 <literal>NULLS FIRST<
/>/
<literal>NULLS LAST<
/>).
912 There are also some built-in operator classes besides the default ones:
917 The operator classes
<literal>text_pattern_ops
</literal>,
918 <literal>varchar_pattern_ops
</literal>, and
919 <literal>bpchar_pattern_ops
</literal> support B-tree indexes on
920 the types
<type>text
</type>,
<type>varchar
</type>, and
921 <type>char
</type> respectively. The
922 difference from the default operator classes is that the values
923 are compared strictly character by character rather than
924 according to the locale-specific collation rules. This makes
925 these operator classes suitable for use by queries involving
926 pattern matching expressions (
<literal>LIKE
</literal> or POSIX
927 regular expressions) when the database does not use the standard
928 <quote>C
</quote> locale. As an example, you might index a
929 <type>varchar
</type> column like this:
931 CREATE INDEX test_index ON test_table (col varchar_pattern_ops);
933 Note that you should also create an index with the default operator
934 class if you want queries involving ordinary
<literal><<
/>,
935 <literal><=<
/>,
<literal>><
/>, or
<literal>>=<
/> comparisons
936 to use an index. Such queries cannot use the
937 <literal><replaceable>xxx
</replaceable>_pattern_ops
</literal>
938 operator classes. (Ordinary equality comparisons can use these
939 operator classes, however.) It is possible to create multiple
940 indexes on the same column with different operator classes.
941 If you do use the C locale, you do not need the
942 <literal><replaceable>xxx
</replaceable>_pattern_ops
</literal>
943 operator classes, because an index with the default operator class
944 is usable for pattern-matching queries in the C locale.
951 The following query shows all defined operator classes:
954 SELECT am.amname AS index_method,
955 opc.opcname AS opclass_name
956 FROM pg_am am, pg_opclass opc
957 WHERE opc.opcmethod = am.oid
958 ORDER BY index_method, opclass_name;
963 An operator class is actually just a subset of a larger structure called an
964 <firstterm>operator family<
/>. In cases where several data types have
965 similar behaviors, it is frequently useful to define cross-data-type
966 operators and allow these to work with indexes. To do this, the operator
967 classes for each of the types must be grouped into the same operator
968 family. The cross-type operators are members of the family, but are not
969 associated with any single class within the family.
973 This query shows all defined operator families and all
974 the operators included in each family:
976 SELECT am.amname AS index_method,
977 opf.opfname AS opfamily_name,
978 amop.amopopr::regoperator AS opfamily_operator
979 FROM pg_am am, pg_opfamily opf, pg_amop amop
980 WHERE opf.opfmethod = am.oid AND
981 amop.amopfamily = opf.oid
982 ORDER BY index_method, opfamily_name, opfamily_operator;
988 <sect1 id=
"indexes-examine">
989 <title>Examining Index Usage
</title>
991 <indexterm zone=
"indexes-examine">
992 <primary>index
</primary>
993 <secondary>examining usage
</secondary>
997 Although indexes in
<productname>PostgreSQL<
/> do not need
998 maintenance or tuning, it is still important to check
999 which indexes are actually used by the real-life query workload.
1000 Examining index usage for an individual query is done with the
1001 <xref linkend=
"sql-explain" endterm=
"sql-explain-title">
1002 command; its application for this purpose is
1003 illustrated in
<xref linkend=
"using-explain">.
1004 It is also possible to gather overall statistics about index usage
1005 in a running server, as described in
<xref linkend=
"monitoring-stats">.
1009 It is difficult to formulate a general procedure for determining
1010 which indexes to create. There are a number of typical cases that
1011 have been shown in the examples throughout the previous sections.
1012 A good deal of experimentation is often necessary.
1013 The rest of this section gives some tips for that:
1019 Always run
<xref linkend=
"sql-analyze" endterm=
"sql-analyze-title">
1021 collects statistics about the distribution of the values in the
1022 table. This information is required to estimate the number of rows
1023 returned by a query, which is needed by the planner to assign
1024 realistic costs to each possible query plan. In absence of any
1025 real statistics, some default values are assumed, which are
1026 almost certain to be inaccurate. Examining an application's
1027 index usage without having run
<command>ANALYZE
</command> is
1028 therefore a lost cause.
1034 Use real data for experimentation. Using test data for setting
1035 up indexes will tell you what indexes you need for the test data,
1040 It is especially fatal to use very small test data sets.
1041 While selecting
1000 out of
100000 rows could be a candidate for
1042 an index, selecting
1 out of
100 rows will hardly be, because the
1043 100 rows probably fit within a single disk page, and there
1044 is no plan that can beat sequentially fetching
1 disk page.
1048 Also be careful when making up test data, which is often
1049 unavoidable when the application is not yet in production.
1050 Values that are very similar, completely random, or inserted in
1051 sorted order will skew the statistics away from the distribution
1052 that real data would have.
1058 When indexes are not used, it can be useful for testing to force
1059 their use. There are run-time parameters that can turn off
1060 various plan types (see
<xref linkend=
"runtime-config-query-enable">).
1061 For instance, turning off sequential scans
1062 (
<varname>enable_seqscan<
/>) and nested-loop joins
1063 (
<varname>enable_nestloop<
/>), which are the most basic plans,
1064 will force the system to use a different plan. If the system
1065 still chooses a sequential scan or nested-loop join then there is
1066 probably a more fundamental reason why the index is not being
1067 used; for example, the query condition does not match the index.
1068 (What kind of query can use what kind of index is explained in
1069 the previous sections.)
1075 If forcing index usage does use the index, then there are two
1076 possibilities: Either the system is right and using the index is
1077 indeed not appropriate, or the cost estimates of the query plans
1078 are not reflecting reality. So you should time your query with
1079 and without indexes. The
<command>EXPLAIN ANALYZE
</command>
1080 command can be useful here.
1086 If it turns out that the cost estimates are wrong, there are,
1087 again, two possibilities. The total cost is computed from the
1088 per-row costs of each plan node times the selectivity estimate of
1089 the plan node. The costs estimated for the plan nodes can be adjusted
1090 via run-time parameters (described in
<xref
1091 linkend=
"runtime-config-query-constants">).
1092 An inaccurate selectivity estimate is due to
1093 insufficient statistics. It might be possible to improve this by
1094 tuning the statistics-gathering parameters (see
1095 <xref linkend=
"sql-altertable" endterm=
"sql-altertable-title">).
1099 If you do not succeed in adjusting the costs to be more
1100 appropriate, then you might have to resort to forcing index usage
1101 explicitly. You might also want to contact the
1102 <productname>PostgreSQL<
/> developers to examine the issue.