Repair memory leaks in plpython.
[pgsql.git] / doc / src / sgml / ref / analyze.sgml
blobec81f00fecf87d4fc3472470ce9be5e1f8900265
1 <!--
2 doc/src/sgml/ref/analyze.sgml
3 PostgreSQL documentation
4 -->
6 <refentry id="sql-analyze">
7 <indexterm zone="sql-analyze">
8 <primary>ANALYZE</primary>
9 </indexterm>
11 <refmeta>
12 <refentrytitle>ANALYZE</refentrytitle>
13 <manvolnum>7</manvolnum>
14 <refmiscinfo>SQL - Language Statements</refmiscinfo>
15 </refmeta>
17 <refnamediv>
18 <refname>ANALYZE</refname>
19 <refpurpose>collect statistics about a database</refpurpose>
20 </refnamediv>
22 <refsynopsisdiv>
23 <synopsis>
24 ANALYZE [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <replaceable class="parameter">table_and_columns</replaceable> [, ...] ]
26 <phrase>where <replaceable class="parameter">option</replaceable> can be one of:</phrase>
28 VERBOSE [ <replaceable class="parameter">boolean</replaceable> ]
29 SKIP_LOCKED [ <replaceable class="parameter">boolean</replaceable> ]
30 BUFFER_USAGE_LIMIT <replaceable class="parameter">size</replaceable>
32 <phrase>and <replaceable class="parameter">table_and_columns</replaceable> is:</phrase>
34 [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [ ( <replaceable class="parameter">column_name</replaceable> [, ...] ) ]
35 </synopsis>
36 </refsynopsisdiv>
38 <refsect1>
39 <title>Description</title>
41 <para>
42 <command>ANALYZE</command> collects statistics about the contents
43 of tables in the database, and stores the results in the <link
44 linkend="catalog-pg-statistic"><structname>pg_statistic</structname></link>
45 system catalog. Subsequently, the query planner uses these
46 statistics to help determine the most efficient execution plans for
47 queries.
48 </para>
50 <para>
51 Without a <replaceable class="parameter">table_and_columns</replaceable>
52 list, <command>ANALYZE</command> processes every table and materialized view
53 in the current database that the current user has permission to analyze.
54 With a list, <command>ANALYZE</command> processes only those table(s).
55 It is further possible to give a list of column names for a table,
56 in which case only the statistics for those columns are collected.
57 </para>
58 </refsect1>
60 <refsect1>
61 <title>Parameters</title>
63 <variablelist>
64 <varlistentry>
65 <term><literal>VERBOSE</literal></term>
66 <listitem>
67 <para>
68 Enables display of progress messages at <literal>INFO</literal> level.
69 </para>
70 </listitem>
71 </varlistentry>
73 <varlistentry>
74 <term><literal>SKIP_LOCKED</literal></term>
75 <listitem>
76 <para>
77 Specifies that <command>ANALYZE</command> should not wait for any
78 conflicting locks to be released when beginning work on a relation:
79 if a relation cannot be locked immediately without waiting, the relation
80 is skipped. Note that even with this option, <command>ANALYZE</command>
81 may still block when opening the relation's indexes or when acquiring
82 sample rows from partitions, table inheritance children, and some
83 types of foreign tables. Also, while <command>ANALYZE</command>
84 ordinarily processes all partitions of specified partitioned tables,
85 this option will cause <command>ANALYZE</command> to skip all
86 partitions if there is a conflicting lock on the partitioned table.
87 </para>
88 </listitem>
89 </varlistentry>
91 <varlistentry>
92 <term><literal>BUFFER_USAGE_LIMIT</literal></term>
93 <listitem>
94 <para>
95 Specifies the
96 <glossterm linkend="glossary-buffer-access-strategy">Buffer Access Strategy</glossterm>
97 ring buffer size for <command>ANALYZE</command>. This size is used to
98 calculate the number of shared buffers which will be reused as part of
99 this strategy. <literal>0</literal> disables use of a
100 <literal>Buffer Access Strategy</literal>. When this option is not
101 specified, <command>ANALYZE</command> uses the value from
102 <xref linkend="guc-vacuum-buffer-usage-limit"/>. Higher settings can
103 allow <command>ANALYZE</command> to run more quickly, but having too
104 large a setting may cause too many other useful pages to be evicted from
105 shared buffers. The minimum value is <literal>128 kB</literal> and the
106 maximum value is <literal>16 GB</literal>.
107 </para>
108 </listitem>
109 </varlistentry>
111 <varlistentry>
112 <term><replaceable class="parameter">boolean</replaceable></term>
113 <listitem>
114 <para>
115 Specifies whether the selected option should be turned on or off.
116 You can write <literal>TRUE</literal>, <literal>ON</literal>, or
117 <literal>1</literal> to enable the option, and <literal>FALSE</literal>,
118 <literal>OFF</literal>, or <literal>0</literal> to disable it. The
119 <replaceable class="parameter">boolean</replaceable> value can also
120 be omitted, in which case <literal>TRUE</literal> is assumed.
121 </para>
122 </listitem>
123 </varlistentry>
125 <varlistentry>
126 <term><replaceable class="parameter">size</replaceable></term>
127 <listitem>
128 <para>
129 Specifies an amount of memory in kilobytes. Sizes may also be specified
130 as a string containing the numerical size followed by any one of the
131 following memory units: <literal>B</literal> (bytes),
132 <literal>kB</literal> (kilobytes), <literal>MB</literal> (megabytes),
133 <literal>GB</literal> (gigabytes), or <literal>TB</literal> (terabytes).
134 </para>
135 </listitem>
136 </varlistentry>
138 <varlistentry>
139 <term><replaceable class="parameter">table_name</replaceable></term>
140 <listitem>
141 <para>
142 The name (possibly schema-qualified) of a specific table to
143 analyze. If omitted, all regular tables, partitioned tables, and
144 materialized views in the current database are analyzed (but not
145 foreign tables). If <literal>ONLY</literal> is specified before
146 the table name, only that table is analyzed. If <literal>ONLY</literal>
147 is not specified, the table and all its inheritance child tables or
148 partitions (if any) are analyzed. Optionally, <literal>*</literal>
149 can be specified after the table name to explicitly indicate that
150 inheritance child tables (or partitions) are to be analyzed.
151 </para>
152 </listitem>
153 </varlistentry>
155 <varlistentry>
156 <term><replaceable class="parameter">column_name</replaceable></term>
157 <listitem>
158 <para>
159 The name of a specific column to analyze. Defaults to all columns.
160 </para>
161 </listitem>
162 </varlistentry>
163 </variablelist>
164 </refsect1>
166 <refsect1>
167 <title>Outputs</title>
169 <para>
170 When <literal>VERBOSE</literal> is specified, <command>ANALYZE</command> emits
171 progress messages to indicate which table is currently being
172 processed. Various statistics about the tables are printed as well.
173 </para>
174 </refsect1>
176 <refsect1>
177 <title>Notes</title>
179 <para>
180 To analyze a table, one must ordinarily have the <literal>MAINTAIN</literal>
181 privilege on the table. However, database owners are allowed to
182 analyze all tables in their databases, except shared catalogs.
183 <command>ANALYZE</command> will skip over any tables that the calling user
184 does not have permission to analyze.
185 </para>
187 <para>
188 Foreign tables are analyzed only when explicitly selected. Not all
189 foreign data wrappers support <command>ANALYZE</command>. If the table's
190 wrapper does not support <command>ANALYZE</command>, the command prints a
191 warning and does nothing.
192 </para>
194 <para>
195 In the default <productname>PostgreSQL</productname> configuration,
196 the autovacuum daemon (see <xref linkend="autovacuum"/>)
197 takes care of automatic analyzing of tables when they are first loaded
198 with data, and as they change throughout regular operation.
199 When autovacuum is disabled,
200 it is a good idea to run <command>ANALYZE</command> periodically, or
201 just after making major changes in the contents of a table. Accurate
202 statistics will help the planner to choose the most appropriate query
203 plan, and thereby improve the speed of query processing. A common
204 strategy for read-mostly databases is to run <link linkend="sql-vacuum"><command>VACUUM</command></link>
205 and <command>ANALYZE</command> once a day during a low-usage time of day.
206 (This will not be sufficient if there is heavy update activity.)
207 </para>
209 <para>
210 While <command>ANALYZE</command> is running, the <xref
211 linkend="guc-search-path"/> is temporarily changed to <literal>pg_catalog,
212 pg_temp</literal>.
213 </para>
215 <para>
216 <command>ANALYZE</command>
217 requires only a read lock on the target table, so it can run in
218 parallel with other non-DDL activity on the table.
219 </para>
221 <para>
222 The statistics collected by <command>ANALYZE</command> usually
223 include a list of some of the most common values in each column and
224 a histogram showing the approximate data distribution in each
225 column. One or both of these can be omitted if
226 <command>ANALYZE</command> deems them uninteresting (for example,
227 in a unique-key column, there are no common values) or if the
228 column data type does not support the appropriate operators. There
229 is more information about the statistics in <xref
230 linkend="maintenance"/>.
231 </para>
233 <para>
234 For large tables, <command>ANALYZE</command> takes a random sample
235 of the table contents, rather than examining every row. This
236 allows even very large tables to be analyzed in a small amount of
237 time. Note, however, that the statistics are only approximate, and
238 will change slightly each time <command>ANALYZE</command> is run,
239 even if the actual table contents did not change. This might result
240 in small changes in the planner's estimated costs shown by
241 <link linkend="sql-explain"><command>EXPLAIN</command></link>.
242 In rare situations, this non-determinism will cause the planner's
243 choices of query plans to change after <command>ANALYZE</command> is run.
244 To avoid this, raise the amount of statistics collected by
245 <command>ANALYZE</command>, as described below.
246 </para>
248 <para>
249 The extent of analysis can be controlled by adjusting the
250 <xref linkend="guc-default-statistics-target"/> configuration variable, or
251 on a column-by-column basis by setting the per-column statistics
252 target with <link linkend="sql-altertable"><command>ALTER TABLE ... ALTER COLUMN ... SET
253 STATISTICS</command></link>.
254 The target value sets the
255 maximum number of entries in the most-common-value list and the
256 maximum number of bins in the histogram. The default target value
257 is 100, but this can be adjusted up or down to trade off accuracy of
258 planner estimates against the time taken for
259 <command>ANALYZE</command> and the amount of space occupied in
260 <literal>pg_statistic</literal>. In particular, setting the
261 statistics target to zero disables collection of statistics for
262 that column. It might be useful to do that for columns that are
263 never used as part of the <literal>WHERE</literal>, <literal>GROUP BY</literal>,
264 or <literal>ORDER BY</literal> clauses of queries, since the planner will
265 have no use for statistics on such columns.
266 </para>
268 <para>
269 The largest statistics target among the columns being analyzed determines
270 the number of table rows sampled to prepare the statistics. Increasing
271 the target causes a proportional increase in the time and space needed
272 to do <command>ANALYZE</command>.
273 </para>
275 <para>
276 One of the values estimated by <command>ANALYZE</command> is the number of
277 distinct values that appear in each column. Because only a subset of the
278 rows are examined, this estimate can sometimes be quite inaccurate, even
279 with the largest possible statistics target. If this inaccuracy leads to
280 bad query plans, a more accurate value can be determined manually and then
281 installed with
282 <link linkend="sql-altertable"><command>ALTER TABLE ... ALTER COLUMN ... SET (n_distinct = ...)</command></link>.
283 </para>
285 <para>
286 If the table being analyzed has inheritance children,
287 <command>ANALYZE</command> gathers two sets of statistics: one on the rows
288 of the parent table only, and a second including rows of both the parent
289 table and all of its children. This second set of statistics is needed when
290 planning queries that process the inheritance tree as a whole. The
291 autovacuum daemon, however, will only consider inserts or updates on the
292 parent table itself when deciding whether to trigger an automatic analyze
293 for that table. If that table is rarely inserted into or updated, the
294 inheritance statistics will not be up to date unless you run
295 <command>ANALYZE</command> manually. By default,
296 <command>ANALYZE</command> will also recursively collect and update the
297 statistics for each inheritance child table. The <literal>ONLY</literal>
298 keyword may be used to disable this.
299 </para>
301 <para>
302 For partitioned tables, <command>ANALYZE</command> gathers statistics by
303 sampling rows from all partitions. By default,
304 <command>ANALYZE</command> will also recursively collect and update the
305 statistics for each partition. The <literal>ONLY</literal> keyword may be
306 used to disable this.
307 </para>
309 <para>
310 The autovacuum daemon does not process partitioned tables, nor does it
311 process inheritance parents if only the children are ever modified.
312 It is usually necessary to periodically run a manual
313 <command>ANALYZE</command> to keep the statistics of the table hierarchy
314 up to date.
315 </para>
317 <para>
318 If any child tables or partitions are foreign tables whose foreign
319 data wrappers do not support <command>ANALYZE</command>, those tables are
320 ignored while gathering inheritance statistics.
321 </para>
323 <para>
324 If the table being analyzed is completely empty, <command>ANALYZE</command>
325 will not record new statistics for that table. Any existing statistics
326 will be retained.
327 </para>
329 <para>
330 Each backend running <command>ANALYZE</command> will report its progress
331 in the <structname>pg_stat_progress_analyze</structname> view. See
332 <xref linkend="analyze-progress-reporting"/> for details.
333 </para>
334 </refsect1>
336 <refsect1>
337 <title>Compatibility</title>
339 <para>
340 There is no <command>ANALYZE</command> statement in the SQL standard.
341 </para>
343 <para>
344 The following syntax was used before <productname>PostgreSQL</productname>
345 version 11 and is still supported:
346 <synopsis>
347 ANALYZE [ VERBOSE ] [ <replaceable class="parameter">table_and_columns</replaceable> [, ...] ]
348 </synopsis>
349 </para>
350 </refsect1>
352 <refsect1>
353 <title>See Also</title>
355 <simplelist type="inline">
356 <member><xref linkend="sql-vacuum"/></member>
357 <member><xref linkend="app-vacuumdb"/></member>
358 <member><xref linkend="runtime-config-resource-vacuum-cost"/></member>
359 <member><xref linkend="autovacuum"/></member>
360 <member><xref linkend="analyze-progress-reporting"/></member>
361 </simplelist>
362 </refsect1>
363 </refentry>