<!-- doc/src/sgml/parallel.sgml -->

<chapter id="parallel-query">
 <title>Parallel Query</title>

 <indexterm zone="parallel-query">
  <primary>parallel query</primary>
 </indexterm>

 <para>
  <productname>PostgreSQL</productname> can devise query plans that can leverage
  multiple CPUs in order to answer queries faster.  This feature is known
  as parallel query.  Many queries cannot benefit from parallel query, either
  due to limitations of the current implementation or because there is no
  imaginable query plan that is any faster than the serial query plan.
  However, for queries that can benefit, the speedup from parallel query
  is often very significant.  Many queries can run more than twice as fast
  when using parallel query, and some queries can run four times faster or
  even more.  Queries that touch a large amount of data but return only a
  few rows to the user will typically benefit most.  This chapter explains
  some details of how parallel query works and in which situations it can be
  used so that users who wish to make use of it can understand what to expect.
 </para>

 <sect1 id="how-parallel-query-works">
  <title>How Parallel Query Works</title>

  <para>
   When the optimizer determines that parallel query is the fastest execution
   strategy for a particular query, it will create a query plan that includes
   a <firstterm>Gather</firstterm> or <firstterm>Gather Merge</firstterm>
   node.  Here is a simple example:

<screen>
EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
                                     QUERY PLAN
-------------------------------------------------------------------&zwsp;------------------
 Gather  (cost=1000.00..217018.43 rows=1 width=97)
   Workers Planned: 2
   ->  Parallel Seq Scan on pgbench_accounts  (cost=0.00..216018.33 rows=1 width=97)
         Filter: (filler ~~ '%x%'::text)
(4 rows)
</screen>
  </para>
  <para>
   In all cases, the <literal>Gather</literal> or
   <literal>Gather Merge</literal> node will have exactly one
   child plan, which is the portion of the plan that will be executed in
   parallel.  If the <literal>Gather</literal> or <literal>Gather Merge</literal> node is
   at the very top of the plan tree, then the entire query will execute in
   parallel.  If it is somewhere else in the plan tree, then only the portion
   of the plan below it will run in parallel.  In the example above, the
   query accesses only one table, so there is only one plan node other than
   the <literal>Gather</literal> node itself; since that plan node is a child of the
   <literal>Gather</literal> node, it will run in parallel.
  </para>

  <para>
   <link linkend="using-explain">Using EXPLAIN</link>, you can see the number of
   workers chosen by the planner.  When the <literal>Gather</literal> node is reached
   during query execution, the process that is implementing the user's
   session will request a number of <link linkend="bgworker">background
   worker processes</link> equal to the number
   of workers chosen by the planner.  The number of background workers that
   the planner will consider using is limited to at most
   <xref linkend="guc-max-parallel-workers-per-gather"/>.  The total number
   of background workers that can exist at any one time is limited by both
   <xref linkend="guc-max-worker-processes"/> and
   <xref linkend="guc-max-parallel-workers"/>.  Therefore, it is possible for a
   parallel query to run with fewer workers than planned, or even with
   no workers at all.  The optimal plan may depend on the number of workers
   that are available, so this can result in poor query performance.  If this
   occurrence is frequent, consider increasing
   <varname>max_worker_processes</varname> and <varname>max_parallel_workers</varname>
   so that more workers can be run simultaneously, or alternatively reducing
   <varname>max_parallel_workers_per_gather</varname> so that the planner
   requests fewer workers.
  </para>
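  <para>
   For example, the relevant limits can be inspected and adjusted like this
   (an illustrative sketch; the value shown in the <command>SET</command>
   command is not a recommendation):

<programlisting>
-- Inspect the current limits on parallel workers.
SHOW max_worker_processes;              -- default 8
SHOW max_parallel_workers;              -- default 8
SHOW max_parallel_workers_per_gather;   -- default 2

-- Allow each Gather to request up to 4 workers, for this session only.
SET max_parallel_workers_per_gather = 4;
</programlisting>

   Note that <varname>max_worker_processes</varname> can only be changed at
   server start, while the other two settings can also be changed within a
   session.
  </para>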
  <para>
   Every background worker process that is successfully started for a given
   parallel query will execute the parallel portion of the plan.  The leader
   will also execute that portion of the plan, but it has an additional
   responsibility: it must also read all of the tuples generated by the
   workers.  When the parallel portion of the plan generates only a small
   number of tuples, the leader will often behave very much like an additional
   worker, speeding up query execution.  Conversely, when the parallel portion
   of the plan generates a large number of tuples, the leader may be almost
   entirely occupied with reading the tuples generated by the workers and
   performing any further processing steps that are required by plan nodes
   above the level of the <literal>Gather</literal> node or
   <literal>Gather Merge</literal> node.  In such cases, the leader will
   do very little of the work of executing the parallel portion of the plan.
  </para>

  <para>
   When the node at the top of the parallel portion of the plan is
   <literal>Gather Merge</literal> rather than <literal>Gather</literal>, it indicates that
   each process executing the parallel portion of the plan is producing
   tuples in sorted order, and that the leader is performing an
   order-preserving merge.  In contrast, <literal>Gather</literal> reads tuples
   from the workers in whatever order is convenient, destroying any sort
   order that may have existed.
  </para>
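  <para>
   For instance, a query that sorts the output of a parallel scan can
   produce a plan of the following shape, with each worker sorting its own
   subset of the rows and the leader merging the sorted streams (a
   hypothetical, abridged example; actual plans depend on the data,
   indexes, and settings):

<programlisting>
EXPLAIN SELECT * FROM pgbench_accounts ORDER BY bid;
             QUERY PLAN
-------------------------------------
 Gather Merge
   Workers Planned: 2
   ->  Sort
         Sort Key: bid
         ->  Parallel Seq Scan on pgbench_accounts
</programlisting>
  </para>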
 </sect1>

 <sect1 id="when-can-parallel-query-be-used">
  <title>When Can Parallel Query Be Used?</title>

  <para>
   There are several settings that can cause the query planner not to
   generate a parallel query plan under any circumstances.  In order for
   any parallel query plans whatsoever to be generated, the following
   settings must be configured as indicated.
  </para>

  <itemizedlist>
   <listitem>
    <para>
     <xref linkend="guc-max-parallel-workers-per-gather"/> must be set to a
     value that is greater than zero.  This is a special case of the more
     general principle that no more workers should be used than the number
     configured via <varname>max_parallel_workers_per_gather</varname>.
    </para>
   </listitem>
  </itemizedlist>
  <para>
   In addition, the system must not be running in single-user mode.  Since
   the entire database system is running as a single process in this situation,
   no background workers will be available.
  </para>

  <para>
   Even when it is in general possible for parallel query plans to be
   generated, the planner will not generate them for a given query
   if any of the following are true:
  </para>

  <itemizedlist>
   <listitem>
    <para>
     The query writes any data or locks any database rows.  If a query
     contains a data-modifying operation either at the top level or within
     a CTE, no parallel plans for that query will be generated.  As an
     exception, the following commands, which create a new table and populate
     it, can use a parallel plan for the underlying <literal>SELECT</literal>
     part of the query:

     <itemizedlist>
      <listitem>
       <para><command>CREATE TABLE ... AS</command></para>
      </listitem>
      <listitem>
       <para><command>SELECT INTO</command></para>
      </listitem>
      <listitem>
       <para><command>CREATE MATERIALIZED VIEW</command></para>
      </listitem>
      <listitem>
       <para><command>REFRESH MATERIALIZED VIEW</command></para>
      </listitem>
     </itemizedlist>
    </para>
   </listitem>

   <listitem>
    <para>
     The query might be suspended during execution.  In any situation in
     which the system thinks that partial or incremental execution might
     occur, no parallel plan is generated.  For example, a cursor created
     using <link linkend="sql-declare">DECLARE CURSOR</link> will never use
     a parallel plan.  Similarly, a PL/pgSQL loop of the form
     <literal>FOR x IN query LOOP .. END LOOP</literal> will never use a
     parallel plan, because the parallel query system is unable to verify
     that the code in the loop is safe to execute while parallel query is
     active.
    </para>
   </listitem>

   <listitem>
    <para>
     The query uses any function marked <literal>PARALLEL UNSAFE</literal>.
     Most system-defined functions are <literal>PARALLEL SAFE</literal>,
     but user-defined functions are marked <literal>PARALLEL
     UNSAFE</literal> by default.  See the discussion of
     <xref linkend="parallel-safety"/>.
    </para>
   </listitem>

   <listitem>
    <para>
     The query is running inside of another query that is already parallel.
     For example, if a function called by a parallel query issues an SQL
     query itself, that query will never use a parallel plan.  This is a
     limitation of the current implementation, but it may not be desirable
     to remove this limitation, since it could result in a single query
     using a very large number of processes.
    </para>
   </listitem>
  </itemizedlist>
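  <para>
   As an illustration of the data-writing exception above, a command such
   as the following can still parallelize its <literal>SELECT</literal>
   part, even though the command as a whole writes data (a hypothetical
   example; the table name <literal>accounts_copy</literal> is a
   placeholder):

<programlisting>
-- The SELECT feeding the new table may use parallel workers.
CREATE TABLE accounts_copy AS
    SELECT * FROM pgbench_accounts WHERE abalance &gt; 0;
</programlisting>
  </para>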
  <para>
   Even when a parallel query plan is generated for a particular query, there
   are several circumstances under which it will be impossible to execute
   that plan in parallel at execution time.  If this occurs, the leader
   will execute the portion of the plan below the <literal>Gather</literal>
   node entirely by itself, almost as if the <literal>Gather</literal> node were
   not present.  This will happen if any of the following conditions are met:
  </para>

  <itemizedlist>
   <listitem>
    <para>
     No background workers can be obtained because of the limitation that
     the total number of background workers cannot exceed
     <xref linkend="guc-max-worker-processes"/>.
    </para>
   </listitem>

   <listitem>
    <para>
     No background workers can be obtained because of the limitation that
     the total number of background workers launched for purposes of
     parallel query cannot exceed <xref linkend="guc-max-parallel-workers"/>.
    </para>
   </listitem>

   <listitem>
    <para>
     The client sends an Execute message with a non-zero fetch count.
     See the discussion of the
     <link linkend="protocol-flow-ext-query">extended query protocol</link>.
     Since <link linkend="libpq">libpq</link> currently provides no way to
     send such a message, this can only occur when using a client that
     does not rely on libpq.  If this is a frequent
     occurrence, it may be a good idea to set
     <xref linkend="guc-max-parallel-workers-per-gather"/> to zero in
     sessions where it is likely, so as to avoid generating query plans
     that may be suboptimal when run serially.
    </para>
   </listitem>
  </itemizedlist>
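  <para>
   For example, a session that is known to fetch results in small batches
   via the extended query protocol could disable parallel planning up front
   (an illustrative sketch):

<programlisting>
-- Avoid planning parallel queries that would be forced to run serially.
SET max_parallel_workers_per_gather = 0;
</programlisting>
  </para>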
 </sect1>

 <sect1 id="parallel-plans">
  <title>Parallel Plans</title>

  <para>
   Because each worker executes the parallel portion of the plan to
   completion, it is not possible to simply take an ordinary query plan
   and run it using multiple workers.  Each worker would produce a full
   copy of the output result set, so the query would not run any faster
   than normal but would produce incorrect results.  Instead, the parallel
   portion of the plan must be what is known internally to the query
   optimizer as a <firstterm>partial plan</firstterm>; that is, it must be constructed
   so that each process that executes the plan will generate only a
   subset of the output rows in such a way that each required output row
   is guaranteed to be generated by exactly one of the cooperating processes.
   Generally, this means that the scan on the driving table of the query
   must be a parallel-aware scan.
  </para>

  <sect2 id="parallel-scans">
   <title>Parallel Scans</title>

   <para>
    The following types of parallel-aware table scans are currently supported.

    <itemizedlist>
     <listitem>
      <para>
       In a <emphasis>parallel sequential scan</emphasis>, the table's blocks will
       be divided into ranges and shared among the cooperating processes.  Each
       worker process will complete the scanning of its given range of blocks before
       requesting an additional range of blocks.
      </para>
     </listitem>
     <listitem>
      <para>
       In a <emphasis>parallel bitmap heap scan</emphasis>, one process is chosen
       as the leader.  That process performs a scan of one or more indexes
       and builds a bitmap indicating which table blocks need to be visited.
       These blocks are then divided among the cooperating processes as in
       a parallel sequential scan.  In other words, the heap scan is performed
       in parallel, but the underlying index scan is not.
      </para>
     </listitem>
     <listitem>
      <para>
       In a <emphasis>parallel index scan</emphasis> or <emphasis>parallel index-only
       scan</emphasis>, the cooperating processes take turns reading data from the
       index.  Currently, parallel index scans are supported only for
       btree indexes.  Each process will claim a single index block and will
       scan and return all tuples referenced by that block; other processes can
       at the same time be returning tuples from a different index block.
       The results of a parallel btree scan are returned in sorted order
       within each worker process.
      </para>
     </listitem>
    </itemizedlist>

    Other scan types, such as scans of non-btree indexes, may support
    parallel scans in the future.
   </para>
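   <para>
    A parallel index scan appears in <command>EXPLAIN</command> output much
    like the sequential-scan example earlier in this chapter (a
    hypothetical, abridged example; whether the planner chooses an index
    scan at all depends on the data and settings):

<programlisting>
EXPLAIN SELECT * FROM pgbench_accounts WHERE aid &lt; 100000;
             QUERY PLAN
-------------------------------------
 Gather
   Workers Planned: 2
   ->  Parallel Index Scan using pgbench_accounts_pkey on pgbench_accounts
         Index Cond: (aid &lt; 100000)
</programlisting>
   </para>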
  </sect2>

  <sect2 id="parallel-joins">
   <title>Parallel Joins</title>

   <para>
    Just as in a non-parallel plan, the driving table may be joined to one or
    more other tables using a nested loop, hash join, or merge join.  The
    inner side of the join may be any kind of non-parallel plan that is
    otherwise supported by the planner provided that it is safe to run within
    a parallel worker.  Depending on the join type, the inner side may also be
    a parallel plan.
   </para>

   <itemizedlist>
    <listitem>
     <para>
      In a <emphasis>nested loop join</emphasis>, the inner side is always
      non-parallel.  Although it is executed in full, this is efficient if
      the inner side is an index scan, because the outer tuples and thus
      the loops that look up values in the index are divided over the
      cooperating processes.
     </para>
    </listitem>
    <listitem>
     <para>
      In a <emphasis>merge join</emphasis>, the inner side is always
      a non-parallel plan and therefore executed in full.  This may be
      inefficient, especially if a sort must be performed, because the work
      and resulting data are duplicated in every cooperating process.
     </para>
    </listitem>
    <listitem>
     <para>
      In a <emphasis>hash join</emphasis> (without the "parallel" prefix),
      the inner side is executed in full by every cooperating process
      to build identical copies of the hash table.  This may be inefficient
      if the hash table is large or the plan is expensive.  In a
      <emphasis>parallel hash join</emphasis>, the inner side is a
      <emphasis>parallel hash</emphasis> that divides the work of building
      a shared hash table over the cooperating processes.
     </para>
    </listitem>
   </itemizedlist>
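   <para>
    A parallel hash join typically has the following plan shape, with the
    <literal>Parallel Hash</literal> node building the shared hash table
    from a parallel scan of the inner table (a hypothetical, abridged
    example):

<programlisting>
EXPLAIN SELECT * FROM pgbench_accounts a
          JOIN pgbench_branches b ON a.bid = b.bid;
             QUERY PLAN
-------------------------------------
 Gather
   Workers Planned: 2
   ->  Parallel Hash Join
         Hash Cond: (a.bid = b.bid)
         ->  Parallel Seq Scan on pgbench_accounts a
         ->  Parallel Hash
               ->  Parallel Seq Scan on pgbench_branches b
</programlisting>
   </para>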
  </sect2>

  <sect2 id="parallel-aggregation">
   <title>Parallel Aggregation</title>

   <para>
    <productname>PostgreSQL</productname> supports parallel aggregation by aggregating in
    two stages.  First, each process participating in the parallel portion of
    the query performs an aggregation step, producing a partial result for
    each group of which that process is aware.  This is reflected in the plan
    as a <literal>Partial Aggregate</literal> node.  Second, the partial results are
    transferred to the leader via <literal>Gather</literal> or <literal>Gather
    Merge</literal>.  Finally, the leader re-aggregates the results across all
    workers in order to produce the final result.  This is reflected in the
    plan as a <literal>Finalize Aggregate</literal> node.
   </para>
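   <para>
    Both stages are visible in the plan.  For a simple count, each worker
    counts its share of the rows and the leader sums the per-worker counts
    (a hypothetical, abridged example):

<programlisting>
EXPLAIN SELECT count(*) FROM pgbench_accounts;
             QUERY PLAN
-------------------------------------
 Finalize Aggregate
   ->  Gather
         Workers Planned: 2
         ->  Partial Aggregate
               ->  Parallel Seq Scan on pgbench_accounts
</programlisting>
   </para>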
   <para>
    Because the <literal>Finalize Aggregate</literal> node runs on the leader
    process, queries that produce a relatively large number of groups in
    comparison to the number of input rows will appear less favorable to the
    query planner.  For example, in the worst-case scenario the number of
    groups seen by the <literal>Finalize Aggregate</literal> node could be as many as
    the number of input rows that were seen by all worker processes in the
    <literal>Partial Aggregate</literal> stage.  For such cases, there is clearly
    going to be no performance benefit to using parallel aggregation.  The
    query planner takes this into account during the planning process and is
    unlikely to choose parallel aggregate in this scenario.
   </para>

   <para>
    Parallel aggregation is not supported in all situations.  Each aggregate
    must be <link linkend="parallel-safety">safe</link> for parallelism and must
    have a combine function.  If the aggregate has a transition state of type
    <literal>internal</literal>, it must have serialization and deserialization
    functions.  See <xref linkend="sql-createaggregate"/> for more details.
    Parallel aggregation is not supported if any aggregate function call
    contains a <literal>DISTINCT</literal> or <literal>ORDER BY</literal> clause; it is also
    not supported for ordered-set aggregates or when the query involves
    <literal>GROUPING SETS</literal>.  It can only be used when all joins involved in
    the query are also part of the parallel portion of the plan.
   </para>
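   <para>
    For example, a user-defined aggregate becomes eligible for parallel
    aggregation when it supplies a combine function and is marked parallel
    safe (a hypothetical sum-like aggregate named <literal>my_sum</literal>;
    it reuses the built-in <literal>int4pl</literal> addition function):

<programlisting>
CREATE AGGREGATE my_sum (int4)
(
    sfunc = int4pl,        -- transition function: add each input value
    stype = int4,
    combinefunc = int4pl,  -- merge two partial sums into one
    initcond = '0',
    parallel = safe
);
</programlisting>
   </para>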
  </sect2>

  <sect2 id="parallel-append">
   <title>Parallel Append</title>

   <para>
    Whenever <productname>PostgreSQL</productname> needs to combine rows
    from multiple sources into a single result set, it uses an
    <literal>Append</literal> or <literal>MergeAppend</literal> plan node.
    This commonly happens when implementing <literal>UNION ALL</literal> or
    when scanning a partitioned table.  Such nodes can be used in parallel
    plans just as they can in any other plan.  However, in a parallel plan,
    the planner may instead use a <literal>Parallel Append</literal> node.
   </para>

   <para>
    When an <literal>Append</literal> node is used in a parallel plan, each
    process will execute the child plans in the order in which they appear,
    so that all participating processes cooperate to execute the first child
    plan until it is complete and then move to the second plan at around the
    same time.  When a <literal>Parallel Append</literal> is used instead, the
    executor will spread out the participating processes as evenly as
    possible across its child plans, so that multiple child plans are executed
    simultaneously.  This avoids contention, and also avoids paying the startup
    cost of a child plan in those processes that never execute it.
   </para>

   <para>
    Also, unlike a regular <literal>Append</literal> node, which can only have
    partial children when used within a parallel plan, a <literal>Parallel
    Append</literal> node can have both partial and non-partial child plans.
    Non-partial children will be scanned by only a single process, since
    scanning them more than once would produce duplicate results.  Plans that
    involve appending multiple result sets can therefore achieve
    coarse-grained parallelism even when efficient partial plans are not
    available.  For example, consider a query against a partitioned table
    that can only be implemented efficiently by using an index that does
    not support parallel scans.  The planner might choose a <literal>Parallel
    Append</literal> of regular <literal>Index Scan</literal> plans; each
    individual index scan would have to be executed to completion by a single
    process, but different scans could be performed at the same time by
    different processes.
   </para>
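   <para>
    Such a plan might look like this for a query over a partitioned table
    (a hypothetical, abridged example; <literal>measurements</literal>,
    <literal>part_1</literal>, and <literal>part_2</literal> are placeholder
    names):

<programlisting>
EXPLAIN SELECT * FROM measurements WHERE reading &gt; 100;
             QUERY PLAN
-------------------------------------
 Gather
   Workers Planned: 2
   ->  Parallel Append
         ->  Parallel Seq Scan on part_1
               Filter: (reading &gt; 100)
         ->  Parallel Seq Scan on part_2
               Filter: (reading &gt; 100)
</programlisting>
   </para>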
   <para>
    <xref linkend="guc-enable-parallel-append"/> can be used to disable
    this feature.
   </para>
  </sect2>

  <sect2 id="parallel-plan-tips">
   <title>Parallel Plan Tips</title>

   <para>
    If a query that is expected to do so does not produce a parallel plan,
    you can try reducing <xref linkend="guc-parallel-setup-cost"/> or
    <xref linkend="guc-parallel-tuple-cost"/>.  Of course, this plan may turn
    out to be slower than the serial plan that the planner preferred, but
    this will not always be the case.  If you don't get a parallel
    plan even with very small values of these settings (e.g., after setting
    them both to zero), there may be some reason why the query planner is
    unable to generate a parallel plan for your query.  See
    <xref linkend="when-can-parallel-query-be-used"/> and
    <xref linkend="parallel-safety"/> for information on why this may be
    the case.
   </para>
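   <para>
    For example, to check whether a parallel plan is possible at all, the
    cost settings can be zeroed for the current session (an illustrative
    diagnostic technique, not a recommended permanent configuration):

<programlisting>
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;
EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
</programlisting>
   </para>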
   <para>
    When executing a parallel plan, you can use <literal>EXPLAIN (ANALYZE,
    VERBOSE)</literal> to display per-worker statistics for each plan node.
    This may be useful in determining whether the work is being evenly
    distributed between all plan nodes and more generally in understanding the
    performance characteristics of the plan.
   </para>
  </sect2>
 </sect1>

 <sect1 id="parallel-safety">
  <title>Parallel Safety</title>

  <para>
   The planner classifies operations involved in a query as either
   <firstterm>parallel safe</firstterm>, <firstterm>parallel restricted</firstterm>,
   or <firstterm>parallel unsafe</firstterm>.  A parallel safe operation is one that
   does not conflict with the use of parallel query.  A parallel restricted
   operation is one that cannot be performed in a parallel worker, but that
   can be performed in the leader while parallel query is in use.  Therefore,
   parallel restricted operations can never occur below a <literal>Gather</literal>
   or <literal>Gather Merge</literal> node, but can occur elsewhere in a plan that
   contains such a node.  A parallel unsafe operation is one that cannot
   be performed while parallel query is in use, not even in the leader.
   When a query contains anything that is parallel unsafe, parallel query
   is completely disabled for that query.
  </para>

  <para>
   The following operations are always parallel restricted:
  </para>

  <itemizedlist>
   <listitem>
    <para>
     Scans of common table expressions (CTEs).
    </para>
   </listitem>

   <listitem>
    <para>
     Scans of temporary tables.
    </para>
   </listitem>

   <listitem>
    <para>
     Scans of foreign tables, unless the foreign data wrapper has
     an <literal>IsForeignScanParallelSafe</literal> API that indicates otherwise.
    </para>
   </listitem>

   <listitem>
    <para>
     Plan nodes that reference a correlated <literal>SubPlan</literal>.
    </para>
   </listitem>
  </itemizedlist>

  <sect2 id="parallel-labeling">
   <title>Parallel Labeling for Functions and Aggregates</title>

   <para>
    The planner cannot automatically determine whether a user-defined
    function or aggregate is parallel safe, parallel restricted, or parallel
    unsafe, because this would require predicting every operation that the
    function could possibly perform.  In general, this is equivalent to the
    Halting Problem and therefore impossible.  Even for simple functions
    where it could conceivably be done, we do not try, since this would be expensive
    and error-prone.  Instead, all user-defined functions are assumed to
    be parallel unsafe unless otherwise marked.  When using
    <xref linkend="sql-createfunction"/> or
    <xref linkend="sql-alterfunction"/>, markings can be set by specifying
    <literal>PARALLEL SAFE</literal>, <literal>PARALLEL RESTRICTED</literal>, or
    <literal>PARALLEL UNSAFE</literal> as appropriate.  When using
    <xref linkend="sql-createaggregate"/>, the
    <literal>PARALLEL</literal> option can be specified with <literal>SAFE</literal>,
    <literal>RESTRICTED</literal>, or <literal>UNSAFE</literal> as the corresponding value.
   </para>
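   <para>
    For example (a hypothetical function name):

<programlisting>
-- Declare a function parallel safe at creation time ...
CREATE FUNCTION add_one(integer) RETURNS integer
    AS 'SELECT $1 + 1'
    LANGUAGE SQL
    IMMUTABLE
    PARALLEL SAFE;

-- ... or change the marking of an existing function.
ALTER FUNCTION add_one(integer) PARALLEL RESTRICTED;
</programlisting>
   </para>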
   <para>
    Functions and aggregates must be marked <literal>PARALLEL UNSAFE</literal>
    if they write to the database, change the transaction state (other than by
    using a subtransaction for error recovery), access sequences, or make
    persistent changes to settings.  Similarly, functions must be marked
    <literal>PARALLEL RESTRICTED</literal>
    if they access temporary tables, client connection state,
    cursors, prepared statements, or miscellaneous backend-local state that
    the system cannot synchronize across workers.  For example,
    <literal>setseed</literal> and <literal>random</literal> are parallel restricted for
    this last reason.
   </para>

   <para>
    In general, if a function is labeled as being safe when it is restricted or
    unsafe, or if it is labeled as being restricted when it is in fact unsafe,
    it may throw errors or produce wrong answers when used in a parallel query.
    C-language functions could in theory exhibit totally undefined behavior if
    mislabeled, since there is no way for the system to protect itself against
    arbitrary C code, but in most likely cases the result will be no worse than
    for any other function.  If in doubt, it is probably best to label functions
    as <literal>UNSAFE</literal>.
   </para>

   <para>
    If a function executed within a parallel worker acquires locks that are
    not held by the leader, for example by querying a table not referenced in
    the query, those locks will be released at worker exit, not at the end of
    the transaction.  If you write a function that does this, and this behavior
    difference is important to you, mark such functions as
    <literal>PARALLEL RESTRICTED</literal>
    to ensure that they execute only in the leader.
   </para>

   <para>
    Note that the query planner does not consider deferring the evaluation of
    parallel-restricted functions or aggregates involved in the query in
    order to obtain a superior plan.  So, for example, if a <literal>WHERE</literal>
    clause applied to a particular table is parallel restricted, the query
    planner will not consider performing a scan of that table in the parallel
    portion of a plan.  In some cases, it would be
    possible (and perhaps even efficient) to include the scan of that table in
    the parallel portion of the query and defer the evaluation of the
    <literal>WHERE</literal> clause so that it happens above the <literal>Gather</literal>
    node.  However, the planner does not do this.
   </para>
  </sect2>
 </sect1>
</chapter>