<!-- doc/src/sgml/parallel.sgml -->
<chapter id="parallel-query">
<title>Parallel Query</title>
<indexterm zone="parallel-query">
<primary>parallel query</primary>
</indexterm>
<para>
<productname>PostgreSQL</> can devise query plans which can leverage
multiple CPUs in order to answer queries faster. This feature is known
as parallel query. Many queries cannot benefit from parallel query, either
due to limitations of the current implementation or because there is no
imaginable query plan which is any faster than the serial query plan.
However, for queries that can benefit, the speedup from parallel query
is often very significant. Many queries can run more than twice as fast
when using parallel query, and some queries can run four times faster or
even more. Queries that touch a large amount of data but return only a
few rows to the user will typically benefit most. This chapter explains
some details of how parallel query works and in which situations it can be
used so that users who wish to make use of it can understand what to expect.
</para>
<sect1 id="how-parallel-query-works">
<title>How Parallel Query Works</title>
<para>
When the optimizer determines that parallel query is the fastest execution
strategy for a particular query, it will create a query plan which includes
a <firstterm>Gather</firstterm> or <firstterm>Gather Merge</firstterm>
node. Here is a simple example:
<screen>
EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Gather  (cost=1000.00..217018.43 rows=1 width=97)
   Workers Planned: 2
   ->  Parallel Seq Scan on pgbench_accounts  (cost=0.00..216018.33 rows=1 width=97)
         Filter: (filler ~~ '%x%'::text)
(4 rows)
</screen>
</para>
<para>
In all cases, the <literal>Gather</literal> or
<literal>Gather Merge</literal> node will have exactly one
child plan, which is the portion of the plan that will be executed in
parallel. If the <literal>Gather</> or <literal>Gather Merge</> node is
at the very top of the plan tree, then the entire query will execute in
parallel. If it is somewhere else in the plan tree, then only the portion
of the plan below it will run in parallel. In the example above, the
query accesses only one table, so there is only one plan node other than
the <literal>Gather</> node itself; since that plan node is a child of the
<literal>Gather</> node, it will run in parallel.
</para>
<para>
<link linkend="using-explain">Using EXPLAIN</>, you can see the number of
workers chosen by the planner. When the <literal>Gather</> node is reached
during query execution, the process which is implementing the user's
session will request a number of <link linkend="bgworker">background
worker processes</link> equal to the number
of workers chosen by the planner. The number of background workers that
the planner will consider using is limited to at most
<xref linkend="guc-max-parallel-workers-per-gather">. The total number
of background workers that can exist at any one time is limited by both
<xref linkend="guc-max-worker-processes"> and
<xref linkend="guc-max-parallel-workers">. Therefore, it is possible for a
parallel query to run with fewer workers than planned, or even with
no workers at all. The optimal plan may depend on the number of workers
that are available, so this can result in poor query performance. If this
occurrence is frequent, consider increasing
<varname>max_worker_processes</> and <varname>max_parallel_workers</>
so that more workers can be run simultaneously or alternatively reducing
<varname>max_parallel_workers_per_gather</varname> so that the planner
requests fewer workers.
</para>
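<para>
As a rough illustration of these limits, the following sketch inspects the
relevant settings and then compares the number of workers planned with the
number actually launched. The value <literal>4</literal> is an arbitrary
choice made for this example, and the plan you see will depend on your data
and configuration.
<programlisting>
-- Server-wide limits on background workers.
SHOW max_worker_processes;
SHOW max_parallel_workers;

-- Per-Gather limit; this one can be changed for the current session.
SET max_parallel_workers_per_gather = 4;

-- With ANALYZE, a Gather node reports both "Workers Planned" and
-- "Workers Launched". A smaller launched count means that fewer
-- workers were available than the planner asked for.
EXPLAIN (ANALYZE) SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
</programlisting>
</para>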
<para>
Every background worker process which is successfully started for a given
parallel query will execute the parallel portion of the plan. The leader
will also execute that portion of the plan, but it has an additional
responsibility: it must also read all of the tuples generated by the
workers. When the parallel portion of the plan generates only a small
number of tuples, the leader will often behave very much like an additional
worker, speeding up query execution. Conversely, when the parallel portion
of the plan generates a large number of tuples, the leader may be almost
entirely occupied with reading the tuples generated by the workers and
performing any further processing steps which are required by plan nodes
above the level of the <literal>Gather</literal> node or
<literal>Gather Merge</literal> node. In such cases, the leader will
do very little of the work of executing the parallel portion of the plan.
</para>
<para>
When the node at the top of the parallel portion of the plan is
<literal>Gather Merge</> rather than <literal>Gather</>, it indicates that
each process executing the parallel portion of the plan is producing
tuples in sorted order, and that the leader is performing an
order-preserving merge. In contrast, <literal>Gather</> reads tuples
from the workers in whatever order is convenient, destroying any sort
order that may have existed.
</para>
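<para>
A query that needs sorted output is one case in which the planner may
choose <literal>Gather Merge</literal>. The following is only a sketch:
whether the plan actually contains a <literal>Gather Merge</literal> node
depends on costs, statistics, and settings.
<programlisting>
EXPLAIN SELECT aid, abalance
  FROM pgbench_accounts
 WHERE filler LIKE '%x%'
 ORDER BY abalance;
</programlisting>
If <literal>Gather Merge</literal> appears, the plan below it produces
sorted tuples in each process, and the leader merges those sorted streams
into a single sorted result.
</para>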
</sect1>
<sect1 id="when-can-parallel-query-be-used">
<title>When Can Parallel Query Be Used?</title>
<para>
There are several settings which can cause the query planner not to
generate a parallel query plan under any circumstances. In order for
any parallel query plans whatsoever to be generated, the following
settings must be configured as indicated.
</para>
<itemizedlist>
<listitem>
<para>
<xref linkend="guc-max-parallel-workers-per-gather"> must be set to a
value which is greater than zero. This is a special case of the more
general principle that no more workers should be used than the number
configured via <varname>max_parallel_workers_per_gather</varname>.
</para>
</listitem>
<listitem>
<para>
<xref linkend="guc-dynamic-shared-memory-type"> must be set to a
value other than <literal>none</>. Parallel query requires dynamic
shared memory in order to pass data between cooperating processes.
</para>
</listitem>
</itemizedlist>
<para>
In addition, the system must not be running in single-user mode. Since
the entire database system is running as a single process in this situation,
no background workers will be available.
</para>
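<para>
A quick way to check these prerequisites in the current session might look
like this:
<programlisting>
SHOW max_parallel_workers_per_gather;  -- must be greater than zero
SHOW dynamic_shared_memory_type;       -- must not be "none"
</programlisting>
</para>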
<para>
Even when it is in general possible for parallel query plans to be
generated, the planner will not generate them for a given query
if any of the following are true:
</para>
<itemizedlist>
<listitem>
<para>
The query writes any data or locks any database rows. If a query
contains a data-modifying operation either at the top level or within
a CTE, no parallel plans for that query will be generated. This is a
limitation of the current implementation which could be lifted in a
future release.
</para>
</listitem>
<listitem>
<para>
The query might be suspended during execution. In any situation in
which the system thinks that partial or incremental execution might
occur, no parallel plan is generated. For example, a cursor created
using <link linkend="sql-declare">DECLARE CURSOR</link> will never use
a parallel plan. Similarly, a PL/pgSQL loop of the form
<literal>FOR x IN query LOOP .. END LOOP</literal> will never use a
parallel plan, because the parallel query system is unable to verify
that the code in the loop is safe to execute while parallel query is
active.
</para>
</listitem>
<listitem>
<para>
The query uses any function marked <literal>PARALLEL UNSAFE</literal>.
Most system-defined functions are <literal>PARALLEL SAFE</literal>,
but user-defined functions are marked <literal>PARALLEL
UNSAFE</literal> by default. See the discussion of
<xref linkend="parallel-safety">.
</para>
</listitem>
<listitem>
<para>
The query is running inside of another query that is already parallel.
For example, if a function called by a parallel query issues an SQL
query itself, that query will never use a parallel plan. This is a
limitation of the current implementation, but it may not be desirable
to remove this limitation, since it could result in a single query
using a very large number of processes.
</para>
</listitem>
<listitem>
<para>
The transaction isolation level is serializable. This is
a limitation of the current implementation.
</para>
</listitem>
</itemizedlist>
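<para>
Two of these restrictions can be observed with <command>EXPLAIN</command>.
This sketch assumes the <literal>pgbench_accounts</literal> table from the
earlier example; the cursor name <literal>c</literal> is chosen only for
illustration. In both cases the resulting plan is expected to contain no
<literal>Gather</literal> node.
<programlisting>
-- A cursor might be suspended during execution.
BEGIN;
EXPLAIN DECLARE c CURSOR FOR
  SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
ROLLBACK;

-- The transaction isolation level is serializable.
BEGIN ISOLATION LEVEL SERIALIZABLE;
EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
ROLLBACK;
</programlisting>
</para>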
<para>
Even when a parallel query plan is generated for a particular query, there
are several circumstances under which it will be impossible to execute
that plan in parallel at execution time. If this occurs, the leader
will execute the portion of the plan below the <literal>Gather</>
node entirely by itself, almost as if the <literal>Gather</> node were
not present. This will happen if any of the following conditions are met:
</para>
<itemizedlist>
<listitem>
<para>
No background workers can be obtained because of the limitation that
the total number of background workers cannot exceed
<xref linkend="guc-max-worker-processes">.
</para>
</listitem>
<listitem>
<para>
No background workers can be obtained because of the limitation that
the total number of background workers launched for purposes of
parallel query cannot exceed <xref linkend="guc-max-parallel-workers">.
</para>
</listitem>
<listitem>
<para>
The client sends an Execute message with a non-zero fetch count.
See the discussion of the
<link linkend="protocol-flow-ext-query">extended query protocol</link>.
Since <link linkend="libpq">libpq</link> currently provides no way to
send such a message, this can only occur when using a client that
does not rely on libpq. If this is a frequent
occurrence, it may be a good idea to set
<xref linkend="guc-max-parallel-workers-per-gather"> to zero in
sessions where it is likely, so as to avoid generating query plans
that may be suboptimal when run serially.
</para>
</listitem>
<listitem>
<para>
A prepared statement is executed using a <literal>CREATE TABLE .. AS
EXECUTE ..</literal> statement. This construct converts what otherwise
would have been a read-only operation into a read-write operation,
making it ineligible for parallel query.
</para>
</listitem>
<listitem>
<para>
The transaction isolation level is serializable. This situation
does not normally arise, because parallel query plans are not
generated when the transaction isolation level is serializable.
However, it can happen if the transaction isolation level is changed to
serializable after the plan is generated and before it is executed.
</para>
</listitem>
</itemizedlist>
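<para>
The prepared-statement case can be sketched as follows. The names
<literal>matching_accounts</literal> and
<literal>matching_accounts_copy</literal> are hypothetical and exist only
for this example.
<programlisting>
PREPARE matching_accounts AS
  SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';

-- Executed on its own, the prepared statement is a read-only query.
EXECUTE matching_accounts;

-- Executed this way, the same statement becomes part of a read-write
-- operation, so it will not be run in parallel.
CREATE TABLE matching_accounts_copy AS EXECUTE matching_accounts;
</programlisting>
</para>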
</sect1>
<sect1 id="parallel-plans">
<title>Parallel Plans</title>
<para>
Because each worker executes the parallel portion of the plan to
completion, it is not possible to simply take an ordinary query plan
and run it using multiple workers. Each worker would produce a full
copy of the output result set, so the query would not run any faster
than normal but would produce incorrect results. Instead, the parallel
portion of the plan must be what is known internally to the query
optimizer as a <firstterm>partial plan</>; that is, it must be constructed
so that each process which executes the plan will generate only a
subset of the output rows in such a way that each required output row
is guaranteed to be generated by exactly one of the cooperating processes.
Generally, this means that the scan on the driving table of the query
must be a parallel-aware scan.
</para>
<sect2 id="parallel-scans">
<title>Parallel Scans</title>
<para>
The following types of parallel-aware table scans are currently supported.
<itemizedlist>
<listitem>
<para>
In a <emphasis>parallel sequential scan</>, the table's blocks will
be divided among the cooperating processes. Blocks are handed out one
at a time, so that access to the table remains sequential.
</para>
</listitem>
<listitem>
<para>
In a <emphasis>parallel bitmap heap scan</>, one process is chosen
as the leader. That process performs a scan of one or more indexes
and builds a bitmap indicating which table blocks need to be visited.
These blocks are then divided among the cooperating processes as in
a parallel sequential scan. In other words, the heap scan is performed
in parallel, but the underlying index scan is not.
</para>
</listitem>
<listitem>
<para>
In a <emphasis>parallel index scan</> or <emphasis>parallel index-only
scan</>, the cooperating processes take turns reading data from the
index. Currently, parallel index scans are supported only for
btree indexes. Each process will claim a single index block and will
scan and return all tuples referenced by that block; other processes can
at the same time be returning tuples from a different index block.
The results of a parallel btree scan are returned in sorted order
within each worker process.
</para>
</listitem>
</itemizedlist>
Other scan types, such as scans of non-btree indexes, may support
parallel scans in the future.
</para>
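<para>
The queries below are a sketch of statements that might lead to different
parallel-aware scans. They assume the standard
<application>pgbench</application> schema, in which
<literal>pgbench_accounts</literal> has a btree primary key on
<literal>aid</literal> and no index on <literal>filler</literal>. The scan
type the planner actually selects depends on table size, statistics, and
cost settings.
<programlisting>
-- The filter cannot use an index, so a Parallel Seq Scan is a candidate.
EXPLAIN SELECT count(*) FROM pgbench_accounts WHERE filler LIKE '%x%';

-- A range condition on the btree-indexed column makes a parallel index
-- scan or parallel index-only scan a candidate.
EXPLAIN SELECT count(*) FROM pgbench_accounts WHERE aid BETWEEN 1 AND 1000000;
</programlisting>
</para>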
</sect2>
<sect2 id="parallel-joins">
<title>Parallel Joins</title>
<para>
Just as in a non-parallel plan, the driving table may be joined to one or
more other tables using a nested loop, hash join, or merge join. The
inner side of the join may be any kind of non-parallel plan that is
otherwise supported by the planner provided that it is safe to run within
a parallel worker. For example, if a nested loop join is chosen, the
inner plan may be an index scan which looks up a value taken from the outer
side of the join.
</para>
<para>
Each worker will execute the inner side of the join in full. This is
typically not a problem for nested loops, but may be inefficient for
cases involving hash or merge joins. For example, for a hash join, this
restriction means that an identical hash table is built in each worker
process, which works fine for joins against small tables but may not be
efficient when the inner table is large. For a merge join, it might mean
that each worker performs a separate sort of the inner relation, which
could be slow. Of course, in cases where a parallel plan of this type
would be inefficient, the query planner will normally choose some other
plan (possibly one which does not use parallelism) instead.
</para>
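<para>
As a sketch, a join of this kind could be examined as shown below. The
example assumes the standard <application>pgbench</application> table
<literal>pgbench_branches</literal>, which is small compared to
<literal>pgbench_accounts</literal>. If a parallel plan is chosen, the
parallel-aware scan would be expected on the driving table, with the inner
side executed in full by each process.
<programlisting>
EXPLAIN SELECT a.aid, b.bbalance
  FROM pgbench_accounts a
  JOIN pgbench_branches b USING (bid)
 WHERE a.filler LIKE '%x%';
</programlisting>
</para>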
</sect2>
<sect2 id="parallel-aggregation">
<title>Parallel Aggregation</title>
<para>
<productname>PostgreSQL</> supports parallel aggregation by aggregating in
two stages. First, each process participating in the parallel portion of
the query performs an aggregation step, producing a partial result for
each group of which that process is aware. This is reflected in the plan
as a <literal>Partial Aggregate</> node. The partial results are then
transferred to the leader via <literal>Gather</> or <literal>Gather
Merge</>. Second, the leader re-aggregates the results across all
workers in order to produce the final result. This is reflected in the
plan as a <literal>Finalize Aggregate</> node.
</para>
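<para>
A grouped query over a large table is a typical candidate. The statement
below is only a sketch; if the planner chooses parallel aggregation, the
plan would be expected to show a <literal>Finalize</literal> aggregation
step above the <literal>Gather</literal> or <literal>Gather Merge</literal>
node and a <literal>Partial</literal> aggregation step below it.
<programlisting>
EXPLAIN SELECT bid, count(*) FROM pgbench_accounts GROUP BY bid;
</programlisting>
</para>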
<para>
Because the <literal>Finalize Aggregate</> node runs on the leader
process, queries which produce a relatively large number of groups in
comparison to the number of input rows will appear less favorable to the
query planner. For example, in the worst-case scenario the number of
groups seen by the <literal>Finalize Aggregate</> node could be as many as
the number of input rows which were seen by all worker processes in the
<literal>Partial Aggregate</> stage. For such cases, there is clearly
going to be no performance benefit to using parallel aggregation. The
query planner takes this into account during the planning process and is
unlikely to choose parallel aggregate in this scenario.
</para>
<para>
Parallel aggregation is not supported in all situations. Each aggregate
must be <link linkend="parallel-safety">safe</> for parallelism and must
have a combine function. If the aggregate has a transition state of type
<literal>internal</>, it must have serialization and deserialization
functions. See <xref linkend="sql-createaggregate"> for more details.
Parallel aggregation is not supported if any aggregate function call
contains a <literal>DISTINCT</> or <literal>ORDER BY</> clause and is also
not supported for ordered set aggregates or when the query involves
<literal>GROUPING SETS</>. It can only be used when all joins involved in
the query are also part of the parallel portion of the plan.
</para>
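<para>
The following is a minimal sketch of a user-defined aggregate that meets
these requirements. The name <literal>my_sum</literal> is hypothetical;
<literal>int8pl</literal> is the built-in function that adds two
<type>bigint</type> values, used here both as the transition function and
as the combine function. Because the transition state is of type
<type>bigint</type> rather than <literal>internal</literal>, no
serialization or deserialization functions are needed.
<programlisting>
CREATE AGGREGATE my_sum (bigint) (
    SFUNC = int8pl,
    STYPE = bigint,
    COMBINEFUNC = int8pl,
    PARALLEL = SAFE
);
</programlisting>
</para>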
</sect2>
<sect2 id="parallel-plan-tips">
<title>Parallel Plan Tips</title>
<para>
If a query that is expected to do so does not produce a parallel plan,
you can try reducing <xref linkend="guc-parallel-setup-cost"> or
<xref linkend="guc-parallel-tuple-cost">. Of course, this plan may turn
out to be slower than the serial plan which the planner preferred, but
this will not always be the case. If you don't get a parallel
plan even with very small values of these settings (e.g. after setting
them both to zero), there may be some reason why the query planner is
unable to generate a parallel plan for your query. See
<xref linkend="when-can-parallel-query-be-used"> and
<xref linkend="parallel-safety"> for information on why this may be
the case.
</para>
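<para>
For example, an experiment along these lines could be run in a single
session. The query is the one used earlier in this chapter; substitute your
own.
<programlisting>
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;

EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';

-- Restore the previous values once the experiment is finished.
RESET parallel_setup_cost;
RESET parallel_tuple_cost;
</programlisting>
</para>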
<para>
When executing a parallel plan, you can use <literal>EXPLAIN (ANALYZE,
VERBOSE)</literal> to display per-worker statistics for each plan node.
This may be useful in determining whether the work is being evenly
distributed between all participating processes and more generally in understanding the
performance characteristics of the plan.
</para>
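<para>
A sketch of such an invocation follows. Note that <literal>ANALYZE</literal>
actually executes the query. In the output, plan nodes that ran in workers
are followed by one line per worker giving that worker's timing and row
counts.
<programlisting>
EXPLAIN (ANALYZE, VERBOSE)
  SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
</programlisting>
</para>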
</sect2>
</sect1>
<sect1 id="parallel-safety">
<title>Parallel Safety</title>
<para>
The planner classifies operations involved in a query as either
<firstterm>parallel safe</>, <firstterm>parallel restricted</>,
or <firstterm>parallel unsafe</>. A parallel safe operation is one which
does not conflict with the use of parallel query. A parallel restricted
operation is one which cannot be performed in a parallel worker, but which
can be performed in the leader while parallel query is in use. Therefore,
parallel restricted operations can never occur below a <literal>Gather</>
or <literal>Gather Merge</> node, but can occur elsewhere in a plan which
contains such a node. A parallel unsafe operation is one which cannot
be performed while parallel query is in use, not even in the leader.
When a query contains anything which is parallel unsafe, parallel query
is completely disabled for that query.
</para>
<para>
The following operations are always parallel restricted.
</para>
<itemizedlist>
<listitem>
<para>
Scans of common table expressions (CTEs).
</para>
</listitem>
<listitem>
<para>
Scans of temporary tables.
</para>
</listitem>
<listitem>
<para>
Scans of foreign tables, unless the foreign data wrapper has
an <literal>IsForeignScanParallelSafe</> API which indicates otherwise.
</para>
</listitem>
<listitem>
<para>
Access to an <literal>InitPlan</> or correlated <literal>SubPlan</>.
</para>
</listitem>
</itemizedlist>
<sect2 id="parallel-labeling">
<title>Parallel Labeling for Functions and Aggregates</title>
<para>
The planner cannot automatically determine whether a user-defined
function or aggregate is parallel safe, parallel restricted, or parallel
unsafe, because this would require predicting every operation which the
function could possibly perform. In general, this is equivalent to the
Halting Problem and therefore impossible. Even for simple functions
where it could conceivably be done, we do not try, since this would be expensive
and error-prone. Instead, all user-defined functions are assumed to
be parallel unsafe unless otherwise marked. When using
<xref linkend="sql-createfunction"> or
<xref linkend="sql-alterfunction">, markings can be set by specifying
<literal>PARALLEL SAFE</>, <literal>PARALLEL RESTRICTED</>, or
<literal>PARALLEL UNSAFE</> as appropriate. When using
<xref linkend="sql-createaggregate">, the
<literal>PARALLEL</> option can be specified with <literal>SAFE</>,
<literal>RESTRICTED</>, or <literal>UNSAFE</> as the corresponding value.
</para>
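<para>
As an illustration, a function could be labeled when it is created and
relabeled later. The function <literal>add_one</literal> is hypothetical
and exists only for this example.
<programlisting>
CREATE FUNCTION add_one(integer) RETURNS integer
    AS 'SELECT $1 + 1'
    LANGUAGE SQL
    IMMUTABLE
    PARALLEL SAFE;

-- Change the marking of an existing function.
ALTER FUNCTION add_one(integer) PARALLEL RESTRICTED;
</programlisting>
</para>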
<para>
Functions and aggregates must be marked <literal>PARALLEL UNSAFE</> if
they write to the database, access sequences, change the transaction state
even temporarily (e.g. a PL/pgSQL function which establishes an
<literal>EXCEPTION</> block to catch errors), or make persistent changes to
settings. Similarly, functions must be marked <literal>PARALLEL
RESTRICTED</> if they access temporary tables, client connection state,
cursors, prepared statements, or miscellaneous backend-local state which
the system cannot synchronize across workers. For example,
<literal>setseed</> and <literal>random</> are parallel restricted for
this last reason.
</para>
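<para>
The marking of an existing function can be inspected in the
<structname>pg_proc</structname> system catalog, where the
<structfield>proparallel</structfield> column holds <literal>s</literal>
for safe, <literal>r</literal> for restricted, or <literal>u</literal> for
unsafe. A sketch:
<programlisting>
SELECT proname, proparallel
  FROM pg_proc
 WHERE proname IN ('random', 'setseed');
</programlisting>
</para>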
<para>
In general, if a function is labeled as being safe when it is restricted or
unsafe, or if it is labeled as being restricted when it is in fact unsafe,
it may throw errors or produce wrong answers when used in a parallel query.
C-language functions could in theory exhibit totally undefined behavior if
mislabeled, since there is no way for the system to protect itself against
arbitrary C code, but in most likely cases the result will be no worse than
for any other function. If in doubt, it is probably best to label functions
as <literal>UNSAFE</>.
</para>
<para>
If a function executed within a parallel worker acquires locks which are
not held by the leader, for example by querying a table not referenced in
the query, those locks will be released at worker exit, not at the end of
the transaction. If you write a function which does this, and this behavior
difference is important to you, mark such functions as
<literal>PARALLEL RESTRICTED</literal>
to ensure that they execute only in the leader.
</para>
<para>
Note that the query planner does not consider deferring the evaluation of
parallel-restricted functions or aggregates involved in the query in
order to obtain a superior plan. So, for example, if a <literal>WHERE</>
clause applied to a particular table is parallel restricted, the query
planner will not consider performing a scan of that table in the parallel
portion of a plan. In some cases, it would be
possible (and perhaps even efficient) to include the scan of that table in
the parallel portion of the query and defer the evaluation of the
<literal>WHERE</> clause so that it happens above the <literal>Gather</>
node. However, the planner does not do this.
</para>
</sect2>
</sect1>
</chapter>