Commit 1ea5bdb

Improve planner's estimates of tuple hash table sizes.
For several types of plan nodes that use TupleHashTables, the planner estimated the expected size of the table as basically numEntries * (MAXALIGN(dataWidth) + MAXALIGN(SizeofHeapTupleHeader)). This is pretty far off, especially for small data widths, because it doesn't account for the overhead of the simplehash.h hash table nor for any per-tuple "additional space" the plan node may request. Jeff Janes noted a case where the estimate was off by about a factor of three, even though the obvious hazards such as inaccurate estimates of numEntries or dataWidth didn't apply.

To improve matters, create functions provided by the relevant executor modules that can estimate the required sizes with reasonable accuracy. (We're still not accounting for effects like allocator padding, but this at least gets the first-order effects correct.)

I added functions that can estimate the tuple table sizes for nodeSetOp and nodeSubplan; these rely on an estimator for TupleHashTables in general, and that in turn relies on one for simplehash.h hash tables. That feels like kind of a lot of mechanism, but if we take any short-cuts we're violating modularity boundaries.

The other places that use TupleHashTables are nodeAgg, which took pains to get its numbers right already, and nodeRecursiveunion. I did not try to improve the situation for nodeRecursiveunion because there's nothing to improve: we are not making an estimate of the hash table size, and it wouldn't help us to do so because we have no non-hashed alternative implementation. On top of that, our estimate of the number of entries to be hashed in that module is so suspect that we'd likely often choose the wrong implementation if we did have two ways to do it.

Reported-by: Jeff Janes <[email protected]>
Author: Tom Lane <[email protected]>
Reviewed-by: David Rowley <[email protected]>
Discussion: https://postgr.es/m/CAMkU=1zia0JfW_QR8L5xA2vpa0oqVuiapm78h=WpNsHH13_9uw@mail.gmail.com
1 parent b8f1c62 commit 1ea5bdb
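The core idea of the new estimators can be sketched in standalone C. Everything below is illustrative, not PostgreSQL's actual code: the MAXALIGN value (8 bytes), the minimal-tuple header size (16 bytes), the function name, and the caller-supplied bucket-array size are all assumptions made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins for PostgreSQL's macros; values are assumptions. */
#define MAXALIGN(LEN)  (((size_t) (LEN) + 7) & ~(size_t) 7)
#define SIZEOF_MINIMAL_TUPLE_HEADER  ((size_t) 16)

/*
 * Overflow-safe estimate in the spirit of EstimateTupleHashTableSpace:
 * bucket-array space plus bump-allocated tuple space, saturating to
 * SIZE_MAX instead of wrapping around.
 */
size_t
estimate_tuple_hash_table_space(double nentries, size_t tuple_width,
                                size_t additional_size,
                                size_t bucket_array_space)
{
    double tuples_space;

    /* Bucket array alone may already have overflowed */
    if (bucket_array_space >= SIZE_MAX)
        return SIZE_MAX;

    /* Per-entry cost: tuple header + data + caller's extra space */
    tuples_space = nentries * (double) (MAXALIGN(SIZEOF_MINIMAL_TUPLE_HEADER) +
                                        MAXALIGN(tuple_width) +
                                        MAXALIGN(additional_size));

    /*
     * SIZE_MAX is not exactly representable as a double on 64-bit targets,
     * so cast explicitly and compare strictly, as the commit does.
     */
    if ((double) bucket_array_space + tuples_space >= (double) SIZE_MAX)
        return SIZE_MAX;

    return bucket_array_space + (size_t) tuples_space;
}
```

With these assumed constants, 100 entries of width 4 with no extra space on top of a 1 KB bucket array come to 1024 + 100 * (16 + 8) = 3424 bytes, and an absurd row estimate saturates to SIZE_MAX rather than wrapping.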

File tree

9 files changed: +206 −31 lines


src/backend/executor/execGrouping.c

Lines changed: 59 additions & 0 deletions

@@ -14,6 +14,7 @@
  */
 #include "postgres.h"
 
+#include "access/htup_details.h"
 #include "access/parallel.h"
 #include "common/hashfn.h"
 #include "executor/executor.h"
@@ -302,6 +303,64 @@ ResetTupleHashTable(TupleHashTable hashtable)
     MemoryContextReset(hashtable->tuplescxt);
 }
 
+/*
+ * Estimate the amount of space needed for a TupleHashTable with nentries
+ * entries, if the tuples have average data width tupleWidth and the caller
+ * requires additionalsize extra space per entry.
+ *
+ * Return SIZE_MAX if it'd overflow size_t.
+ *
+ * nentries is "double" because this is meant for use by the planner,
+ * which typically works with double rowcount estimates.  So we'd need to
+ * clamp to integer somewhere and that might as well be here.  We do expect
+ * the value not to be NaN or negative, else the result will be garbage.
+ */
+Size
+EstimateTupleHashTableSpace(double nentries,
+                            Size tupleWidth,
+                            Size additionalsize)
+{
+    Size        sh_space;
+    double      tuples_space;
+
+    /* First estimate the space needed for the simplehash table */
+    sh_space = tuplehash_estimate_space(nentries);
+
+    /* Give up if that's already too big */
+    if (sh_space >= SIZE_MAX)
+        return sh_space;
+
+    /*
+     * Compute space needed for hashed tuples with additional data.  nentries
+     * must be somewhat sane, so it should be safe to compute this product.
+     *
+     * We assume that the hashed tuples will be kept in a BumpContext so that
+     * there is no additional per-tuple overhead.
+     *
+     * (Note that this is only accurate if MEMORY_CONTEXT_CHECKING is off,
+     * else bump.c will add a MemoryChunk header to each tuple.  However, it
+     * seems undesirable for debug builds to make different planning choices
+     * than production builds, so we assume the production behavior always.)
+     */
+    tuples_space = nentries * (MAXALIGN(SizeofMinimalTupleHeader) +
+                               MAXALIGN(tupleWidth) +
+                               MAXALIGN(additionalsize));
+
+    /*
+     * Check for size_t overflow.  This coding is trickier than it may appear,
+     * because on 64-bit machines SIZE_MAX cannot be represented exactly as a
+     * double.  We must cast it explicitly to suppress compiler warnings about
+     * an inexact conversion, and we must trust that any double value that
+     * compares strictly less than "(double) SIZE_MAX" will cast to a
+     * representable size_t value.
+     */
+    if (sh_space + tuples_space >= (double) SIZE_MAX)
+        return SIZE_MAX;
+
+    /* We don't bother estimating size of the miscellaneous overhead data */
+    return (Size) (sh_space + tuples_space);
+}
+
 /*
  * Find or create a hashtable entry for the tuple group containing the
  * given tuple.  The tuple must be the same type as the hashtable entries.
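The tuplehash_estimate_space() call above in turn leans on a simplehash.h-level estimator. A rough standalone sketch of how such a bucket-array estimator might work follows; the fill factor, minimum table size, and function name here are assumptions for illustration, not simplehash.h's actual constants or API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative sketch of a simplehash-style space estimator: scale the
 * entry count up for an assumed maximum fill factor, round up to the next
 * power of two (with an assumed minimum table size), then multiply by the
 * per-bucket entry size, saturating to SIZE_MAX on overflow.
 */
size_t
sketch_bucket_array_space(double nentries, size_t entry_size)
{
    double grown = nentries / 0.875;    /* assumed maximum fill factor */
    uint64_t buckets = 256;             /* assumed minimum table size */

    /* Round the bucket count up to the next power of two */
    while ((double) buckets < grown)
    {
        if (buckets > UINT64_MAX / 2)
            return SIZE_MAX;            /* bucket count would overflow */
        buckets *= 2;
    }

    /* Guard the final multiplication as well */
    if (buckets > SIZE_MAX / entry_size)
        return SIZE_MAX;
    return (size_t) buckets * entry_size;
}
```

Under these assumed constants, 100 entries of 48 bytes stay at the 256-bucket floor (12288 bytes), while 1000 entries round up to 2048 buckets (98304 bytes).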

src/backend/executor/nodeSetOp.c

Lines changed: 9 additions & 0 deletions

@@ -111,6 +111,15 @@ build_hash_table(SetOpState *setopstate)
                              false);
 }
 
+/* Planner support routine to estimate space needed for hash table */
+Size
+EstimateSetOpHashTableSpace(double nentries, Size tupleWidth)
+{
+    return EstimateTupleHashTableSpace(nentries,
+                                       tupleWidth,
+                                       sizeof(SetOpStatePerGroupData));
+}
+
 /*
  * We've completed processing a tuple group.  Decide how many copies (if any)
  * of its representative row to emit, and store the count into numOutput.

src/backend/executor/nodeSubplan.c

Lines changed: 51 additions & 2 deletions

@@ -525,7 +525,7 @@ buildSubPlanHash(SubPlanState *node, ExprContext *econtext)
                                           node->tab_hash_funcs,
                                           node->tab_collations,
                                           nbuckets,
-                                          0,
+                                          0,    /* no additional data */
                                           node->planstate->state->es_query_cxt,
                                           node->tuplesContext,
                                           innerecontext->ecxt_per_tuple_memory,
@@ -554,7 +554,7 @@ buildSubPlanHash(SubPlanState *node, ExprContext *econtext)
                                           node->tab_hash_funcs,
                                           node->tab_collations,
                                           nbuckets,
-                                          0,
+                                          0,    /* no additional data */
                                           node->planstate->state->es_query_cxt,
                                           node->tuplesContext,
                                           innerecontext->ecxt_per_tuple_memory,
@@ -636,6 +636,55 @@ buildSubPlanHash(SubPlanState *node, ExprContext *econtext)
     MemoryContextSwitchTo(oldcontext);
 }
 
+/* Planner support routine to estimate space needed for hash table(s) */
+Size
+EstimateSubplanHashTableSpace(double nentries,
+                              Size tupleWidth,
+                              bool unknownEqFalse)
+{
+    Size        tab1space,
+                tab2space;
+
+    /* Estimate size of main hashtable */
+    tab1space = EstimateTupleHashTableSpace(nentries,
+                                            tupleWidth,
+                                            0 /* no additional data */ );
+
+    /* Give up if that's already too big */
+    if (tab1space >= SIZE_MAX)
+        return tab1space;
+
+    /* Done if we don't need a hashnulls table */
+    if (unknownEqFalse)
+        return tab1space;
+
+    /*
+     * Adjust the rowcount estimate in the same way that buildSubPlanHash
+     * will, except that we don't bother with the special case for a single
+     * hash column.  (We skip that detail because it'd be notationally painful
+     * for our caller to provide the column count, and this table has
+     * relatively little impact on the total estimate anyway.)
+     */
+    nentries /= 16;
+    if (nentries < 1)
+        nentries = 1;
+
+    /*
+     * It might be sane to also reduce the tupleWidth, but on the other hand
+     * we are not accounting for the space taken by the tuples' null bitmaps.
+     * Leave it alone for now.
+     */
+    tab2space = EstimateTupleHashTableSpace(nentries,
+                                            tupleWidth,
+                                            0 /* no additional data */ );
+
+    /* Guard against overflow */
+    if (tab2space >= SIZE_MAX - tab1space)
+        return SIZE_MAX;
+
+    return tab1space + tab2space;
+}
+
 /*
  * execTuplesUnequal
  *        Return true if two tuples are definitely unequal in the indicated
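The two-table arithmetic in EstimateSubplanHashTableSpace can be isolated in a small standalone sketch. The per-table model below (header-plus-width bytes per entry) and both function names are simplified stand-ins, not the real estimator; only the nentries/16 adjustment and the saturating sum mirror the logic above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for EstimateTupleHashTableSpace: 16-byte header + width. */
size_t
estimate_one_table(double nentries, size_t tuple_width)
{
    double space = nentries * (double) (16 + tuple_width);

    if (space >= (double) SIZE_MAX)
        return SIZE_MAX;
    return (size_t) space;
}

/*
 * Sketch of the subplan logic: the hashnulls table is sized at nentries/16
 * (clamped to at least 1), and the two estimates are summed with a
 * saturating overflow guard instead of wrapping.
 */
size_t
estimate_subplan_tables(double nentries, size_t tuple_width,
                        bool unknown_eq_false)
{
    size_t tab1space = estimate_one_table(nentries, tuple_width);
    size_t tab2space;

    /* No hashnulls table needed, or main table already too big */
    if (tab1space >= SIZE_MAX || unknown_eq_false)
        return tab1space;

    /* The hashnulls table is assumed to see far fewer rows */
    nentries /= 16;
    if (nentries < 1)
        nentries = 1;
    tab2space = estimate_one_table(nentries, tuple_width);

    if (tab2space >= SIZE_MAX - tab1space)
        return SIZE_MAX;                /* saturate instead of wrapping */
    return tab1space + tab2space;
}
```

For 1600 rows of width 16 the toy model gives 51200 bytes for the main table alone, and 51200 + 3200 when the hashnulls table (100 rows) is added.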

src/backend/optimizer/plan/subselect.c

Lines changed: 23 additions & 22 deletions

@@ -20,6 +20,7 @@
 #include "catalog/pg_operator.h"
 #include "catalog/pg_type.h"
 #include "executor/executor.h"
+#include "executor/nodeSubplan.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
@@ -79,8 +80,8 @@ static Node *convert_testexpr(PlannerInfo *root,
                               List *subst_nodes);
 static Node *convert_testexpr_mutator(Node *node,
                                       convert_testexpr_context *context);
-static bool subplan_is_hashable(Plan *plan);
-static bool subpath_is_hashable(Path *path);
+static bool subplan_is_hashable(Plan *plan, bool unknownEqFalse);
+static bool subpath_is_hashable(Path *path, bool unknownEqFalse);
 static bool testexpr_is_hashable(Node *testexpr, List *param_ids);
 static bool test_opexpr_is_hashable(OpExpr *testexpr, List *param_ids);
 static bool hash_ok_operator(OpExpr *expr);
@@ -283,7 +284,7 @@ make_subplan(PlannerInfo *root, Query *orig_subquery,
         best_path = final_rel->cheapest_total_path;
 
         /* Now we can check if it'll fit in hash_mem */
-        if (subpath_is_hashable(best_path))
+        if (subpath_is_hashable(best_path, true))
         {
             SubPlan    *hashplan;
             AlternativeSubPlan *asplan;
@@ -524,7 +525,7 @@ build_subplan(PlannerInfo *root, Plan *plan, Path *path,
      */
     if (subLinkType == ANY_SUBLINK &&
         splan->parParam == NIL &&
-        subplan_is_hashable(plan) &&
+        subplan_is_hashable(plan, unknownEqFalse) &&
         testexpr_is_hashable(splan->testexpr, splan->paramIds))
         splan->useHashTable = true;
 
@@ -711,19 +712,19 @@ convert_testexpr_mutator(Node *node,
  * is suitable for hashing.  We only look at the subquery itself.
  */
 static bool
-subplan_is_hashable(Plan *plan)
+subplan_is_hashable(Plan *plan, bool unknownEqFalse)
 {
-    double      subquery_size;
+    Size        hashtablesize;
 
     /*
-     * The estimated size of the subquery result must fit in hash_mem. (Note:
-     * we use heap tuple overhead here even though the tuples will actually be
-     * stored as MinimalTuples; this provides some fudge factor for hashtable
-     * overhead.)
+     * The estimated size of the hashtable holding the subquery result must
+     * fit in hash_mem.  (Note: reject on equality, to ensure that an estimate
+     * of SIZE_MAX disables hashing regardless of the hash_mem limit.)
      */
-    subquery_size = plan->plan_rows *
-        (MAXALIGN(plan->plan_width) + MAXALIGN(SizeofHeapTupleHeader));
-    if (subquery_size > get_hash_memory_limit())
+    hashtablesize = EstimateSubplanHashTableSpace(plan->plan_rows,
+                                                  plan->plan_width,
+                                                  unknownEqFalse);
+    if (hashtablesize >= get_hash_memory_limit())
         return false;
 
     return true;
@@ -735,19 +736,19 @@ subplan_is_hashable(Plan *plan)
  * Identical to subplan_is_hashable, but work from a Path for the subplan.
  */
 static bool
-subpath_is_hashable(Path *path)
+subpath_is_hashable(Path *path, bool unknownEqFalse)
 {
-    double      subquery_size;
+    Size        hashtablesize;
 
     /*
-     * The estimated size of the subquery result must fit in hash_mem. (Note:
-     * we use heap tuple overhead here even though the tuples will actually be
-     * stored as MinimalTuples; this provides some fudge factor for hashtable
-     * overhead.)
+     * The estimated size of the hashtable holding the subquery result must
+     * fit in hash_mem.  (Note: reject on equality, to ensure that an estimate
+     * of SIZE_MAX disables hashing regardless of the hash_mem limit.)
      */
-    subquery_size = path->rows *
-        (MAXALIGN(path->pathtarget->width) + MAXALIGN(SizeofHeapTupleHeader));
-    if (subquery_size > get_hash_memory_limit())
+    hashtablesize = EstimateSubplanHashTableSpace(path->rows,
+                                                  path->pathtarget->width,
+                                                  unknownEqFalse);
+    if (hashtablesize >= get_hash_memory_limit())
         return false;
 
     return true;
src/backend/optimizer/util/pathnode.c

Lines changed: 7 additions & 5 deletions

@@ -17,6 +17,7 @@
 #include <math.h>
 
 #include "access/htup_details.h"
+#include "executor/nodeSetOp.h"
 #include "foreign/fdwapi.h"
 #include "miscadmin.h"
 #include "nodes/extensible.h"
@@ -3461,7 +3462,7 @@ create_setop_path(PlannerInfo *root,
     }
     else
     {
-        Size        hashentrysize;
+        Size        hashtablesize;
 
         /*
          * In hashed mode, we must read all the input before we can emit
@@ -3490,11 +3491,12 @@ create_setop_path(PlannerInfo *root,
 
         /*
          * Also disable if it doesn't look like the hashtable will fit into
-         * hash_mem.
+         * hash_mem.  (Note: reject on equality, to ensure that an estimate of
+         * SIZE_MAX disables hashing regardless of the hash_mem limit.)
          */
-        hashentrysize = MAXALIGN(leftpath->pathtarget->width) +
-            MAXALIGN(SizeofMinimalTupleHeader);
-        if (hashentrysize * numGroups > get_hash_memory_limit())
+        hashtablesize = EstimateSetOpHashTableSpace(numGroups,
+                                                    leftpath->pathtarget->width);
+        if (hashtablesize >= get_hash_memory_limit())
             pathnode->path.disabled_nodes++;
     }
     pathnode->path.rows = outputRows;

src/include/executor/executor.h

Lines changed: 3 additions & 0 deletions

@@ -157,6 +157,9 @@ extern TupleHashEntry FindTupleHashEntry(TupleHashTable hashtable,
                                          ExprState *eqcomp,
                                          ExprState *hashexpr);
 extern void ResetTupleHashTable(TupleHashTable hashtable);
+extern Size EstimateTupleHashTableSpace(double nentries,
+                                        Size tupleWidth,
+                                        Size additionalsize);
 
 #ifndef FRONTEND
 /*

src/include/executor/nodeSetOp.h

Lines changed: 2 additions & 0 deletions

@@ -20,4 +20,6 @@ extern SetOpState *ExecInitSetOp(SetOp *node, EState *estate, int eflags);
 extern void ExecEndSetOp(SetOpState *node);
 extern void ExecReScanSetOp(SetOpState *node);
 
+extern Size EstimateSetOpHashTableSpace(double nentries, Size tupleWidth);
+
 #endif                          /* NODESETOP_H */

src/include/executor/nodeSubplan.h

Lines changed: 4 additions & 0 deletions

@@ -20,6 +20,10 @@ extern SubPlanState *ExecInitSubPlan(SubPlan *subplan, PlanState *parent);
 
 extern Datum ExecSubPlan(SubPlanState *node, ExprContext *econtext, bool *isNull);
 
+extern Size EstimateSubplanHashTableSpace(double nentries,
+                                          Size tupleWidth,
+                                          bool unknownEqFalse);
+
 extern void ExecReScanSetParamPlan(SubPlanState *node, PlanState *parent);
 
 extern void ExecSetParamPlan(SubPlanState *node, ExprContext *econtext);
