Postgres 17 commit e0b1ee1 added two closely related nbtree
optimizations: the "prechecked" and "firstpage" optimizations. Both
optimizations used page-level context to avoid needlessly evaluating
keys that are guaranteed to be satisfied. These optimizations were
adapted to work with the nbtree ScalarArrayOp execution patch a few
months later, which became commit 5bf748b.

The "prechecked" design had a number of notable weak points. It didn't
account for the fact that an = array scan key's sk_argument field might
need to advance at the point of the page precheck (it didn't check the
precheck tuple against the key's array, only the key's sk_argument,
which needlessly made it ineffective in corner cases involving stepping
to a page having advanced the scan's arrays using a truncated high key).
It was also an "all or nothing" optimization: either it was completely
effective (skipping all required-in-scan-direction keys against all
attributes) for the whole page, or it didn't work at all. This also
implied that it couldn't be used on pages where the scan had to
terminate before reaching the end of the page due to an unsatisfied
low-order key setting continuescan=false.

Replace both optimizations with a new optimization without any of these
weak points. This works by giving affected _bt_readpage calls a scankey
offset that its _bt_checkkeys calls start at (an offset to the first key
that might not be satisfied by every non-pivot tuple from the page).
The new optimization is activated at the same point as the previous
"prechecked" optimization (at the start of a _bt_readpage of any page
after the scan's first).
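The scankey offset idea can be illustrated with a toy model. This is a minimal sketch, not the nbtree implementation: ToyKey, ToyTuple, and toy_set_startikey are hypothetical names, attributes are plain integers, and only simple (non-array) keys sorted by attribute number are modeled. It assumes that page tuples are sorted in index column order, so a key is satisfied by every tuple on the page when all earlier attributes are equal in the page's first and last tuples and both of those tuples satisfy the key.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_ATTRS 4

typedef enum { TOY_LT, TOY_LE, TOY_EQ, TOY_GE, TOY_GT } ToyOp;

/* Hypothetical scan key: "attribute attno <op> arg" (attno is 0-based). */
typedef struct { int attno; ToyOp op; int arg; } ToyKey;

/* Hypothetical index tuple: integer attributes in index column order. */
typedef struct { int attrs[TOY_MAX_ATTRS]; } ToyTuple;

static bool
toy_key_satisfied(const ToyKey *key, const ToyTuple *tup)
{
    int v = tup->attrs[key->attno];

    switch (key->op)
    {
        case TOY_LT: return v < key->arg;
        case TOY_LE: return v <= key->arg;
        case TOY_EQ: return v == key->arg;
        case TOY_GE: return v >= key->arg;
        case TOY_GT: return v > key->arg;
    }
    return false;
}

/*
 * Return the offset of the first key that might not be satisfied by every
 * tuple on a page whose lowest and highest tuples are "first" and "last".
 * Keys before the returned offset need not be evaluated for this page.
 */
int
toy_set_startikey(const ToyKey *keys, int nkeys,
                  const ToyTuple *first, const ToyTuple *last)
{
    int startikey;

    assert(nkeys >= 0);
    for (startikey = 0; startikey < nkeys; startikey++)
    {
        const ToyKey *key = &keys[startikey];
        bool prefix_equal = true;

        assert(key->attno >= 0 && key->attno < TOY_MAX_ATTRS);

        /* The key's attribute is only ordered across the page when every
         * earlier attribute has a single value on the whole page. */
        for (int j = 0; j < key->attno; j++)
        {
            if (first->attrs[j] != last->attrs[j])
                prefix_equal = false;
        }
        if (!prefix_equal)
            break;

        /* Both endpoints must satisfy the key; ordering covers the rest. */
        if (!toy_key_satisfied(key, first) || !toy_key_satisfied(key, last))
            break;
    }
    return startikey;
}
```

With keys "a = 5 AND b >= 10 AND b < 20" and a page spanning (5, 12) through (5, 18), the sketch returns 3, so no key needs to be evaluated on that page. Unlike an "all or nothing" check, a page spanning (5, 12) through (5, 25) still returns 2, skipping the first two keys.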
The old "prechecked" optimization worked off of the highest non-pivot
tuple on the page (or the lowest, when scanning backwards), but the new
"startikey" optimization always works off of a pair of non-pivot tuples
(the lowest and the highest, taken together). This approach allows the
"startikey" optimization to bypass = array key comparisons whenever
they're satisfied by _some_ element (not necessarily the current one).
This is useful for SAOP array keys (it fixes the issue with truncated
high keys), and is needed to get the most out of range skip array keys
(we expect to be able to bypass range skip array = keys when a range of
values on the page all satisfy the key, even when there are multiple
values, provided they all "satisfy some range skip array element").
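The difference between the two kinds of = array key can be sketched as follows. This is a simplified illustration under stated assumptions, not the nbtree code: both function names are hypothetical, values are plain integers, the SAOP array is assumed to be sorted, and firstval/lastval stand for the key's attribute in the page's lowest and highest non-pivot tuples (with all earlier attributes assumed equal across the page). The toy SAOP check conservatively requires a single value on the page that matches some array element, while the toy range skip array check accepts multiple values so long as they all fall inside the range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Three-way integer comparator for bsearch(). */
static int
toy_cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;

    return (x > y) - (x < y);
}

/*
 * Hypothetical SAOP array check: the key can be bypassed for the page when
 * the attribute has one value on the whole page and that value matches
 * _some_ element of the sorted array (not necessarily the current one).
 */
bool
toy_saop_key_bypassable(const int *elems, size_t nelems,
                        int firstval, int lastval)
{
    assert(elems != NULL || nelems == 0);

    if (firstval != lastval || nelems == 0)
        return false;
    return bsearch(&firstval, elems, nelems, sizeof(int),
                   toy_cmp_int) != NULL;
}

/*
 * Hypothetical range skip array check: every value in lo..hi is an
 * element, so the key can be bypassed when all values on the page
 * (firstval through lastval) fall inside the range.
 */
bool
toy_range_skip_key_bypassable(int lo, int hi, int firstval, int lastval)
{
    assert(lo <= hi);
    assert(firstval <= lastval);

    return lo <= firstval && lastval <= hi;
}
```

For an array {1, 5, 9}, a page holding only the value 5 passes the SAOP check, but a page spanning 5 through 9 does not, since it might contain values that match no element. A range skip array covering 10 through 20 can be bypassed on a page spanning 12 through 18.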
Although this is independently useful work, the main motivation is to
fix regressions in index scans that are nominally eligible to use skip
scan, but can never actually benefit from skipping. These are cases
where a leading prefix column contains many distinct values (especially
when the number of values approaches the total number of index tuples),
so skipping can never be profitable. The CPU cost of skip array
maintenance is by far the main cost that must be kept under control.

Skip scan's approach of adding skip arrays during preprocessing and then
fixing (or significantly ameliorating) the resulting regressions seen in
unsympathetic cases is enabled by the optimization added by this commit
(and by the "look ahead" optimization introduced by commit 5bf748b).
This allows the planner to avoid generating distinct, competing index
paths (one path for skip scan, another for an equivalent traditional
full index scan). The overall effect is to make scan runtime close to
optimal, even when the planner works off an incorrect cardinality
estimate. Scans will also perform well given a skipped column with data
skew: individual groups of pages with many distinct values in respect of
a skipped column can be read about as efficiently as before, without
having to give up on skipping over other provably-irrelevant leaf pages.

Author: Peter Geoghegan <[email protected]>
Reviewed-By: Heikki Linnakangas <[email protected]>
Reviewed-By: Masahiro Ikeda <[email protected]>
Reviewed-By: Matthias van de Meent <[email protected]>
Discussion: https://postgr.es/m/CAH2-Wz=Y93jf5WjoOsN=xvqpMjRy-bxCE037bVFi-EasrpeUJA@mail.gmail.com
Discussion: https://postgr.es/m/CAH2-WznWDK45JfNPNvDxh6RQy-TaCwULaM5u5ALMXbjLBMcugQ@mail.gmail.com