
Commit 45e64d9

ashutosh-bapat authored and Commitfest Bot committed
Support shrinking shared buffers
Buffer eviction
===============

When shrinking the shared buffers pool, each buffer in the area being
shrunk needs to be flushed if it's dirty, so as not to lose the changes to
that buffer after shrinking. Also, each such buffer needs to be removed
from the buffer mapping table so that backends do not access it after
shrinking.

Buffer eviction requires a separate barrier phase for two reasons:

1. No other backend should map a new page to any of the buffers being
   evicted while eviction is in progress, so they wait until eviction
   finishes.

2. Since a pinned buffer has the pin recorded in the backend's local memory
   as well as in the buffer descriptor (which is in shared memory), eviction
   must not coincide with remapping the shared memory of a backend.
   Otherwise we might lose consistency between the local and shared pinning
   records.

Hence eviction needs to be carried out in ProcessBarrierShmemResize() and
not in AnonymousShmemResize(), as indicated by the now-removed comment.

If a buffer being evicted is pinned, we raise a FATAL error, but this
should be improved. There are multiple options: 1. wait for the pinned
buffer to get unpinned, 2. kill the backend holding the pin or have it
cancel its own query, or 3. roll back the operation. Note that options 1
and 2 would require the pinning-related local and shared records to be
accessed, but the infrastructure needed to do either of those is not
available right now.

Removing the evicted buffers from the buffer ring
=================================================

If the buffer pool has been shrunk, the buffers in a buffer ring may no
longer be valid. Modify GetBufferFromRing() to check whether the buffer is
still valid before using it. This makes GetBufferFromRing() a bit more
expensive because of the additional boolean condition, and it masks any bug
that introduces an invalid buffer into the ring. The alternative fix is
more complex, as explained below.

The strategy object is created in CurrentMemoryContext and is not reachable
from any global structure, so it is not accessible when processing buffer
resizing barriers. We could modify GetAccessStrategy() to register the
strategy in a global linked list and then arrange to deregister it once it
is no longer in use. Looking at the places which use GetAccessStrategy(),
fixing all of those may be some work.

Author: Ashutosh Bapat
Reviewed-by: Tomas Vondra
1 parent 42113e3 commit 45e64d9
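As context for the alternative fix discussed at the end of the commit message, here is a minimal standalone sketch of what a global registry of buffer access strategies could look like: every strategy registers itself on creation, so a shrink operation could walk the registry and drop ring entries that point past the new pool size. All names here (Strategy, register_strategy, purge_rings) are invented for illustration; this is not part of the patch or of the PostgreSQL API, and it glosses over locking and deregistration.

#include <stdio.h>
#include <stdlib.h>

/* Simplified stand-in for a buffer access strategy and its ring. */
typedef struct Strategy
{
    int         nbuffers;
    int        *buffers;        /* 0 means "empty slot" */
    struct Strategy *next;      /* registry link */
} Strategy;

static Strategy *registry = NULL;   /* global list of live strategies */

/* What a GetAccessStrategy()-style constructor would additionally do. */
static Strategy *
register_strategy(int ring_size)
{
    Strategy   *s = calloc(1, sizeof(Strategy));

    s->nbuffers = ring_size;
    s->buffers = calloc(ring_size, sizeof(int));
    s->next = registry;
    registry = s;
    return s;
}

/* What the eviction phase could do instead of the per-lookup check. */
static void
purge_rings(int new_nbuffers)
{
    for (Strategy *s = registry; s != NULL; s = s->next)
        for (int i = 0; i < s->nbuffers; i++)
            if (s->buffers[i] > new_nbuffers)
                s->buffers[i] = 0;  /* drop entries beyond the new pool */
}

int
main(void)
{
    Strategy   *s = register_strategy(4);

    s->buffers[0] = 10;
    s->buffers[1] = 200;        /* would be invalid after shrinking to 128 */
    purge_rings(128);
    printf("%d %d\n", s->buffers[0], s->buffers[1]);    /* prints "10 0" */
    return 0;
}

The patch instead keeps the cheap per-lookup check in GetBufferFromRing(), because deregistering strategies from every call site is the part that makes this alternative laborious.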

File tree: 6 files changed (+139 −17 lines)


src/backend/port/sysv_shmem.c

Lines changed: 29 additions & 13 deletions
@@ -993,14 +993,6 @@ AnonymousShmemResize(void)
 	 */
 	pending_pm_shmem_resize = false;
 
-	/*
-	 * XXX: Currently only increasing of shared_buffers is supported. For
-	 * decreasing something similar has to be done, but buffer blocks with
-	 * data have to be drained first.
-	 */
-	if (NBuffersOld > NBuffers)
-		return false;
-
 #ifndef MAP_HUGETLB
 	/* PrepareHugePages should have dealt with this case */
 	Assert(huge_pages != HUGE_PAGES_ON && !huge_pages_on);
@@ -1099,11 +1091,14 @@ AnonymousShmemResize(void)
 		 * all the pointers are still valid, and we only need to update
 		 * structures size in the ShmemIndex once -- any other backend
 		 * will pick up this shared structure from the index.
-		 *
-		 * XXX: This is the right place for buffer eviction as well.
 		 */
 		BufferManagerShmemInit(NBuffersOld);
 
+		/*
+		 * Wipe out the evictor PID so that it can be used for the next
+		 * buffer resizing operation.
+		 */
+		ShmemCtrl->evictor_pid = 0;
 		/* If all fine, broadcast the new value */
 		pg_atomic_write_u32(&ShmemCtrl->NSharedBuffers, NBuffers);
 	}
@@ -1156,11 +1151,31 @@ ProcessBarrierShmemResize(Barrier *barrier)
 	 * XXX: If we need to be able to abort resizing, this has to be done later,
 	 * after the SHMEM_RESIZE_DONE.
 	 */
-	if (BarrierArriveAndWait(barrier, WAIT_EVENT_SHMEM_RESIZE_START))
+
+	/*
+	 * Evict extra buffers when shrinking shared buffers. We need to do this
+	 * while the memory for extra buffers is still mapped i.e. before remapping
+	 * the shared memory segments to a smaller memory area.
+	 */
+	if (NBuffersOld > NBuffersPending)
 	{
-		Assert(IsUnderPostmaster);
-		SendPostmasterSignal(PMSIGNAL_SHMEM_RESIZE);
+		BarrierArriveAndWait(barrier, WAIT_EVENT_SHMEM_RESIZE_START);
+
+		/*
+		 * TODO: If the buffer eviction fails for any reason, we should
+		 * gracefully rollback the shared buffer resizing and try again. But the
+		 * infrastructure to do so is not available right now. Hence just raise
+		 * a FATAL so that the system restarts.
+		 */
+		if (!EvictExtraBuffers(NBuffersPending, NBuffersOld))
+			elog(FATAL, "buffer eviction failed");
+
+		if (BarrierArriveAndWait(barrier, WAIT_EVENT_SHMEM_RESIZE_EVICT))
+			SendPostmasterSignal(PMSIGNAL_SHMEM_RESIZE);
 	}
+	else
+		if (BarrierArriveAndWait(barrier, WAIT_EVENT_SHMEM_RESIZE_START))
+			SendPostmasterSignal(PMSIGNAL_SHMEM_RESIZE);
 
 	AnonymousShmemResize();
 
@@ -1684,5 +1699,6 @@ ShmemControlInit(void)
 
 		/* shmem_resizable should be initialized by now */
 		ShmemCtrl->Resizable = shmem_resizable;
+		ShmemCtrl->evictor_pid = 0;
 	}
 }
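The barrier choreography added to ProcessBarrierShmemResize() above can be hard to see through the diff: all backends arrive at a START phase, one of them evicts the extra buffers while the rest wait, and only after the EVICT phase does anyone remap. The toy program below models just that phasing with POSIX barriers and assumes nothing about PostgreSQL; in the real patch the evictor is chosen via ShmemResizeLock rather than the barrier's serial thread, a detail simplified here.

#include <pthread.h>
#include <stdio.h>

#define NWORKERS 4

static pthread_barrier_t start_phase;
static pthread_barrier_t evict_phase;

static void evict(void) { puts("evicting extra buffers"); }
static void remap(void) { puts("remapping shared memory"); }

static void *
worker(void *arg)
{
    long    id = (long) arg;
    int     rc;

    /* Phase 1: every worker stops mapping new pages into the old buffers. */
    rc = pthread_barrier_wait(&start_phase);

    /* Exactly one thread (the "evictor") does the eviction work. */
    if (rc == PTHREAD_BARRIER_SERIAL_THREAD)
        evict();

    /*
     * Phase 2: nobody remaps memory until eviction has finished, so local
     * and shared pinning state cannot get out of sync during the remap.
     */
    pthread_barrier_wait(&evict_phase);
    remap();

    printf("worker %ld done\n", id);
    return NULL;
}

int
main(void)
{
    pthread_t   threads[NWORKERS];

    pthread_barrier_init(&start_phase, NULL, NWORKERS);
    pthread_barrier_init(&evict_phase, NULL, NWORKERS);
    for (long i = 0; i < NWORKERS; i++)
        pthread_create(&threads[i], NULL, worker, (void *) i);
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(threads[i], NULL);
    pthread_barrier_destroy(&start_phase);
    pthread_barrier_destroy(&evict_phase);
    return 0;
}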

src/backend/storage/buffer/bufmgr.c

Lines changed: 93 additions & 0 deletions
@@ -57,6 +57,7 @@
 #include "storage/fd.h"
 #include "storage/ipc.h"
 #include "storage/lmgr.h"
+#include "storage/pg_shmem.h"
 #include "storage/proc.h"
 #include "storage/read_stream.h"
 #include "storage/smgr.h"
@@ -7422,3 +7423,95 @@ const PgAioHandleCallbacks aio_local_buffer_readv_cb = {
 	.complete_local = local_buffer_readv_complete,
 	.report = buffer_readv_report,
 };
+
+/*
+ * When shrinking the shared buffers pool, evict the buffers which will not
+ * be part of the shrunk buffer pool.
+ */
+bool
+EvictExtraBuffers(int newBufSize, int oldBufSize)
+{
+	bool		result = true;
+
+	/*
+	 * If the buffer being evicted is locked, this function will need to
+	 * wait. It should not be called from the postmaster since the
+	 * postmaster can not wait on a lock.
+	 */
+	Assert(IsUnderPostmaster);
+
+	/*
+	 * Let only one backend perform eviction. We could split the work across
+	 * all the backends but that doesn't seem necessary.
+	 *
+	 * The first backend to acquire ShmemResizeLock sets its own PID as the
+	 * evictor PID so that other backends know that the eviction is in
+	 * progress or has already been performed. The evictor backend releases
+	 * the lock when it finishes eviction. While the eviction is in progress,
+	 * backends other than the evictor won't be able to take the lock, so
+	 * they won't perform eviction. A backend may acquire the lock after
+	 * eviction has completed, but it will not perform eviction since the
+	 * evictor PID is already set. The evictor PID is reset only when the
+	 * buffer resizing finishes. Thus only one backend will perform eviction
+	 * in a given instance of shared buffers resizing.
+	 *
+	 * Any backend which acquires this lock will release it before the
+	 * eviction phase finishes, hence the same lock can be reused for the
+	 * next phase of resizing buffers.
+	 */
+	if (LWLockConditionalAcquire(ShmemResizeLock, LW_EXCLUSIVE))
+	{
+		if (ShmemCtrl->evictor_pid == 0)
+		{
+			ShmemCtrl->evictor_pid = MyProcPid;
+
+			/*
+			 * TODO: Before evicting any buffer, we should check whether any
+			 * of the buffers are pinned. If we find that a buffer is pinned
+			 * after evicting most of them, that will impact performance
+			 * since all those evicted buffers might need to be read again.
+			 */
+			for (Buffer buf = newBufSize + 1; buf <= oldBufSize; buf++)
+			{
+				BufferDesc *desc = GetBufferDescriptor(buf - 1);
+				uint32		buf_state;
+				bool		buffer_flushed;
+
+				buf_state = pg_atomic_read_u32(&desc->state);
+
+				/*
+				 * Nobody is expected to touch the buffers while resizing is
+				 * going on, hence an unlocked precheck should be safe and
+				 * saves some cycles.
+				 */
+				if (!(buf_state & BM_VALID))
+					continue;
+
+				/*
+				 * XXX: Looks like CurrentResourceOwner can be NULL here,
+				 * find another one in that case?
+				 */
+				if (CurrentResourceOwner)
+					ResourceOwnerEnlarge(CurrentResourceOwner);
+
+				ReservePrivateRefCountEntry();
+
+				LockBufHdr(desc);
+
+				/*
+				 * Now that we have locked the buffer descriptor, make sure
+				 * that a buffer without valid data has been skipped above.
+				 */
+				Assert(buf_state & BM_VALID);
+
+				if (!EvictUnpinnedBufferInternal(desc, &buffer_flushed))
+				{
+					elog(WARNING, "could not remove buffer %u, it is pinned", buf);
+					result = false;
+					break;
+				}
+			}
+		}
+		LWLockRelease(ShmemResizeLock);
+	}
+
+	return result;
+}
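The TODO at the top of the eviction loop (check for pins before evicting anything) could take roughly the shape below. This is only a sketch of one possible pre-scan, written against existing helpers (GetBufferDescriptor(), pg_atomic_read_u32(), BUF_STATE_GET_REFCOUNT()); the function name ExtraBuffersArePinned is hypothetical and does not exist in the patch.

/*
 * Hypothetical pre-check (not in this patch): report whether any buffer in
 * the range about to be evicted is currently pinned, so the caller can bail
 * out before flushing and unmapping anything.
 */
static bool
ExtraBuffersArePinned(int newBufSize, int oldBufSize)
{
    for (Buffer buf = newBufSize + 1; buf <= oldBufSize; buf++)
    {
        BufferDesc *desc = GetBufferDescriptor(buf - 1);
        uint32      buf_state = pg_atomic_read_u32(&desc->state);

        /*
         * Unlocked read, same caveat as the precheck in EvictExtraBuffers():
         * good enough while resizing keeps other backends away from these
         * buffers, but a pin taken concurrently could still be missed.
         */
        if ((buf_state & BM_VALID) && BUF_STATE_GET_REFCOUNT(buf_state) > 0)
            return true;
    }
    return false;
}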

src/backend/storage/buffer/freelist.c

Lines changed: 14 additions & 4 deletions
@@ -630,12 +630,22 @@ GetBufferFromRing(BufferAccessStrategy strategy, uint32 *buf_state)
 		strategy->current = 0;
 
 	/*
-	 * If the slot hasn't been filled yet, tell the caller to allocate a new
-	 * buffer with the normal allocation strategy. He will then fill this
-	 * slot by calling AddBufferToRing with the new buffer.
+	 * If the slot hasn't been filled yet, or the buffer in the slot has been
+	 * invalidated when the buffer pool was shrunk, tell the caller to
+	 * allocate a new buffer with the normal allocation strategy. He will
+	 * then fill this slot by calling AddBufferToRing with the new buffer.
+	 *
+	 * TODO: Ideally we would want to check for bufnum > NBuffers only once
+	 * after every time the buffer pool is shrunk, so as to catch any runtime
+	 * bugs that introduce invalid buffers into the ring. But that is
+	 * complicated: the BufferAccessStrategy objects are not accessible
+	 * outside the ScanState, hence we can not purge the rings while evicting
+	 * the buffers, and after the resizing is finished it's not possible to
+	 * notice when we touch the first or the last of those objects. See if
+	 * this can be fixed.
 	 */
 	bufnum = strategy->buffers[strategy->current];
-	if (bufnum == InvalidBuffer)
+	if (bufnum == InvalidBuffer || bufnum > NBuffers)
 		return NULL;
 
 	/*
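For readers unfamiliar with buffer rings: a BufferAccessStrategy is created and freed by its caller and never published anywhere global, which is why the validity check has to live in GetBufferFromRing() rather than in the eviction code. A typical caller looks roughly like the sketch below; the function read_one_block_with_ring is hypothetical, and rel and blkno stand for whatever relation and block the caller is scanning.

/* Hypothetical caller, e.g. some bulk-read code path. */
static void
read_one_block_with_ring(Relation rel, BlockNumber blkno)
{
    BufferAccessStrategy strategy = GetAccessStrategy(BAS_BULKREAD);
    Buffer      buf;

    buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL, strategy);
    /* ... inspect the page ... */
    ReleaseBuffer(buf);
    FreeAccessStrategy(strategy);   /* the ring disappears with the strategy */
}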

src/backend/utils/activity/wait_event_names.txt

Lines changed: 1 addition & 0 deletions
@@ -156,6 +156,7 @@ REPLICATION_SLOT_DROP "Waiting for a replication slot to become inactive so it c
 RESTORE_COMMAND	"Waiting for <xref linkend="guc-restore-command"/> to complete."
 SAFE_SNAPSHOT	"Waiting to obtain a valid snapshot for a <literal>READ ONLY DEFERRABLE</literal> transaction."
 SHMEM_RESIZE_START	"Waiting for other backends to start resizing shared memory."
+SHMEM_RESIZE_EVICT	"Waiting for other backends to finish the buffer eviction phase."
 SHMEM_RESIZE_DONE	"Waiting for other backends to finish resizing shared memory."
 SYNC_REP	"Waiting for confirmation from a remote server during synchronous replication."
 WAL_RECEIVER_EXIT	"Waiting for the WAL receiver to exit."

src/include/storage/bufmgr.h

Lines changed: 1 addition & 0 deletions
@@ -315,6 +315,7 @@ extern void EvictRelUnpinnedBuffers(Relation rel,
 									int32 *buffers_evicted,
 									int32 *buffers_flushed,
 									int32 *buffers_skipped);
+extern bool EvictExtraBuffers(int newBufSize, int oldBufSize);
 
 /* in buf_init.c */
 extern void BufferManagerShmemInit(int);

src/include/storage/pg_shmem.h

Lines changed: 1 addition & 0 deletions
@@ -77,6 +77,7 @@ extern PGDLLIMPORT AnonymousMapping Mappings[ANON_MAPPINGS];
 typedef struct
 {
 	pg_atomic_uint32 NSharedBuffers;
+	pid_t		evictor_pid;
 	Barrier		Barrier;
 	pg_atomic_uint64 Generation;
 	bool		Resizable;
