Optimize shared LWLock acquisition for high-core-count systems
This patch reduces lock acquisition overhead for LW_SHARED LWLocks by
merging the read and update of the lock state into a single atomic
operation. Eliminating the separate atomic instructions is critical for
performance on high-core-count systems.
Key changes:
- Extended LW_SHARED_MASK by 1 bit and shifted LW_VAL_EXCLUSIVE up by 1 bit
  so the shared reference count can accommodate its upper bound of
  MAX_BACKENDS * 2 (see the sketch after this list).
- Added a `willwait` parameter to `LWLockAttemptLock` to disable the
optimization when the caller is unwilling to wait, avoiding conflicts
between the reference count and the LW_VAL_EXCLUSIVE flag.
- Updated `LWLockReleaseInternal` to use `pg_atomic_fetch_and_u32` for
clearing lock state flags atomically.
- Adjusted related functions (`LWLockAcquire`, `LWLockConditionalAcquire`,
`LWLockAcquireOrWait`) to pass the `willwait` parameter appropriately.
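Taken together, a minimal sketch of the widened bit layout and the merged
fast path, assuming master's definitions of MAX_BACKENDS and
LW_VAL_SHARED == 1; the helper name LWLockAttemptLockCAS and the
return-value convention (true = acquired) are illustrative, not the
actual patch:

    /* Widened layout: one extra refcount bit, exclusive flag moves up. */
    #define LW_SHARED_MASK     ((MAX_BACKENDS << 1) | 1)
    #define LW_VAL_EXCLUSIVE   (LW_SHARED_MASK + 1)

    static bool
    LWLockAttemptLock(LWLock *lock, LWLockMode mode, bool willwait)
    {
        if (mode == LW_SHARED && willwait)
        {
            /*
             * Merged read+update: a single atomic fetch-add replaces the
             * old read-then-compare-and-swap sequence.  The increment
             * happens even if an exclusive holder is present (scenario 3
             * below).
             */
            uint32  old_state = pg_atomic_fetch_add_u32(&lock->state,
                                                        LW_VAL_SHARED);

            return (old_state & LW_VAL_EXCLUSIVE) == 0;
        }

        /*
         * Caller can't wait, or wants the lock exclusively: fall back to
         * the existing compare-and-swap loop, which never leaves a stale
         * increment behind (hypothetical helper).
         */
        return LWLockAttemptLockCAS(lock, mode);
    }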
Key optimization idea:
The merged read-and-update fast path is only activated when willwait=true,
which ensures that the reference count cannot grow unchecked and overflow
into the LW_VAL_EXCLUSIVE bit.
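For example, the call sites might gate it along these lines (sketch only;
the real call sites in lwlock.c carry more context):

    /* LWLockAcquire: prepared to wait, so the merged fast path is safe --
     * any overcount is short-lived and bounded. */
    mustwait = !LWLockAttemptLock(lock, mode, true);

    /* LWLockConditionalAcquire: will not wait or retry, so a stale
     * increment could linger; take the CAS path instead. */
    mustwait = !LWLockAttemptLock(lock, mode, false);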
Three scenarios can occur when acquiring a shared lock:
1) Lock is free: atomically increment reference count and acquire
2) Lock held in shared mode: atomically increment reference count and acquire
3) Lock held exclusively: atomically increment reference count but fail to acquire
Scenarios 1 and 2 work as expected: we successfully increment the count
and acquire the lock.
Scenario 3 is counterintuitive: we increment the reference count even though
we cannot acquire the lock due to the exclusive holder. This creates a
temporarily invalid reference count, but it's acceptable because:
- The LW_VAL_EXCLUSIVE flag takes precedence in determining lock state
- Each process retries at most twice before blocking on a semaphore
- This bounds the "overcounted" references to MAX_BACKENDS * 2
- The bound fits within LW_SHARED_MASK capacity
- The lock->state, including any "overcounted" references, is reset when
  the exclusive lock is released.
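A sketch of that release path, again assuming the widened constants
above (with MAX_BACKENDS = 2^18 - 1 on current master, MAX_BACKENDS * 2
fits comfortably in the 19-bit widened mask):

    static void
    LWLockReleaseInternal(LWLock *lock, LWLockMode mode)
    {
        uint32  oldstate;

        if (mode == LW_EXCLUSIVE)
        {
            /*
             * One atomic AND clears the exclusive flag together with any
             * "overcounted" shared references left behind by failed
             * shared attempts; no genuine shared holders can exist while
             * the lock is held exclusively.
             */
            oldstate = pg_atomic_fetch_and_u32(&lock->state,
                                               ~(LW_VAL_EXCLUSIVE | LW_SHARED_MASK));
        }
        else
        {
            /* Shared release: drop our own reference. */
            oldstate = pg_atomic_fetch_sub_u32(&lock->state, LW_VAL_SHARED);
        }

        /* ... waiter wakeup based on oldstate, as in the existing code ... */
    }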
These changes improve scalability and reduce contention in workloads with
frequent LWLock operations on servers with many cores.