Reduce HT_MAX_SIZE to account for the max load factor of 0.5 #10242

Merged Jan 13, 2023 (3 commits)

arnaud-lb
Member

zend_hash allocates a hash table twice as big as nTableSize: HT_HASH_SIZE(HT_SIZE_TO_MASK(nTableSize)) == nTableSize*2, so HT_MAX_SIZE must be half the max table size or less.
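
The doubling can be sketched in C as follows. The `SKETCH_` macros are hypothetical stand-ins: the hash-size expression mirrors the one proposed in this PR, while the body of the size-to-mask macro is an assumption about how the mask is derived, not a copy of the header. The hash size comes out in bytes, so the slot count is that size divided by `sizeof(uint32_t)`.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed shape of HT_SIZE_TO_MASK: the negated, doubled table size. */
#define SKETCH_SIZE_TO_MASK(nTableSize) \
    ((uint32_t)(-((nTableSize) + (nTableSize))))

/* Hash-part size in bytes, written like the HT_HASH_SIZE form in this PR. */
#define SKETCH_HASH_SIZE(nTableMask) \
    (((size_t)-(uint32_t)(nTableMask)) * sizeof(uint32_t))

/* Number of uint32_t hash slots allocated for a given table size. */
static size_t sketch_hash_slots(uint32_t nTableSize)
{
    return SKETCH_HASH_SIZE(SKETCH_SIZE_TO_MASK(nTableSize)) / sizeof(uint32_t);
}
```

Under these assumptions, a table size of 8 yields the mask 0xFFFFFFF0 and 16 hash slots, i.e. twice the table size.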

This fixes #10240:

In HT_HASH_RESET, HT_HASH_SIZE((ht)->nTableMask) evaluates to 0, leading to the assertion failure.

0 is indeed an invalid value for nTableMask, given how the hash table offsets are computed:

offset = (hash | nTableMask)        // e.g. in zend_hash_str_find_bucket
idx = ht->arData[ (int32_t)offset ] // HT_HASH_EX

This relies on nTableMask having at least the highest bit set, so that interpreting (hash | nTableMask) as signed gives an offset in the range ]-(~(nTableMask)+1)...0]. ht->arData points to the end of the hash table. hash is always > 0, so the range is actually ]-(~(nTableMask)+1)...-1].
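
A minimal sketch of that index computation, assuming a 32-bit hash for simplicity (the function and parameter names are hypothetical). It relies on the usual two's-complement behaviour when converting to `int32_t`, as the original code does.

```c
#include <stdint.h>

/* Signed slot index relative to arData, following the two lines above.
 * With a valid mask (highest bit set) the result is always negative;
 * with a mask of 0 it is the raw hash, which is not a valid hash slot. */
static int32_t sketch_hash_index(uint32_t h, uint32_t nTableMask)
{
    uint32_t offset = h | nTableMask;
    return (int32_t)offset;
}
```

With the mask 0xFFFFFFF0 (16 hash slots), every index is negative and no lower than -16. With a mask of 0, the index is non-negative and no longer addresses the hash part that sits before `arData`.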

HT_SIZE_TO_MASK multiplies nSize by 2 to ensure a load factor of 0.5.

Given that, I think that HT_MAX_SIZE should be (2**31 / 2), i.e. 0x40000000.
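
The boundary can be checked with a small sketch. It assumes the mask is the negated, doubled table size truncated to 32 bits. Both helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed mask derivation: negated, doubled table size, modulo 2^32. */
static uint32_t sketch_size_to_mask(uint32_t nTableSize)
{
    return (uint32_t)(0u - (nTableSize + nTableSize));
}

/* Largest power-of-two table size whose mask is nonzero. */
static uint32_t sketch_max_valid_size(void)
{
    uint32_t best = 0;
    /* size visits 1, 2, ..., 0x80000000, then wraps to 0 and stops. */
    for (uint32_t size = 1; size != 0; size <<= 1) {
        if (sketch_size_to_mask(size) != 0) {
            best = size;
        }
    }
    return best;
}
```

In this model a table size of 0x40000000 still produces the mask 0x80000000, while 0x80000000 doubles to 2**32 and wraps to a mask of 0.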

@arnaud-lb arnaud-lb changed the base branch from master to PHP-8.1 January 6, 2023 15:58
Member

@Girgias left a comment

Looks sensible to me.

Maybe just expand the comment on why the MAX_SIZE was chosen in the header file?

Comment on lines -431 to +438
- (((size_t)(uint32_t)-(int32_t)(nTableMask)) * sizeof(uint32_t))
+ (((size_t)-(uint32_t)(nTableMask)) * sizeof(uint32_t))
Member Author

(int32_t)(nTableMask) is implementation-defined when nTableMask has its maximum value, (uint32_t)INT32_MAX+1.

I think that the (int32_t) cast is not necessary here, and that -(uint32_t)(nTableMask) gives the same result without being implementation defined.
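
A sketch of the unsigned-only form, for illustration. The helper names are hypothetical, and `uint64_t` is used instead of `size_t` so that the byte count does not overflow on a 32-bit build of this example.

```c
#include <stdint.h>

/* Hash slot count from the mask, using only unsigned arithmetic.
 * Negating a uint32_t wraps modulo 2^32, so no signed conversion occurs. */
static uint64_t sketch_slots_from_mask(uint32_t nTableMask)
{
    return (uint64_t)(uint32_t)(0u - nTableMask);
}

/* Hash part size in bytes: one uint32_t per slot. */
static uint64_t sketch_hash_bytes(uint32_t nTableMask)
{
    return sketch_slots_from_mask(nTableMask) * sizeof(uint32_t);
}
```

For the maximum mask value 0x80000000 this gives 0x80000000 slots without ever forming an `int32_t`.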

#if SIZEOF_SIZE_T == 4
- # define HT_MAX_SIZE 0x04000000 /* small enough to avoid overflow checks */
+ # define HT_MAX_SIZE 0x02000000
Member Author

This is SSIZE_MAX/(sizeof(Bucket)+2*sizeof(uint32_t)) (0x3ffffff), rounded down to the nearest power of two.
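
The arithmetic can be reproduced with a short sketch. It assumes a 32-bit build where SSIZE_MAX is 0x7fffffff and sizeof(Bucket) is 24 bytes. Neither figure is stated in this PR; they are chosen because they reproduce the quoted quotient 0x3ffffff. All names are hypothetical.

```c
#include <stdint.h>

/* Assumptions for a 32-bit build (not stated in the PR). */
#define SKETCH_SSIZE_MAX_32   0x7fffffffu
#define SKETCH_BUCKET_SIZE_32 24u

/* Round down to the nearest power of two (returns 0 for 0). */
static uint32_t sketch_floor_pow2(uint32_t v)
{
    uint32_t p = 1;
    if (v == 0) {
        return 0;
    }
    while (p <= v / 2) {
        p <<= 1;
    }
    return p;
}

/* Elements that fit when each needs one Bucket plus two uint32_t hash slots. */
static uint32_t sketch_max_elements_32(void)
{
    return SKETCH_SSIZE_MAX_32
        / (SKETCH_BUCKET_SIZE_32 + 2u * (uint32_t)sizeof(uint32_t));
}
```

Under these assumptions the quotient is 0x3ffffff, and rounding it down gives 0x02000000, the new 32-bit value of HT_MAX_SIZE.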

@arnaud-lb arnaud-lb merged commit 0f7625c into php:PHP-8.1 Jan 13, 2023
arnaud-lb added a commit that referenced this pull request Jan 13, 2023
* PHP-8.1:
  Reduce HT_MAX_SIZE to account for the max load factor of 0.5 (#10242)
  GC fiber unfinished executions (#9810)
arnaud-lb added a commit that referenced this pull request Jan 13, 2023
* PHP-8.2:
  [ci skip] NEWS
  Reduce HT_MAX_SIZE to account for the max load factor of 0.5 (#10242)
  GC fiber unfinished executions (#9810)
Successfully merging this pull request may close these issues.

Assertion failure when adding more than 2**30 elements to an unpacked array