Skip to content

Allow some array functions to operate in-place using a peep-hole optimization #5

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

nielsdos
Copy link
Owner

@nielsdos nielsdos commented Apr 10, 2023

This patch optimizes the $x = array_function($x, ...) pattern and the
$x = array_function(temporary, ...) pattern for these functions:
array_merge, array_unique, array_replace, and array_intersect.
With these limitations:

  • array_{unique,intersect} only do the temporary optimization because the
    comparison may throw.
  • array_{merge,replace} only optimizes CVs for non-recursive case because the
    recursive version may throw.
  • array_merge optimization works only if the array is packed and is
    without holes.

It works by checking if the array function is immediately followed by an
assignment which overrides the input. In that case we can do the
operation in-place instead of copying the array. Note that this is
limited to CV's at the moment, and can't handle more complex scenarios
like array or object assignments.
For the temporary case it suffices to check if the refcount of the input
array is 1 and the array is non-persistent and non-immutable.

The current approach is a bit ugly though: it looks at the VM
instructions from within a function to check if the optimization is
possible, which is a bit odd.
I considered extending opcache as an alternative, but I believe this would
require adding a whole bunch of machinery for only a few users.

Looking at the assembly of prepare_in_place_array_modify_if_possible()
it looks pretty light-weight, about 95 bytes / 29 instructions on my
x86-64 Linux laptop.

Safety

There are some array functions which take some sort of copy of the input
array into a temporary C array for sorting.
(e.g. array_unique, array_diff, and array_intersect do this).
Since we no longer take a copy in all cases, we must check if it's
possible that a value is accessed that was already destroyed.

For array_unique: cmpdata will never be removed so that will never reach
refcount 0. And when something is removed, it is the previous value of
cmpdata, not the one user later. So this seems okay.

For array_intersect: a previous pointer (ptr[0] - 1) is accessed.
But this can't be a destroyed value because the pointer is first moved forward.

Results

Using this benchmark script
https://gist.github.com/nielsdos/ae5a2dddc53c61749ae31c908aa78e98 I get:

=== array_merge $a = array_merge($a, ...) ===
before 4.3821 sec
after 0.0022 sec
=== array_merge temporary ===
before 0.1265 sec
after 0.0479 sec
=== array_unique temporary ===
before 0.9297 sec
after 0.8498 sec
=== array_replace $a = array_replace($a, ...) ===
before 0.0810 sec
after 0.0083 sec
=== array_replace temporary ===
before 0.1261 sec
after 0.0534 sec
=== array_intersect temporary (no significant improvement because dominated by sorting) ===
before 30.499 sec
after 30.356 sec

@nielsdos nielsdos force-pushed the in-place-array-modification-squashed branch 3 times, most recently from 7759f7f to 5a3f9e5 Compare April 10, 2023 19:59
nielsdos referenced this pull request Apr 10, 2023
@Girgias
Copy link

Girgias commented Apr 11, 2023

Technically, any comparison involving objects can throw, see: https://3v4l.org/64TZU

@nielsdos
Copy link
Owner Author

nielsdos commented Apr 11, 2023

Thanks @Girgias for taking a look.
Tim made me aware of the error behaviour just recently. That is why there is a check that checks whether there is a user error handler to disallow the "input=output" optimization for array_unique.

In fact there are two optimizations

  • RC1 optimization: if there is only one reference to an array, it's because it was passed as an argument and therefore it can be replaced. Even if there is an exception, a partial or in-place modification cannot be observed because the assignment never happened and the return value is destroyed.
  • input=output optimization: if the input gets overwritten by the output, it is safe to do so if there are no exception handlers or other user code that can run in between the function call and the assignment.

The four functions: array_merge, array_unique, array_replace, and array_intersect all perform the RC1 optimization with this patch.

array_unique only performs the input=output optimization if the sorting algorithm is SORT_REGULAR or SORT_NUMERIC and no user error handler is installed.
array_merge and array_replace can do the input=output optimization safely in the non-recursive case.

Do you think there is something I'm overlooking?

@Girgias
Copy link

Girgias commented Apr 11, 2023

Thanks @Girgias for taking a look. Tim made me aware of the error behaviour just recently. That is why there is a check that checks whether there is a user error handler to disallow the "input=output" optimization for array_unique.

In fact there are two optimizations

* RC1 optimization: if there is only one reference to an array, it's because it was passed as an argument and therefore it can be replaced. Even if there is an exception, a partial or in-place modification cannot be observed because the assignment never happened and the return value is destroyed.

* input=output optimization: if the input gets overwritten by the output, it is safe to do so if there are no exception handlers or other user code that can run in between the function call and the assignment.

The four functions: array_merge, array_unique, array_replace, and array_intersect all perform the RC1 optimization with this patch.

array_unique only performs the input=output optimization if the sorting algorithm is SORT_REGULAR or SORT_NUMERIC and no user error handler is installed. array_merge and array_replace can do the input=output optimization safely in the non-recursive case.

Do you think there is something I'm overlooking?

So any internal object can define its own comparison handler, and thus can throw. If I modify the previous example to:

$o = gmp_init(10);
$fruits = array($o, []);
sort($fruits);

The call to sort() will throw a TypeError:

Fatal error: Uncaught TypeError: Number must be of type GMP|string|int, array given in /tmp/preview:5

So I'm not sure you can even safely assume that comparisons are not going to throw an exception even for SORT_NUMERIC and SORT_REGULAR, except if no objects are present, but that means traversing the array :/ (or us having typed arrays)

EDIT: arguably GMP's behaviour is slightly strange and needs fixing but we don't currently enforce extensions to not throw exceptions or emit notices (when we probably should) in the compare hook (nor the do_operation one)

@nielsdos
Copy link
Owner Author

I see. So that's indeed a problem for SORT_REGULAR. And it looks like there's a related problem for SORT_NUMERIC: the object->cast() method can throw...
That means I'll have to ban the optimization for array_unique, except for the RC1 case. Bummer, but it is what it is...

As always thanks for your insight.

@nielsdos nielsdos force-pushed the in-place-array-modification-squashed branch 3 times, most recently from abb4376 to a0b7278 Compare April 11, 2023 19:20
This allows array_merge, array_intersect, array_replace, array_unique
and usort to avoid taking a copy and do the transformation in-place.

** Safety **

There are some array functions which take a copy of the input
array into a temporary C array for sorting purposes.
(e.g. array_unique, array_diff, and array_intersect do this).
Since we no longer take a copy in all cases, we must check if
it's possible that a value is accessed that was already destroyed.

For array_unique: cmpdata will never be removed so that will never reach
refcount 0. And when something is removed, it is the previous value of
cmpdata, not the one user later. So this seems okay.

For array_intersect: a previous pointer (ptr[0] - 1) is accessed.
But this can't be a destroyed value because the pointer is first moved forward.

For array_diff: it's possible a previous pointer is accessed after
destruction. So we can't optimise this case easily.
@nielsdos nielsdos force-pushed the in-place-array-modification-squashed branch from a0b7278 to bb5deb3 Compare April 23, 2023 20:52
@nielsdos nielsdos closed this Apr 23, 2023
nielsdos pushed a commit that referenced this pull request Oct 13, 2024
even without sanitizers, it is reproducible but with the following

```
<?php
$g = gmp_init(256);
var_dump(gmp_pow($g, PHP_INT_MAX));
```

we get this

```
AddressSanitizer:DEADLYSIGNAL
=================================================================
==286922==ERROR: AddressSanitizer: FPE on unknown address 0x03e8000460ca (pc 0x7faf6c69de5c bp 0x400000000000004 sp 0x7ffe9843c740 T0)
    #0 0x7faf6c69de5c in __pthread_kill_implementation nptl/pthread_kill.c:44
    #1 0x7faf6c649c81 in __GI_raise ../sysdeps/posix/raise.c:26
    #2 0x7faf6db9386c in __gmp_exception (/lib/x86_64-linux-gnu/libgmp.so.10+0xd86c) (BuildId: 1af68a49fe041a5bb48a2915c3d47541f713bb38)
    #3 0x7faf6db938d3 in __gmp_overflow_in_mpz (/lib/x86_64-linux-gnu/libgmp.so.10+0xd8d3) (BuildId: 1af68a49fe041a5bb48a2915c3d47541f713bb38)
    #4 0x7faf6dbac95c in __gmpz_realloc (/lib/x86_64-linux-gnu/libgmp.so.10+0x2695c) (BuildId: 1af68a49fe041a5bb48a2915c3d47541f713bb38)
    #5 0x7faf6dba9038 in __gmpz_n_pow_ui (/lib/x86_64-linux-gnu/libgmp.so.10+0x23038) (BuildId: 1af68a49fe041a5bb48a2915c3d47541f713bb38)
    #6 0x5565ae1ccd9f in zif_gmp_pow /home/dcarlier/Contribs/php-src/ext/gmp/gmp.c:1286
    #7 0x5565aee96ea9 in ZEND_DO_ICALL_SPEC_RETVAL_USED_HANDLER /home/dcarlier/Contribs/php-src/Zend/zend_vm_execute.h:1312
    #8 0x5565af144320 in execute_ex /home/dcarlier/Contribs/php-src/Zend/zend_vm_execute.h:56075
    #9 0x5565af160f07 in zend_execute /home/dcarlier/Contribs/php-src/Zend/zend_vm_execute.h:60439
    #10 0x5565aed6fafe in zend_execute_scripts /home/dcarlier/Contribs/php-src/Zend/zend.c:1842
    #11 0x5565aeae70a8 in php_execute_script /home/dcarlier/Contribs/php-src/main/main.c:2578
    #12 0x5565af532f4e in do_cli /home/dcarlier/Contribs/php-src/sapi/cli/php_cli.c:964
    #13 0x5565af535877 in main /home/dcarlier/Contribs/php-src/sapi/cli/php_cli.c:1334
    #14 0x7faf6c633d67 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #15 0x7faf6c633e24 in __libc_start_main_impl ../csu/libc-start.c:360
    #16 0x5565adc04040 in _start (/home/dcarlier/Contribs/php-src/sapi/cli/php+0x2604040) (BuildId: 949049955bdf8b7197390b1978a1dfc3ef6fdf38)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: FPE nptl/pthread_kill.c:44 in __pthread_kill_implementation
==286922==ABORTING
```
nielsdos pushed a commit that referenced this pull request Oct 26, 2024
even without sanitizers, it is reproducible but with the following

```
<?php
$g = gmp_init(256);
var_dump(gmp_pow($g, PHP_INT_MAX));
```

we get this

```
AddressSanitizer:DEADLYSIGNAL
=================================================================
==286922==ERROR: AddressSanitizer: FPE on unknown address 0x03e8000460ca (pc 0x7faf6c69de5c bp 0x400000000000004 sp 0x7ffe9843c740 T0)
    #0 0x7faf6c69de5c in __pthread_kill_implementation nptl/pthread_kill.c:44
    #1 0x7faf6c649c81 in __GI_raise ../sysdeps/posix/raise.c:26
    #2 0x7faf6db9386c in __gmp_exception (/lib/x86_64-linux-gnu/libgmp.so.10+0xd86c) (BuildId: 1af68a49fe041a5bb48a2915c3d47541f713bb38)
    #3 0x7faf6db938d3 in __gmp_overflow_in_mpz (/lib/x86_64-linux-gnu/libgmp.so.10+0xd8d3) (BuildId: 1af68a49fe041a5bb48a2915c3d47541f713bb38)
    #4 0x7faf6dbac95c in __gmpz_realloc (/lib/x86_64-linux-gnu/libgmp.so.10+0x2695c) (BuildId: 1af68a49fe041a5bb48a2915c3d47541f713bb38)
    #5 0x7faf6dba9038 in __gmpz_n_pow_ui (/lib/x86_64-linux-gnu/libgmp.so.10+0x23038) (BuildId: 1af68a49fe041a5bb48a2915c3d47541f713bb38)
    #6 0x5565ae1ccd9f in zif_gmp_pow /home/dcarlier/Contribs/php-src/ext/gmp/gmp.c:1286
    #7 0x5565aee96ea9 in ZEND_DO_ICALL_SPEC_RETVAL_USED_HANDLER /home/dcarlier/Contribs/php-src/Zend/zend_vm_execute.h:1312
    #8 0x5565af144320 in execute_ex /home/dcarlier/Contribs/php-src/Zend/zend_vm_execute.h:56075
    #9 0x5565af160f07 in zend_execute /home/dcarlier/Contribs/php-src/Zend/zend_vm_execute.h:60439
    #10 0x5565aed6fafe in zend_execute_scripts /home/dcarlier/Contribs/php-src/Zend/zend.c:1842
    #11 0x5565aeae70a8 in php_execute_script /home/dcarlier/Contribs/php-src/main/main.c:2578
    #12 0x5565af532f4e in do_cli /home/dcarlier/Contribs/php-src/sapi/cli/php_cli.c:964
    #13 0x5565af535877 in main /home/dcarlier/Contribs/php-src/sapi/cli/php_cli.c:1334
    #14 0x7faf6c633d67 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #15 0x7faf6c633e24 in __libc_start_main_impl ../csu/libc-start.c:360
    #16 0x5565adc04040 in _start (/home/dcarlier/Contribs/php-src/sapi/cli/php+0x2604040) (BuildId: 949049955bdf8b7197390b1978a1dfc3ef6fdf38)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: FPE nptl/pthread_kill.c:44 in __pthread_kill_implementation
==286922==ABORTING
```

close phpGH-16384
nielsdos pushed a commit that referenced this pull request Mar 29, 2025
```
ext/gd/libgd/gd.c:2275:14: runtime error: signed integer overflow: 2147483647 + 1 cannot be represented in type 'int'
    #0 0x5d6a2103e1db in php_gd_gdImageCopy /home/dcarlier/Contribs/php-src/ext/gd/libgd/gd.c:2275
    #1 0x5d6a210a2b63 in gdImageCrop /home/dcarlier/Contribs/php-src/ext/gd/libgd/gd_crop.c:57
    #2 0x5d6a21018ca4 in zif_imagecrop /home/dcarlier/Contribs/php-src/ext/gd/gd.c:3575
    #3 0x5d6a21e46e7a in ZEND_DO_ICALL_SPEC_RETVAL_USED_HANDLER /home/dcarlier/Contribs/php-src/Zend/zend_vm_execute.h:1337
    #4 0x5d6a221188da in execute_ex /home/dcarlier/Contribs/php-src/Zend/zend_vm_execute.h:57246
    #5 0x5d6a221366bd in zend_execute /home/dcarlier/Contribs/php-src/Zend/zend_vm_execute.h:61634
    #6 0x5d6a21d107a6 in zend_execute_scripts /home/dcarlier/Contribs/php-src/Zend/zend.c:1895
    #7 0x5d6a21a63409 in php_execute_script /home/dcarlier/Contribs/php-src/main/main.c:2529
    #8 0x5d6a22516d5e in do_cli /home/dcarlier/Contribs/php-src/sapi/cli/php_cli.c:966
    #9 0x5d6a2251981d in main /home/dcarlier/Contribs/php-src/sapi/cli/php_cli.c:1341
    #10 0x7f10d002a3b7 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #11 0x7f10d002a47a in __libc_start_main_impl ../csu/libc-start.c:360
    #12 0x5d6a20a06da4 in _start (/home/dcarlier/Contribs/php-src/sapi/cli/php+0x2806da4) (BuildId: d9a79c7e0e4872311439d7313cb3a81fe04190a2)
```

close phpGH-18006
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants