-
Notifications
You must be signed in to change notification settings - Fork 0
Allow some array functions to operate in-place using a peep-hole optimization #5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
7759f7f
to
5a3f9e5
Compare
Technically, any comparison involving objects can throw, see: https://3v4l.org/64TZU |
Thanks @Girgias for taking a look. In fact there are two optimizations
The four functions: array_merge, array_unique, array_replace, and array_intersect all perform the RC1 optimization with this patch. array_unique only performs the input=output optimization if the sorting algorithm is SORT_REGULAR or SORT_NUMERIC and no user error handler is installed. Do you think there is something I'm overlooking? |
So any internal object can define its own comparison handler, and thus can throw. If I modify the previous example to: $o = gmp_init(10);
$fruits = array($o, []);
sort($fruits); The call to
So I'm not sure you can even safely assume that comparisons are not going to throw an exception even for SORT_NUMERIC and SORT_REGULAR, except if no objects are present, but that means traversing the array :/ (or us having typed arrays) EDIT: arguably GMP's behaviour is slightly strange and needs fixing but we don't currently enforce extensions to not throw exceptions or emit notices (when we probably should) in the compare hook (nor the do_operation one) |
I see. So that's indeed a problem for SORT_REGULAR. And it looks like there's a related problem for SORT_NUMERIC: the object->cast() method can throw... As always thanks for your insight. |
abb4376
to
a0b7278
Compare
This allows array_merge, array_intersect, array_replace, array_unique and usort to avoid taking a copy and do the transformation in-place. ** Safety ** There are some array functions which take a copy of the input array into a temporary C array for sorting purposes. (e.g. array_unique, array_diff, and array_intersect do this). Since we no longer take a copy in all cases, we must check if it's possible that a value is accessed that was already destroyed. For array_unique: cmpdata will never be removed so that will never reach refcount 0. And when something is removed, it is the previous value of cmpdata, not the one user later. So this seems okay. For array_intersect: a previous pointer (ptr[0] - 1) is accessed. But this can't be a destroyed value because the pointer is first moved forward. For array_diff: it's possible a previous pointer is accessed after destruction. So we can't optimise this case easily.
a0b7278
to
bb5deb3
Compare
even without sanitizers, it is reproducible but with the following ``` <?php $g = gmp_init(256); var_dump(gmp_pow($g, PHP_INT_MAX)); ``` we get this ``` AddressSanitizer:DEADLYSIGNAL ================================================================= ==286922==ERROR: AddressSanitizer: FPE on unknown address 0x03e8000460ca (pc 0x7faf6c69de5c bp 0x400000000000004 sp 0x7ffe9843c740 T0) #0 0x7faf6c69de5c in __pthread_kill_implementation nptl/pthread_kill.c:44 #1 0x7faf6c649c81 in __GI_raise ../sysdeps/posix/raise.c:26 #2 0x7faf6db9386c in __gmp_exception (/lib/x86_64-linux-gnu/libgmp.so.10+0xd86c) (BuildId: 1af68a49fe041a5bb48a2915c3d47541f713bb38) #3 0x7faf6db938d3 in __gmp_overflow_in_mpz (/lib/x86_64-linux-gnu/libgmp.so.10+0xd8d3) (BuildId: 1af68a49fe041a5bb48a2915c3d47541f713bb38) #4 0x7faf6dbac95c in __gmpz_realloc (/lib/x86_64-linux-gnu/libgmp.so.10+0x2695c) (BuildId: 1af68a49fe041a5bb48a2915c3d47541f713bb38) #5 0x7faf6dba9038 in __gmpz_n_pow_ui (/lib/x86_64-linux-gnu/libgmp.so.10+0x23038) (BuildId: 1af68a49fe041a5bb48a2915c3d47541f713bb38) #6 0x5565ae1ccd9f in zif_gmp_pow /home/dcarlier/Contribs/php-src/ext/gmp/gmp.c:1286 #7 0x5565aee96ea9 in ZEND_DO_ICALL_SPEC_RETVAL_USED_HANDLER /home/dcarlier/Contribs/php-src/Zend/zend_vm_execute.h:1312 #8 0x5565af144320 in execute_ex /home/dcarlier/Contribs/php-src/Zend/zend_vm_execute.h:56075 #9 0x5565af160f07 in zend_execute /home/dcarlier/Contribs/php-src/Zend/zend_vm_execute.h:60439 #10 0x5565aed6fafe in zend_execute_scripts /home/dcarlier/Contribs/php-src/Zend/zend.c:1842 #11 0x5565aeae70a8 in php_execute_script /home/dcarlier/Contribs/php-src/main/main.c:2578 #12 0x5565af532f4e in do_cli /home/dcarlier/Contribs/php-src/sapi/cli/php_cli.c:964 #13 0x5565af535877 in main /home/dcarlier/Contribs/php-src/sapi/cli/php_cli.c:1334 #14 0x7faf6c633d67 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 #15 0x7faf6c633e24 in __libc_start_main_impl ../csu/libc-start.c:360 #16 0x5565adc04040 in _start (/home/dcarlier/Contribs/php-src/sapi/cli/php+0x2604040) (BuildId: 949049955bdf8b7197390b1978a1dfc3ef6fdf38) AddressSanitizer can not provide additional info. SUMMARY: AddressSanitizer: FPE nptl/pthread_kill.c:44 in __pthread_kill_implementation ==286922==ABORTING ```
even without sanitizers, it is reproducible but with the following ``` <?php $g = gmp_init(256); var_dump(gmp_pow($g, PHP_INT_MAX)); ``` we get this ``` AddressSanitizer:DEADLYSIGNAL ================================================================= ==286922==ERROR: AddressSanitizer: FPE on unknown address 0x03e8000460ca (pc 0x7faf6c69de5c bp 0x400000000000004 sp 0x7ffe9843c740 T0) #0 0x7faf6c69de5c in __pthread_kill_implementation nptl/pthread_kill.c:44 #1 0x7faf6c649c81 in __GI_raise ../sysdeps/posix/raise.c:26 #2 0x7faf6db9386c in __gmp_exception (/lib/x86_64-linux-gnu/libgmp.so.10+0xd86c) (BuildId: 1af68a49fe041a5bb48a2915c3d47541f713bb38) #3 0x7faf6db938d3 in __gmp_overflow_in_mpz (/lib/x86_64-linux-gnu/libgmp.so.10+0xd8d3) (BuildId: 1af68a49fe041a5bb48a2915c3d47541f713bb38) #4 0x7faf6dbac95c in __gmpz_realloc (/lib/x86_64-linux-gnu/libgmp.so.10+0x2695c) (BuildId: 1af68a49fe041a5bb48a2915c3d47541f713bb38) #5 0x7faf6dba9038 in __gmpz_n_pow_ui (/lib/x86_64-linux-gnu/libgmp.so.10+0x23038) (BuildId: 1af68a49fe041a5bb48a2915c3d47541f713bb38) #6 0x5565ae1ccd9f in zif_gmp_pow /home/dcarlier/Contribs/php-src/ext/gmp/gmp.c:1286 #7 0x5565aee96ea9 in ZEND_DO_ICALL_SPEC_RETVAL_USED_HANDLER /home/dcarlier/Contribs/php-src/Zend/zend_vm_execute.h:1312 #8 0x5565af144320 in execute_ex /home/dcarlier/Contribs/php-src/Zend/zend_vm_execute.h:56075 #9 0x5565af160f07 in zend_execute /home/dcarlier/Contribs/php-src/Zend/zend_vm_execute.h:60439 #10 0x5565aed6fafe in zend_execute_scripts /home/dcarlier/Contribs/php-src/Zend/zend.c:1842 #11 0x5565aeae70a8 in php_execute_script /home/dcarlier/Contribs/php-src/main/main.c:2578 #12 0x5565af532f4e in do_cli /home/dcarlier/Contribs/php-src/sapi/cli/php_cli.c:964 #13 0x5565af535877 in main /home/dcarlier/Contribs/php-src/sapi/cli/php_cli.c:1334 #14 0x7faf6c633d67 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 #15 0x7faf6c633e24 in __libc_start_main_impl ../csu/libc-start.c:360 #16 0x5565adc04040 in _start (/home/dcarlier/Contribs/php-src/sapi/cli/php+0x2604040) (BuildId: 949049955bdf8b7197390b1978a1dfc3ef6fdf38) AddressSanitizer can not provide additional info. SUMMARY: AddressSanitizer: FPE nptl/pthread_kill.c:44 in __pthread_kill_implementation ==286922==ABORTING ``` close phpGH-16384
``` ext/gd/libgd/gd.c:2275:14: runtime error: signed integer overflow: 2147483647 + 1 cannot be represented in type 'int' #0 0x5d6a2103e1db in php_gd_gdImageCopy /home/dcarlier/Contribs/php-src/ext/gd/libgd/gd.c:2275 #1 0x5d6a210a2b63 in gdImageCrop /home/dcarlier/Contribs/php-src/ext/gd/libgd/gd_crop.c:57 #2 0x5d6a21018ca4 in zif_imagecrop /home/dcarlier/Contribs/php-src/ext/gd/gd.c:3575 #3 0x5d6a21e46e7a in ZEND_DO_ICALL_SPEC_RETVAL_USED_HANDLER /home/dcarlier/Contribs/php-src/Zend/zend_vm_execute.h:1337 #4 0x5d6a221188da in execute_ex /home/dcarlier/Contribs/php-src/Zend/zend_vm_execute.h:57246 #5 0x5d6a221366bd in zend_execute /home/dcarlier/Contribs/php-src/Zend/zend_vm_execute.h:61634 #6 0x5d6a21d107a6 in zend_execute_scripts /home/dcarlier/Contribs/php-src/Zend/zend.c:1895 #7 0x5d6a21a63409 in php_execute_script /home/dcarlier/Contribs/php-src/main/main.c:2529 #8 0x5d6a22516d5e in do_cli /home/dcarlier/Contribs/php-src/sapi/cli/php_cli.c:966 #9 0x5d6a2251981d in main /home/dcarlier/Contribs/php-src/sapi/cli/php_cli.c:1341 #10 0x7f10d002a3b7 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 #11 0x7f10d002a47a in __libc_start_main_impl ../csu/libc-start.c:360 #12 0x5d6a20a06da4 in _start (/home/dcarlier/Contribs/php-src/sapi/cli/php+0x2806da4) (BuildId: d9a79c7e0e4872311439d7313cb3a81fe04190a2) ``` close phpGH-18006
This patch optimizes the
$x = array_function($x, ...)
pattern and the$x = array_function(temporary, ...)
pattern for these functions:array_merge, array_unique, array_replace, and array_intersect.
With these limitations:
comparison may throw.
recursive version may throw.
without holes.
It works by checking if the array function is immediately followed by an
assignment which overrides the input. In that case we can do the
operation in-place instead of copying the array. Note that this is
limited to CV's at the moment, and can't handle more complex scenarios
like array or object assignments.
For the temporary case it suffices to check if the refcount of the input
array is 1 and the array is non-persistent and non-immutable.
The current approach is a bit ugly though: it looks at the VM
instructions from within a function to check if the optimization is
possible, which is a bit odd.
I considered extending opcache as an alternative, but I believe this would
require adding a whole bunch of machinery for only a few users.
Looking at the assembly of prepare_in_place_array_modify_if_possible()
it looks pretty light-weight, about 95 bytes / 29 instructions on my
x86-64 Linux laptop.
Safety
There are some array functions which take some sort of copy of the input
array into a temporary C array for sorting.
(e.g. array_unique, array_diff, and array_intersect do this).
Since we no longer take a copy in all cases, we must check if it's
possible that a value is accessed that was already destroyed.
For array_unique: cmpdata will never be removed so that will never reach
refcount 0. And when something is removed, it is the previous value of
cmpdata, not the one user later. So this seems okay.
For array_intersect: a previous pointer (ptr[0] - 1) is accessed.
But this can't be a destroyed value because the pointer is first moved forward.
Results
Using this benchmark script
https://gist.github.com/nielsdos/ae5a2dddc53c61749ae31c908aa78e98 I get:
=== array_merge $a = array_merge($a, ...) ===
before 4.3821 sec
after 0.0022 sec
=== array_merge temporary ===
before 0.1265 sec
after 0.0479 sec
=== array_unique temporary ===
before 0.9297 sec
after 0.8498 sec
=== array_replace $a = array_replace($a, ...) ===
before 0.0810 sec
after 0.0083 sec
=== array_replace temporary ===
before 0.1261 sec
after 0.0534 sec
=== array_intersect temporary (no significant improvement because dominated by sorting) ===
before 30.499 sec
after 30.356 sec