-
Notifications
You must be signed in to change notification settings - Fork 7.8k
Allow some array functions to operate in-place using a peep-hole optimization #11060
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow some array functions to operate in-place using a peep-hole optimization #11060
Conversation
Thank you for looking into this. Also does this work for And can Also can the optimization be asserted by a test (with |
Not at the moment, but maybe in the future. The main goal of this PR is to show the idea works and to implement a first version of the idea. It can always be extended on in the future if desired. In any case I think it's important to first gather some feedback on this before continuing on.
No, because this PR optimizes the array functions, and what you show here must be implemented in the VM. I can look into the feasibility to optimize this next week (I'm quite busy now). But no promises on this one.
I don't think so tbh. The reason is that the indices of the array may change, so we'd have to move a whole bunch of memory anyway. At that point a copy is not much more expensive.
Kinda hard to do reliably tbh. I'd have to interpose the array duplication function to be able to count the amount of invocations. Getting this reliable is nasty. I'll first hear what other maintainers think about this before making more progress on this. |
Having this tested is very important, as user must be allowed to safely rely on the optimization. What about storing some flag, via |
The benchmark results looks promising, but implementation is hacky. Please, don't merge this yet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finally, I think the support for CV is too risky.
Array arguments with refcount == 1 should work.
Probably the same may be done for string and then we may use zend_may_modify_arg_in_place()
or even simple macro.
@iliaal please also take a look
ext/standard/array.c
Outdated
} | ||
|
||
/* Must set the CV to NULL so we don't destroy the array on assignment */ | ||
ZVAL_NULL(var); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can var
be typed at this place? (typed reference)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case of exception the value of CV is going to have an improper value.
class Bomb {
function __destruct() {
throw new Exception("ops");
}
}
function foo() {
$a = range(1, 3);
try {
$a = array_merge($a, ["bomb" => new Bomb()], ["bomb" => 0]);
} catch (Exception $ex) {
echo "ops\n";
}
var_dump($a);
}
foo();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I posted wrong example above, and modified it in-place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can var be typed at this place? (typed reference)
I don't think this is a concern. array_merge
and array_merge_recursive
accept the parameter by-value. This means that we'll get a refcount of 2 for the array (while keeping the reference at 1), at which point this optimization will not apply.
In case of exception the value of CV is going to have an improper value.
This is a valid concern, we thought about coercion warnings but missed destructor side-effects. I can't think of a way to avoid this.
|
||
if (in_place) { | ||
GC_ADDREF(dest); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume, you need this to avoid destruction in zend_vm_stack_free_args().
You may also try ZVAL_NULL()
or ZVAL_UNDEF()
of the arg here or in the first place (instead of setting in_place
). This may only change the argument value in backtrace.
ADDREF is safer.
ext/standard/array.c
Outdated
static bool set_return_value_array_dup_or_in_place(const zend_execute_data *execute_data, const zval *arg, zval *return_value, bool safe_for_CV) | ||
{ | ||
if (prepare_in_place_array_modify_if_possible(execute_data, arg, safe_for_CV)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The names are really bad. I would propose to use zend_may_modify_array_in_place()
or zend_modify_array_in_place()
instead of set_return_value_array_dup_or_in_place()
and always expand set_return_value_array_dup_or_in_place()
a0b7278
to
bb5deb3
Compare
Okay thanks for reviewing, I made sure the optimization only applies to RC1 inputs and I changed the name for the helper. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is OK now.
ext/standard/array.c
Outdated
static zend_always_inline bool zend_may_modify_array_in_place(const zval *arg) | ||
{ | ||
return !(GC_FLAGS(Z_ARRVAL_P(arg)) & (IS_ARRAY_IMMUTABLE | IS_ARRAY_PERSISTENT)) && Z_REFCOUNT_P(arg) == 1; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes sense to add assertion ZEND_ASSERT(Z_TYPE_P(arg) == IS_ARRAY)
Maybe it makes sense to make this function even more general.
static zend_always_inline bool zend_may_modify_arg_in_place(const zval *arg)
{
return Z_REFCOUNTED_P(arg) && !(GC_FLAGS(Z_ARRVAL_P(arg)) & (GC_IMMUTABLE | GC_PERSISTENT)) && Z_REFCOUNT_P(arg) == 1;
}
This may be moved to zend_types.h
array_init_size(return_value, count); | ||
dest = Z_ARRVAL_P(return_value); | ||
/* copy first array if necessary */ | ||
bool in_place = false; | ||
if (HT_IS_PACKED(src)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is non packed array? It does not need in-place trick?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For packed arrays, all elements are after each other without holes. Duplicating it won't change the memory layout. However, for other cases the memory layout can be different.
@@ -4764,6 +4813,7 @@ static void php_array_intersect(INTERNAL_FUNCTION_PARAMETERS, int behavior, int | |||
zend_fcall_info *fci_key = NULL, *fci_data; | |||
zend_fcall_info_cache *fci_key_cache = NULL, *fci_data_cache; | |||
PHP_ARRAY_CMP_FUNC_VARS; | |||
bool in_place = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can array_diff
be optimized the same way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check my commit comment to see why this can't be done easily in a safe manner.
@dstogov as discussed in #11060 (comment), do you think this can be tested easily with some help from |
We don't need |
@dstogov I meant the in-place optimization:
(and without using time measurement preferably) |
We don't prove performance with tests. It's possible to provide |
I would really like the optimization to be tested as the performance difference is dramatic. Do you think it is possible to save some flag in the original zval and test (after array_merge is called) if the zval flag is present in the assigned variable (if the flag is present, in-place was used). Or simply save |
Thanks for reviewing I'll apply your suggestion. @mvorisek Note that the latest version of the patch only performs the optimisation for temporaries. I dropped the |
This allows array_merge, array_intersect, array_replace, array_unique and usort to avoid taking a copy and do the transformation in-place. ** Safety ** There are some array functions which take a copy of the input array into a temporary C array for sorting purposes. (e.g. array_unique, array_diff, and array_intersect do this). Since we no longer take a copy in all cases, we must check if it's possible that a value is accessed that was already destroyed. For array_unique: cmpdata will never be removed so that will never reach refcount 0. And when something is removed, it is the previous value of cmpdata, not the one user later. So this seems okay. For array_intersect: a previous pointer (ptr[0] - 1) is accessed. But this can't be a destroyed value because the pointer is first moved forward. For array_diff: it's possible a previous pointer is accessed after destruction. So we can't optimise this case easily.
bb5deb3
to
ea6410c
Compare
## Introduction This implements a more generic version of the in-place modification I first tried in phpGH-11060. We decided to limit that PR to RC1 optimisations of some array functions only, and not do the in-place $variable optimisation in that way because of issues. This patch overcomes those issues and builds on the previous one. With this patch, any internal function that supports RC1 optimisations automatically gets the optimisation for in-place variable modifications. Contrary to the previous approach, this is compatible with exceptions. Furthermore, this approach also allows userland functions to benefit from this optimisation. e.g. the following code will not take a copy of the array with this patch: ``` function foo($array) { $array[1] = 1; } function bar() { $array = ...; $array = foo($array); } ``` Right now the impact on the benchmark suite isn't that high. The reason is that only a handful of functions within PHP optimise for RC1 cases. When more support for these cases are added, the benefit from this patch will increase. I've added a micro benchmark for array operations that shows the effect of this optimisation. ## Implementation The optimiser already tracks which SSA variables have a value that doesn't matter with the no_val field. By changing ZEND_SEND_VAR to redefine op1, we automatically know if the variable will ever be used again without being overwritten by looking at the no_val field. If the no_val field is set, the variable may hold a string/array and the refcount may be 1, we set a flag on the ZEND_SEND_VAR(_EX) opline to indicate that it may avoid a copy. The flag is stored in extended_value. There are two new VM type spec handlers for this that check for the flag: one for ZEND_SEND_VAR and one for ZEND_SEND_VAR_EX. ## Limitations * The optimisation isn't performed on arguments. This is because the optimisation would be externally visible, which is undesirable. Unfortunately, this is also the case where a lot of optimisation opportunity lies. Nonetheless, even with this limitation it seems like it can help a lot. * The optimisation does not apply to functions using indirect variable access (e.g. variable-variables, compact()) and vararg functions.
## Introduction This implements a more generic version of the in-place modification I first tried in phpGH-11060. We decided to limit that PR to RC1 optimisations of some array functions only, and not do the in-place $variable optimisation in that way because of issues. This patch overcomes those issues and builds on the previous one. With this patch, any internal function that supports RC1 optimisations automatically gets the optimisation for in-place variable modifications. Contrary to the previous approach, this is compatible with exceptions. Furthermore, this approach also allows userland functions to benefit from this optimisation. e.g. the following code will not take a copy of the array with this patch: ``` function foo($array) { $array[1] = 1; } function bar() { $array = ...; $array = foo($array); } ``` Right now the impact on the benchmark suite isn't that high. The reason is that only a handful of functions within PHP optimise for RC1 cases, and the array sizes for those cases are fairly small. When more support for these cases are added, the benefit from this patch will increase. I've added a micro benchmark for array operations that shows the effect of this optimisation. ## Implementation The optimiser already tracks which SSA variables have a value that doesn't matter with the no_val field. By changing ZEND_SEND_VAR to redefine op1, we automatically know if the variable will ever be used again without being overwritten by looking at the no_val field. If the no_val field is set, the variable may hold a string/array and the refcount may be 1, we set a flag on the ZEND_SEND_VAR(_EX) opline to indicate that it may avoid a copy. The flag is stored in extended_value. There are two new VM type spec handlers for this that check for the flag: one for ZEND_SEND_VAR and one for ZEND_SEND_VAR_EX. ## Limitations * The optimisation isn't performed on arguments. This is because the optimisation would be externally visible, which is undesirable. Unfortunately, this is also the case where a lot of optimisation opportunity lies. Nonetheless, even with this limitation it seems like it can help a lot. * The optimisation does not apply to functions using indirect variable access (e.g. variable-variables, compact()) and vararg functions.
## Introduction This implements a more generic version of the in-place modification I first tried in phpGH-11060. We decided to limit that PR to RC1 optimisations of some array functions only, and not do the in-place $variable optimisation in that way because of issues. This patch overcomes those issues and builds on the previous one. With this patch, any internal function that supports RC1 optimisations automatically gets the optimisation for in-place variable modifications. Contrary to the previous approach, this is compatible with exceptions. Furthermore, this approach also allows userland functions to benefit from this optimisation. e.g. the following code will not take a copy of the array with this patch, whereas previously it would due to the copy-on-write characteristic of arrays: ``` function foo($array) { $array[1] = 1; } function bar() { $array = ...; $array = foo($array); } ``` Right now the impact on the benchmark suite isn't that high. The reason is that only a handful of functions within PHP optimise for RC1 cases, and the array sizes for those cases are fairly small. When more support for these cases are added, the benefit from this patch will increase. I've added a micro benchmark for array operations that shows the effect of this optimisation. ## Implementation The optimiser already tracks which SSA variables have a value that doesn't matter with the no_val field. By changing ZEND_SEND_VAR to redefine op1, we automatically know if the variable will ever be used again without being overwritten by looking at the no_val field. If the no_val field is set, the variable may hold a string/array and the refcount may be 1, we set a flag on the ZEND_SEND_VAR(_EX) opline to indicate that it may avoid a copy. The flag is stored in extended_value. There are two new VM type spec handlers for this that check for the flag: one for ZEND_SEND_VAR and one for ZEND_SEND_VAR_EX. ## Limitations * The optimisation isn't performed on arguments passed to other functions. This is because the optimisation would be externally visible through backtraces, which is undesirable. Unfortunately, this is also the case where a lot of optimisation opportunity lies. Nonetheless, even with this limitation it seems like the optimisation can help a lot. * The optimisation does not apply to functions using indirect variable access (e.g. variable-variables, compact()) and vararg functions.
Current version of patch
Old version of patch
This patch optimizes the
$x = array_function($x, ...)
pattern and the$x = array_function(temporary, ...)
pattern for these functions:array_merge, array_unique, array_replace, and array_intersect.
With these limitations:
comparison may throw.
recursive version may throw.
without holes.
It works by checking if the array function is immediately followed by an
assignment which overrides the input. In that case we can do the
operation in-place instead of copying the array. Note that this is
limited to CV's at the moment, and can't handle more complex scenarios
like array or object assignments.
For the temporary case it suffices to check if the refcount of the input
array is 1 and the array is non-persistent and non-immutable.
The current approach is a bit ugly though: it looks at the VM
instructions from within a function to check if the optimization is
possible, which is a bit odd.
I considered extending opcache as an alternative, but I believe this would
require adding a whole bunch of machinery for only a few users.
Looking at the assembly of prepare_in_place_array_modify_if_possible()
it looks pretty light-weight, about 95 bytes / 29 instructions on my
x86-64 Linux laptop.
Safety
There are some array functions which take some sort of copy of the input
array into a temporary C array for sorting.
(e.g. array_unique, array_diff, and array_intersect do this).
Since we no longer take a copy in all cases, we must check if it's
possible that a value is accessed that was already destroyed.
For array_unique: cmpdata will never be removed so that will never reach
refcount 0. And when something is removed, it is the previous value of
cmpdata, not the one user later. So this seems okay.
For array_intersect: a previous pointer (ptr[0] - 1) is accessed.
But this can't be a destroyed value because the pointer is first moved forward.
Results
Using this benchmark script
https://gist.github.com/nielsdos/ae5a2dddc53c61749ae31c908aa78e98 I get:
=== array_merge $a = array_merge($a, ...) ===
before 4.3821 sec
after 0.0022 sec
=== array_merge temporary ===
before 0.1265 sec
after 0.0479 sec
=== array_unique temporary ===
before 0.9297 sec
after 0.8498 sec
=== array_replace $a = array_replace($a, ...) ===
before 0.0810 sec
after 0.0083 sec
=== array_replace temporary ===
before 0.1261 sec
after 0.0534 sec
=== array_intersect temporary (no significant improvement because dominated by sorting) ===
before 30.499 sec
after 30.356 sec