Skip to content

Allow some array functions to operate in-place using a peep-hole optimization #11060

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 24, 2023

Conversation

nielsdos
Copy link
Member

@nielsdos nielsdos commented Apr 11, 2023

Current version of patch

Allow array functions to operate in-place if the refcount is 1

This allows array_merge, array_intersect, array_replace, array_unique
and usort to avoid taking a copy and do the transformation in-place.

** Safety **

There are some array functions which take a copy of the input
array into a temporary C array for sorting purposes.
(e.g. array_unique, array_diff, and array_intersect do this).
Since we no longer take a copy in all cases, we must check if
it's possible that a value is accessed that was already destroyed.

For array_unique: cmpdata will never be removed so that will never reach
refcount 0. And when something is removed, it is the previous value of
cmpdata, not the one user later. So this seems okay.

For array_intersect: a previous pointer (ptr[0] - 1) is accessed.
But this can't be a destroyed value because the pointer is first moved forward.

For array_diff: it's possible a previous pointer is accessed after
destruction. So we can't optimise this case easily.

Old version of patch

This patch optimizes the $x = array_function($x, ...) pattern and the
$x = array_function(temporary, ...) pattern for these functions:
array_merge, array_unique, array_replace, and array_intersect.
With these limitations:

  • array_{unique,intersect} only do the temporary optimization because the
    comparison may throw.
  • array_{merge,replace} only optimizes CVs for non-recursive case because the
    recursive version may throw.
  • array_merge optimization works only if the array is packed and is
    without holes.

It works by checking if the array function is immediately followed by an
assignment which overrides the input. In that case we can do the
operation in-place instead of copying the array. Note that this is
limited to CV's at the moment, and can't handle more complex scenarios
like array or object assignments.
For the temporary case it suffices to check if the refcount of the input
array is 1 and the array is non-persistent and non-immutable.

The current approach is a bit ugly though: it looks at the VM
instructions from within a function to check if the optimization is
possible, which is a bit odd.
I considered extending opcache as an alternative, but I believe this would
require adding a whole bunch of machinery for only a few users.

Looking at the assembly of prepare_in_place_array_modify_if_possible()
it looks pretty light-weight, about 95 bytes / 29 instructions on my
x86-64 Linux laptop.

Safety

There are some array functions which take some sort of copy of the input
array into a temporary C array for sorting.
(e.g. array_unique, array_diff, and array_intersect do this).
Since we no longer take a copy in all cases, we must check if it's
possible that a value is accessed that was already destroyed.

For array_unique: cmpdata will never be removed so that will never reach
refcount 0. And when something is removed, it is the previous value of
cmpdata, not the one user later. So this seems okay.

For array_intersect: a previous pointer (ptr[0] - 1) is accessed.
But this can't be a destroyed value because the pointer is first moved forward.

Results

Using this benchmark script
https://gist.github.com/nielsdos/ae5a2dddc53c61749ae31c908aa78e98 I get:

=== array_merge $a = array_merge($a, ...) ===
before 4.3821 sec
after 0.0022 sec
=== array_merge temporary ===
before 0.1265 sec
after 0.0479 sec
=== array_unique temporary ===
before 0.9297 sec
after 0.8498 sec
=== array_replace $a = array_replace($a, ...) ===
before 0.0810 sec
after 0.0083 sec
=== array_replace temporary ===
before 0.1261 sec
after 0.0534 sec
=== array_intersect temporary (no significant improvement because dominated by sorting) ===
before 30.499 sec
after 30.356 sec

@mvorisek
Copy link
Contributor

mvorisek commented Apr 11, 2023

Note that this is limited to CV's at the moment, and can't handle more complex scenarios like array or object assignments.

Thank you for looking into this. $x[$i] = array_merge($x[$i], expr) is quite common, can it be optimized as well? And $x->prop = array_merge($x->prop, expr)?

Also does this work for $x = [...$x, expr] (array spread)?

And can $x = array_merge(expr, $x) (array merge, but prepend the new values) be optimized as well?

Also can the optimization be asserted by a test (with zend_test ext if needed)?

@nielsdos
Copy link
Member Author

Note that this is limited to CV's at the moment, and can't handle more complex scenarios like array or object assignments.

Thank you for looking into this. $x[$i] = array_merge($x[$i], expr) is quite common, can it be optimized as well?

Not at the moment, but maybe in the future. The main goal of this PR is to show the idea works and to implement a first version of the idea. It can always be extended on in the future if desired. In any case I think it's important to first gather some feedback on this before continuing on.

Also does this work for $x = [...$x, expr] (array spread)?

No, because this PR optimizes the array functions, and what you show here must be implemented in the VM. I can look into the feasibility to optimize this next week (I'm quite busy now). But no promises on this one.

And can $x = array_merge(expr, $x) (array merge, but prepend the new values) be optimized as well?

I don't think so tbh. The reason is that the indices of the array may change, so we'd have to move a whole bunch of memory anyway. At that point a copy is not much more expensive.

Also can the optimization be asserted by a test (with zend_test ext if needed)?

Kinda hard to do reliably tbh. I'd have to interpose the array duplication function to be able to count the amount of invocations. Getting this reliable is nasty. I'll first hear what other maintainers think about this before making more progress on this.

@mvorisek
Copy link
Contributor

mvorisek commented Apr 12, 2023

test

Having this tested is very important, as user must be allowed to safely rely on the optimization. What about storing some flag, via zend_test, into the source zval and testing the present/non-presence of that flag in the result?

@dstogov
Copy link
Member

dstogov commented Apr 12, 2023

The benchmark results looks promising, but implementation is hacky.
I made only a quick look. I'll try to find more time to think how to implement this in a more general and better way.
Unfortunately, I hardly ever will have time before Monday.

Please, don't merge this yet.

Copy link
Member

@dstogov dstogov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally, I think the support for CV is too risky.
Array arguments with refcount == 1 should work.
Probably the same may be done for string and then we may use zend_may_modify_arg_in_place() or even simple macro.

@iliaal please also take a look

}

/* Must set the CV to NULL so we don't destroy the array on assignment */
ZVAL_NULL(var);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can var be typed at this place? (typed reference)

Copy link
Member

@dstogov dstogov Apr 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case of exception the value of CV is going to have an improper value.

class Bomb {
    function __destruct() {
      throw new Exception("ops");
    }
}
function foo() {
    $a = range(1, 3);
    try {
        $a = array_merge($a, ["bomb" => new Bomb()], ["bomb" => 0]);
    } catch (Exception $ex) {
        echo "ops\n";
    }
    var_dump($a);
}
foo();

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I posted wrong example above, and modified it in-place.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can var be typed at this place? (typed reference)

I don't think this is a concern. array_merge and array_merge_recursive accept the parameter by-value. This means that we'll get a refcount of 2 for the array (while keeping the reference at 1), at which point this optimization will not apply.

In case of exception the value of CV is going to have an improper value.

This is a valid concern, we thought about coercion warnings but missed destructor side-effects. I can't think of a way to avoid this.

Comment on lines +3918 to +3902

if (in_place) {
GC_ADDREF(dest);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume, you need this to avoid destruction in zend_vm_stack_free_args().
You may also try ZVAL_NULL() or ZVAL_UNDEF() of the arg here or in the first place (instead of setting in_place). This may only change the argument value in backtrace.

ADDREF is safer.

Comment on lines 3870 to 3872
static bool set_return_value_array_dup_or_in_place(const zend_execute_data *execute_data, const zval *arg, zval *return_value, bool safe_for_CV)
{
if (prepare_in_place_array_modify_if_possible(execute_data, arg, safe_for_CV)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The names are really bad. I would propose to use zend_may_modify_array_in_place() or zend_modify_array_in_place() instead of set_return_value_array_dup_or_in_place() and always expand set_return_value_array_dup_or_in_place()

@nielsdos nielsdos force-pushed the in-place-array-modification-squashed branch from a0b7278 to bb5deb3 Compare April 23, 2023 20:52
@nielsdos
Copy link
Member Author

nielsdos commented Apr 23, 2023

Okay thanks for reviewing, I made sure the optimization only applies to RC1 inputs and I changed the name for the helper.
I plan on doing a follow-up sometime in the near future that will allow us to do some optimization for CV as well, but that will rely on this patch.

Copy link
Member

@dstogov dstogov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is OK now.

Comment on lines 94 to 98
static zend_always_inline bool zend_may_modify_array_in_place(const zval *arg)
{
return !(GC_FLAGS(Z_ARRVAL_P(arg)) & (IS_ARRAY_IMMUTABLE | IS_ARRAY_PERSISTENT)) && Z_REFCOUNT_P(arg) == 1;
}

Copy link
Member

@dstogov dstogov Apr 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense to add assertion ZEND_ASSERT(Z_TYPE_P(arg) == IS_ARRAY)

Maybe it makes sense to make this function even more general.

static zend_always_inline bool zend_may_modify_arg_in_place(const zval *arg)
{
	return Z_REFCOUNTED_P(arg) && !(GC_FLAGS(Z_ARRVAL_P(arg)) & (GC_IMMUTABLE | GC_PERSISTENT)) && Z_REFCOUNT_P(arg) == 1;
}

This may be moved to zend_types.h

array_init_size(return_value, count);
dest = Z_ARRVAL_P(return_value);
/* copy first array if necessary */
bool in_place = false;
if (HT_IS_PACKED(src)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is non packed array? It does not need in-place trick?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For packed arrays, all elements are after each other without holes. Duplicating it won't change the memory layout. However, for other cases the memory layout can be different.

@@ -4764,6 +4813,7 @@ static void php_array_intersect(INTERNAL_FUNCTION_PARAMETERS, int behavior, int
zend_fcall_info *fci_key = NULL, *fci_data;
zend_fcall_info_cache *fci_key_cache = NULL, *fci_data_cache;
PHP_ARRAY_CMP_FUNC_VARS;
bool in_place = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can array_diff be optimized the same way?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check my commit comment to see why this can't be done easily in a safe manner.

@mvorisek
Copy link
Contributor

@dstogov as discussed in #11060 (comment), do you think this can be tested easily with some help from zend_test?

@dstogov
Copy link
Member

dstogov commented Apr 24, 2023

@dstogov as discussed in #11060 (comment), do you think this can be tested easily with some help from zend_test?

We don't need zend_test to test this. usort($a1 + $a2, "cmp"); or something similar is enough. I think, this should be already covered.

@mvorisek
Copy link
Contributor

mvorisek commented Apr 24, 2023

@dstogov as discussed in #11060 (comment), do you think this can be tested easily with some help from zend_test?

We don't need zend_test to test this. usort($a1 + $a2, "cmp"); or something similar is enough. I think, this should be already covered.

@dstogov I meant the in-place optimization:

=== array_merge $a = array_merge($a, ...) ===
before 4.3821 sec
after 0.0022 sec

(and without using time measurement preferably)

@dstogov
Copy link
Member

dstogov commented Apr 24, 2023

We don't prove performance with tests. It's possible to provide array_micro_bench.php similar to micro_bench.php.

@mvorisek
Copy link
Contributor

I would really like the optimization to be tested as the performance difference is dramatic. Do you think it is possible to save some flag in the original zval and test (after array_merge is called) if the zval flag is present in the assigned variable (if the flag is present, in-place was used). Or simply save zval C ptr before/after and compare it.

@nielsdos
Copy link
Member Author

Thanks for reviewing I'll apply your suggestion.

@mvorisek Note that the latest version of the patch only performs the optimisation for temporaries. I dropped the $variable = f($variable) optimization because it had complications and wasn't safe.
I'm working on a new approach for the variable in-place modification in my fork. That new approach needs this PR first. The new approach also generalises to userland functions.
I think I can make a bench test file on my fork, it may make sense to do that there.

This allows array_merge, array_intersect, array_replace, array_unique
and usort to avoid taking a copy and do the transformation in-place.

** Safety **

There are some array functions which take a copy of the input
array into a temporary C array for sorting purposes.
(e.g. array_unique, array_diff, and array_intersect do this).
Since we no longer take a copy in all cases, we must check if
it's possible that a value is accessed that was already destroyed.

For array_unique: cmpdata will never be removed so that will never reach
refcount 0. And when something is removed, it is the previous value of
cmpdata, not the one user later. So this seems okay.

For array_intersect: a previous pointer (ptr[0] - 1) is accessed.
But this can't be a destroyed value because the pointer is first moved forward.

For array_diff: it's possible a previous pointer is accessed after
destruction. So we can't optimise this case easily.
@nielsdos nielsdos force-pushed the in-place-array-modification-squashed branch from bb5deb3 to ea6410c Compare April 24, 2023 20:55
@nielsdos nielsdos merged commit 947eb95 into php:master Apr 24, 2023
nielsdos added a commit to nielsdos/php-src that referenced this pull request Apr 28, 2023
## Introduction

This implements a more generic version of the in-place modification I
first tried in phpGH-11060. We decided to limit that PR to RC1
optimisations of some array functions only, and not do the in-place
$variable optimisation in that way because of issues.

This patch overcomes those issues and builds on the previous one.
With this patch, any internal function that supports RC1 optimisations
automatically gets the optimisation for in-place variable modifications.
Contrary to the previous approach, this is compatible with exceptions.
Furthermore, this approach also allows userland functions to benefit
from this optimisation.
e.g. the following code will not take a copy of the array with this
patch:
```
function foo($array) {
  $array[1] = 1;
}
function bar() {
  $array = ...;
  $array = foo($array);
}
```

Right now the impact on the benchmark suite isn't that high. The reason
is that only a handful of functions within PHP optimise for RC1 cases.
When more support for these cases are added, the benefit from this patch
will increase.

I've added a micro benchmark for array operations that shows the effect
of this optimisation.

## Implementation

The optimiser already tracks which SSA variables have a value that
doesn't matter with the no_val field. By changing ZEND_SEND_VAR to
redefine op1, we automatically know if the variable will ever be used
again without being overwritten by looking at the no_val field.
If the no_val field is set, the variable may hold a string/array and the
refcount may be 1, we set a flag on the ZEND_SEND_VAR(_EX) opline to
indicate that it may avoid a copy. The flag is stored in extended_value.

There are two new VM type spec handlers for this that check for the
flag: one for ZEND_SEND_VAR and one for ZEND_SEND_VAR_EX.

## Limitations

* The optimisation isn't performed on arguments. This is because the
  optimisation would be externally visible, which is undesirable.
  Unfortunately, this is also the case where a lot of optimisation
  opportunity lies. Nonetheless, even with this limitation it seems like
  it can help a lot.
* The optimisation does not apply to functions using indirect variable
  access (e.g. variable-variables, compact()) and vararg functions.
nielsdos added a commit to nielsdos/php-src that referenced this pull request Apr 28, 2023
## Introduction

This implements a more generic version of the in-place modification I
first tried in phpGH-11060. We decided to limit that PR to RC1
optimisations of some array functions only, and not do the in-place
$variable optimisation in that way because of issues.

This patch overcomes those issues and builds on the previous one.
With this patch, any internal function that supports RC1 optimisations
automatically gets the optimisation for in-place variable modifications.
Contrary to the previous approach, this is compatible with exceptions.
Furthermore, this approach also allows userland functions to benefit
from this optimisation.
e.g. the following code will not take a copy of the array with this
patch:
```
function foo($array) {
  $array[1] = 1;
}
function bar() {
  $array = ...;
  $array = foo($array);
}
```

Right now the impact on the benchmark suite isn't that high. The reason
is that only a handful of functions within PHP optimise for RC1 cases, and
the array sizes for those cases are fairly small.
When more support for these cases are added, the benefit from this patch
will increase.

I've added a micro benchmark for array operations that shows the effect
of this optimisation.

## Implementation

The optimiser already tracks which SSA variables have a value that
doesn't matter with the no_val field. By changing ZEND_SEND_VAR to
redefine op1, we automatically know if the variable will ever be used
again without being overwritten by looking at the no_val field.
If the no_val field is set, the variable may hold a string/array and the
refcount may be 1, we set a flag on the ZEND_SEND_VAR(_EX) opline to
indicate that it may avoid a copy. The flag is stored in extended_value.

There are two new VM type spec handlers for this that check for the
flag: one for ZEND_SEND_VAR and one for ZEND_SEND_VAR_EX.

## Limitations

* The optimisation isn't performed on arguments. This is because the
  optimisation would be externally visible, which is undesirable.
  Unfortunately, this is also the case where a lot of optimisation
  opportunity lies. Nonetheless, even with this limitation it seems like
  it can help a lot.
* The optimisation does not apply to functions using indirect variable
  access (e.g. variable-variables, compact()) and vararg functions.
nielsdos added a commit to nielsdos/php-src that referenced this pull request Apr 30, 2023
## Introduction

This implements a more generic version of the in-place modification I
first tried in phpGH-11060. We decided to limit that PR to RC1
optimisations of some array functions only, and not do the in-place
$variable optimisation in that way because of issues.

This patch overcomes those issues and builds on the previous one.
With this patch, any internal function that supports RC1 optimisations
automatically gets the optimisation for in-place variable modifications.
Contrary to the previous approach, this is compatible with exceptions.
Furthermore, this approach also allows userland functions to benefit
from this optimisation.

e.g. the following code will not take a copy of the array with this
patch, whereas previously it would due to the copy-on-write
characteristic of arrays:
```
function foo($array) {
  $array[1] = 1;
}
function bar() {
  $array = ...;
  $array = foo($array);
}
```

Right now the impact on the benchmark suite isn't that high. The reason
is that only a handful of functions within PHP optimise for RC1 cases, and
the array sizes for those cases are fairly small.
When more support for these cases are added, the benefit from this patch
will increase.

I've added a micro benchmark for array operations that shows the effect
of this optimisation.

## Implementation

The optimiser already tracks which SSA variables have a value that
doesn't matter with the no_val field. By changing ZEND_SEND_VAR to
redefine op1, we automatically know if the variable will ever be used
again without being overwritten by looking at the no_val field.
If the no_val field is set, the variable may hold a string/array and the
refcount may be 1, we set a flag on the ZEND_SEND_VAR(_EX) opline to
indicate that it may avoid a copy. The flag is stored in extended_value.

There are two new VM type spec handlers for this that check for the
flag: one for ZEND_SEND_VAR and one for ZEND_SEND_VAR_EX.

## Limitations

* The optimisation isn't performed on arguments passed to other
  functions. This is because the optimisation would be externally
  visible through backtraces, which is undesirable.
  Unfortunately, this is also the case where a lot of optimisation
  opportunity lies. Nonetheless, even with this limitation it seems like
  the optimisation can help a lot.
* The optimisation does not apply to functions using indirect variable
  access (e.g. variable-variables, compact()) and vararg functions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants