Improvements to for-each implementation
Hi,
this post is a fork of the "[PHP-DEV] Fixing strange foreach behavior" thread. It proposes
a more efficient for-each mechanism (that does NOT change the conceptual behaviour).
Currently, when a for-each loop starts, the engine has to copy the array if that array is visible
anywhere else in the program, because the loop resets the internal position pointer (which is part
of the underlying hashtable structure) and another part of the program might rely on that pointer.
Essentially the array gets duplicated prematurely, only because of the internal position pointer. Of
course it might have to be duplicated within the for-each loop anyway, but only if it is actually
altered. In most cases one just iterates over the array without altering it. Please consider the
following sample, taken from my recent post:
    $arr = $obj->arr; // property "arr" is an array
    foreach ($arr as $val) ...;
This will currently copy the array, because $arr is also visible through $obj->arr, although the
copy is not really necessary unless the array is actually changed during iteration.
If one instead used an external position variable that is initialized in FE_RESET (TEMPVAR) and
then incremented in FE_FETCH, one could just increment the ref_count of the array while it is being
traversed, without the initial need to perform copy-on-write.
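To make the idea concrete, here is a minimal sketch in C of a loop that keeps its position outside the array and only takes a reference on loop entry. It is a toy model, not engine code: toy_array, toy_iter, toy_fe_reset, toy_fe_fetch and toy_fe_free are hypothetical names standing in for the hashtable, the TEMPVAR state and the FE_RESET/FE_FETCH handlers, and a plain int array replaces the real hashtable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for the engine's hashtable: a refcounted int array.
 * All names are hypothetical, not actual Zend Engine symbols. */
typedef struct {
    int refcount;
    size_t len;
    int *data;
} toy_array;

/* Loop state that would live in the for-each TEMPVAR. */
typedef struct {
    toy_array *arr; /* array pinned for the duration of the loop */
    size_t pos;     /* external position, not stored in the array */
} toy_iter;

/* Allocate a new array with refcount 1 holding a copy of vals. */
toy_array *toy_array_new(const int *vals, size_t len) {
    toy_array *a = malloc(sizeof *a);
    assert(a != NULL);
    a->refcount = 1;
    a->len = len;
    a->data = malloc((len ? len : 1) * sizeof *a->data);
    assert(a->data != NULL);
    if (len) memcpy(a->data, vals, len * sizeof *a->data);
    return a;
}

/* Drop one reference; free the array when nobody holds it anymore. */
void toy_array_release(toy_array *a) {
    if (--a->refcount == 0) {
        free(a->data);
        free(a);
    }
}

/* Analogue of FE_RESET: take a reference and start at position 0.
 * Nothing is copied and the array itself is not modified otherwise. */
toy_iter toy_fe_reset(toy_array *a) {
    toy_iter it = { a, 0 };
    a->refcount++;
    return it;
}

/* Analogue of FE_FETCH: store the next value in *out and return 1,
 * or return 0 once the end of the array is reached. */
int toy_fe_fetch(toy_iter *it, int *out) {
    if (it->pos >= it->arr->len) return 0;
    *out = it->arr->data[it->pos++];
    return 1;
}

/* Loop exit: give up the reference taken in toy_fe_reset. */
void toy_fe_free(toy_iter *it) {
    toy_array_release(it->arr);
    it->arr = NULL;
}
```

Two toy_iter values created from the same toy_array advance independently, since the position lives in the iterator rather than in the array.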
Now, if the hashtable is in any way altered during the traversal then the usual copy-on-write would
kick in because for-each initialization made sure that ref_count was incremented before starting
traversal. In that case PHP would - just like currently - have to duplicate, but only on first
actual alteration, not prematurely on for-each initialization.
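The deferred duplication can be sketched the same way. The following toy model in C assumes that every write goes through a helper that separates the array first when it is shared; cow_array, cow_array_addref and cow_array_set are hypothetical names, not engine functions, and the extra reference plays the role of the one taken on for-each initialization.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy refcounted array; hypothetical names, not Zend Engine symbols. */
typedef struct {
    int refcount;
    size_t len;
    int *data;
} cow_array;

/* Allocate a new array with refcount 1 holding a copy of vals. */
cow_array *cow_array_new(const int *vals, size_t len) {
    cow_array *a = malloc(sizeof *a);
    assert(a != NULL);
    a->refcount = 1;
    a->len = len;
    a->data = malloc((len ? len : 1) * sizeof *a->data);
    assert(a->data != NULL);
    if (len) memcpy(a->data, vals, len * sizeof *a->data);
    return a;
}

/* Take an extra reference, as the loop setup would on entry. */
cow_array *cow_array_addref(cow_array *a) {
    a->refcount++;
    return a;
}

/* Drop one reference; free the array when nobody holds it anymore. */
void cow_array_release(cow_array *a) {
    if (--a->refcount == 0) {
        free(a->data);
        free(a);
    }
}

/* Write element i through a variable's handle. If the array is shared
 * (refcount > 1), duplicate it first and repoint the handle at the copy,
 * so other holders (e.g. a running loop) keep seeing the old contents. */
void cow_array_set(cow_array **handle, size_t i, int v) {
    cow_array *a = *handle;
    assert(i < a->len);
    if (a->refcount > 1) {
        cow_array *copy = cow_array_new(a->data, a->len);
        a->refcount--; /* the variable lets go of the shared array */
        *handle = a = copy;
    }
    a->data[i] = v;
}
```

In this model the duplication cost is paid at the first cow_array_set call made while the loop holds its reference; a loop body that never writes never triggers it.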
So in 90% (just a guess) of the cases, when you just traverse without altering you get the full
benefit of no-copy-necessary, while in the other cases you will basically have the previous
performance penalty of duplication, but at least postponed to the first alteration (which might be
inside a branch that is not even taken).
Nested for-each loops over the same array would not have to revert to copy-on-write either, because
each loop has its own position pointer.
This would effectively speed up most for-each operations and would have the extra benefit of not
having to store an internal pointer in the hashtable structure.
Please let me know your thoughts!
Cheers,
Ben
--
Benjamin Coutu
Zeyon Technologies Inc.
http://www.zeyos.com