
str_starts_with slower than userland #18474


Open
jdarwood007 opened this issue May 1, 2025 · 17 comments

Comments

@jdarwood007

Description

Firstly, apologies if this is the wrong place.

I was reviewing some code going into a project, and it used str_starts_with. Being new to it myself, I wondered how its performance compared to other options, and I expected it to outperform any userland function. If it did, that would justify a future micro-optimization PR in the project.

I shamelessly borrowed some Stack Overflow code, added str_starts_with to the benchmarking script, and ran it:
https://gist.github.com/jdarwood007/1a949424f8cf85ca1b5f66ce38527eb6

PHP 8.4 results

generating tests........................................done!
strncmp_startswith2: 49.7 ms
strncmp_startswith: 88.3 ms
substr_compare_startswith: 89.7 ms
str_starts_with: 133.4 ms
substr_startswith: 156.3 ms
strpos_startswith: 201.2 ms
preg_match_startswith: 15,547.4 ms

PHP 8.3 had similar results.
PHP 8.2 had better results across the board on all tests except preg_match, with the results in the same order, but I only have a limited set of systems to test with.

Could this be looked into, to see whether performance improvements can be made? Seeing userland functions outperform the native function suggests to me that there is room for improvement.

PHP Version

PHP 8.4.5 (cli) (built: Mar 13 2025 15:36:20) (NTS)
Copyright (c) The PHP Group
Zend Engine v4.4.5, Copyright (c) Zend Technologies
with Zend OPcache v8.4.5, Copyright (c), by Zend Technologies
with Xdebug v3.4.2, Copyright (c) 2002-2025, by Derick Rethans

Operating System

Ubuntu 24.04, MacOS 12.7

@Girgias
Member

Girgias commented May 1, 2025

Possibly related to #18204

@alecpl

alecpl commented May 1, 2025

Interesting that on PHP 8.2 the results for str_starts_with aren't that bad: it's in 2nd position (not losing by much). But on PHP 8.4 (while all cases got much faster), str_starts_with fell to 4th position (losing by more). strncmp_startswith2 is the winner on both.

Fedora 42 x86_64 (64-bit).

@nielsdos
Member

nielsdos commented May 1, 2025

Alright, let's have a look...

Possibly related to #18204

I don't think so: the implementation relies only on the engine API and doesn't call into userland, so it should normally be easy for it to beat a PHP implementation.

@nielsdos
Member

nielsdos commented May 1, 2025

First of all, I can't fully reproduce the results you get.
I get the following:

strncmp_startswith2: 15.2 ms
str_starts_with: 17.5 ms
substr_compare_startswith: 20.1 ms
strncmp_startswith: 20.7 ms
substr_startswith: 28.6 ms
strpos_startswith: 97.2 ms
preg_match_startswith: 12,074.9 ms

If I then remove the PCRE one, I get this:

str_starts_with: 18.5 ms
strncmp_startswith2: 18.6 ms
strncmp_startswith: 19.5 ms
substr_compare_startswith: 21.4 ms
substr_startswith: 29.2 ms
strpos_startswith: 98.8 ms

Possibly caching/throttling related.
Anyway, the difference is marginal.

Second, the benchmarking code is likely flawed.
If you look at a profile, you'll see that zflf_strpos_2 takes 5.36% of the runtime, and the next one is zflf_substr_3 with 0.73%, etc. So you're barely measuring the execution time of these functions. What you're mostly measuring is the overhead of dynamic calls to the various functions and, to a lesser extent, constructs like argument unpacking.
When you write benchmarking code, make sure you only call these functions directly, with no fancy constructs, and that each call does enough work that the call overhead is smaller than the function's execution time.

Rewriting the code like this: https://gist.github.com/nielsdos/c5525659ecad22074b41b1acd2bccac1
Yields:

str_starts_with: 15.8 ms
strncmp_startswith: 17.6 ms
strncmp_startswith2: 18.0 ms
substr_compare_startswith: 18.3 ms
substr_startswith: 25.9 ms
strpos_startswith: 93.7 ms

This suggests that the overhead of the call is still drowning out the execution time of the function.
That said, by looking at the profile, str_starts_with takes the least amount of time and that's confirmed by the results here.

@lucasnetau
Contributor

The original report appears to have Xdebug enabled; with it enabled, I get results similar to the original report using the updated benchmark on macOS.

When testing without Xdebug loaded, I do see something interesting in the timings.

PHP 8.4.6 (cli)
Zend Engine v4.4.6
    with Zend OPcache v8.4.6,

On Linux (compiled with GCC) there isn't much of a difference between the calls, the same as in the previous comment.

strncmp_startswith: 24.3 ms
strncmp_startswith2: 24.6 ms
str_starts_with: 25.0 ms
substr_compare_startswith: 29.4 ms
substr_startswith: 33.9 ms
strpos_startswith: 136.2 ms

On macOS (compiled with clang), str_starts_with drops down the ranking and is always slower.

strncmp_startswith2: 21.8 ms
strncmp_startswith: 24.0 ms
substr_compare_startswith: 29.4 ms
substr_startswith: 32.7 ms
str_starts_with: 43.1 ms
strpos_startswith: 111.9 ms

I did notice that strncmp is defined with ZEND_FASTCALL, whereas str_starts_with uses zend_always_inline; I don't know whether clang behaves differently from GCC for those two. Of course, this is only one run of each function, but the timing spread was the same across multiple runs of the benchmark script.

@nielsdos
Member

nielsdos commented May 1, 2025

str_starts_with internally relies on the memcmp standard C library call, whereas our strncmp code relies on zend_memnstr. The implementations are different, but with e.g. glibc I'd expect memcmp to outperform zend_memnstr.

@lucasnetau
Contributor

Oh, I misread the code then. I thought strncmp was defined as ZEND_FUNCTION(strncmp), which calls zend_binary_strncmp, which relies on memcmp too. I only noticed that this function is declared as ZEND_FASTCALL zend_binary_strncmp, whereas the other is zend_always_inline bool zend_string_starts_with_cstr.

@nielsdos
Member

nielsdos commented May 1, 2025

No, actually I got it mixed up ;) which makes the difference even weirder.
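To make the point concrete, here is a minimal C sketch of why both paths should cost about the same: each one boils down to a length check plus a memcmp over the needle's length. The function names below are hypothetical and only mirror the shape of the helpers named above (zend_string_starts_with_cstr and zend_binary_strncmp); this is an illustration, not the php-src code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical str_starts_with-style check: reject when the needle is longer
 * than the haystack, otherwise compare the prefix bytes with memcmp. */
static bool starts_with_memcmp(const char *haystack, size_t haystack_len,
                               const char *needle, size_t needle_len)
{
    return haystack_len >= needle_len
        && memcmp(haystack, needle, needle_len) == 0;
}

/* Hypothetical strncmp-style binary comparison limited to n bytes: memcmp over
 * the common (clamped) length, then break ties on the clamped lengths.
 * Returns <0, 0 or >0 like strncmp. */
static int binary_strncmp_sketch(const char *s1, size_t len1,
                                 const char *s2, size_t len2, size_t n)
{
    size_t a = len1 < n ? len1 : n;
    size_t b = len2 < n ? len2 : n;
    int ret = memcmp(s1, s2, a < b ? a : b);
    if (ret != 0) {
        return ret;
    }
    return (a > b) - (a < b);
}

/* Userland-style prefix test: strncmp($haystack, $needle, strlen($needle)) === 0 */
static bool starts_with_via_strncmp(const char *haystack, size_t haystack_len,
                                    const char *needle, size_t needle_len)
{
    return binary_strncmp_sketch(haystack, haystack_len,
                                 needle, needle_len, needle_len) == 0;
}
```

Both variants do one memcmp of at most needle_len bytes, so any consistent gap between them is more likely to come from call overhead, inlining decisions or the compiler than from the comparison itself.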

@lucasnetau
Contributor

If I expand the number of test cases from 100,000 to 1,000,000, the differences almost disappear; changing the size of the test cases also makes many of the differences disappear. However, strncmp_startswith2 is consistently faster thanks to its first-byte fast fail.

strncmp_startswith2: 232.9 ms
str_starts_with: 245.2 ms
substr_startswith: 249.0 ms
substr_compare_startswith: 266.1 ms
strncmp_startswith: 280.2 ms
strpos_startswith: 698.5 ms

@nielsdos
Member

nielsdos commented May 1, 2025

If I expand the number of test cases from 100,000 to 1,000,000 the differences almost disappear

That probably makes sense, as you're measuring less of the engine overhead and more of the actual comparison.

strncmp_startswith2 is consistently faster with its first byte fast fail however.

Sure, but that check is only worth it if you know the first byte is likely to differ (which is true for this benchmark, and is a flaw of the data used), whereas str_starts_with has to cater to the general use case :)
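As an illustration of that trade-off, here is a hedged C sketch of a first-byte fast fail next to a plain memcmp prefix check. Both function names are hypothetical; the fast-fail variant only mirrors the idea behind the benchmark's strncmp_startswith2, and unlike a `$haystack[0] === $needle[0]` test it also guards the empty-needle case.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* General-purpose prefix check: length guard plus one memcmp. */
static bool starts_with(const char *h, size_t h_len, const char *n, size_t n_len)
{
    return h_len >= n_len && memcmp(h, n, n_len) == 0;
}

/* Hypothetical fast-fail variant: test the first byte inline and only fall
 * through to memcmp when it matches. This pays off when first bytes usually
 * differ (e.g. random data), but it is just an extra branch when they usually
 * match (e.g. many paths sharing the same leading directory). */
static bool starts_with_first_byte(const char *h, size_t h_len, const char *n, size_t n_len)
{
    if (n_len == 0) {
        return true;            /* every string starts with the empty prefix */
    }
    if (h_len < n_len || h[0] != n[0]) {
        return false;           /* fast fail: too short, or first byte differs */
    }
    return memcmp(h + 1, n + 1, n_len - 1) == 0;
}
```

Both functions return the same answers; only the cost profile differs, and which one wins depends on how often the first bytes match in the real data.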

Anyway, what these experiments seem to point out is mostly that the benchmark is flawed.

@jdarwood007
Author

@nielsdos
I do have Xdebug on the system I posted the results from, but I also tested on my MacBook, which doesn't have it, and those results were similar. I removed the Xdebug extension call, confirmed with php -i/php -m, and ran the updated benchmark. I am using packages, not compiling.

Ubuntu 24.04 (packages via //ppa.launchpadcontent.net/ondrej/php/ubuntu/) on an "AMD EPYC 7642 48-Core Processor"

generating tests........................................done!
strncmp_startswith2: 34.8 ms
strncmp_startswith: 36.2 ms
substr_compare_startswith: 51.5 ms
substr_startswith: 52.1 ms
str_starts_with: 65.1 ms
strpos_startswith: 140.8 ms

macOS 12.7 (package via Homebrew) on an Intel Mac.

generating tests........................................done!
strncmp_startswith2: 38.2 ms
strncmp_startswith: 38.8 ms
substr_startswith: 49.7 ms
substr_compare_startswith: 54.8 ms
str_starts_with: 96.7 ms
strpos_startswith: 132.1 ms

@iluuu1994
Member

iluuu1994 commented May 1, 2025

I get similar results as Niels. With the original script:

$ taskset -c 0 nice -15 $(whence php-dev) -d memory_limit=-1 test.php
generating tests........................................done!
strncmp_startswith2: 20.8 ms
str_starts_with: 24.1 ms
substr_compare_startswith: 25.4 ms
strncmp_startswith: 25.6 ms
substr_startswith: 37.5 ms
strpos_startswith: 127.0 ms
preg_match_startswith: 17,207.1 ms

With Niels' adjustments:

$ taskset -c 0 nice -15 $(whence php-dev) -d memory_limit=-1 test.php
generating tests........................................done!
str_starts_with: 15.6 ms
strncmp_startswith2: 18.0 ms
strncmp_startswith: 20.4 ms
substr_compare_startswith: 24.8 ms
substr_startswith: 32.4 ms
strpos_startswith: 120.6 ms

The fact that the first script takes so much longer to run is suspicious (I see this is due to preg_match_startswith). It's also worth noting that strncmp_startswith2() almost never calls strncmp(), because $haystack[0] === $needle[0] is almost never true (only a 1/256 chance). Importantly, there is no universally best implementation; which one is faster depends on the data being compared.
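To show where the 1/256 figure comes from, here is a small C sketch that counts how often two pseudo-random first bytes match, i.e. how often a first-byte fast fail would *not* short-circuit. It assumes uniformly random bytes, which is what the 1/256 figure implies; splitmix64 is just a well-known PRNG used here for reproducibility, and count_first_byte_matches is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* splitmix64: a small, well-known PRNG, used only to get reproducible
 * pseudo-random bytes without depending on rand(). */
static uint64_t splitmix64(uint64_t *state)
{
    uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Counts how many of `trials` random (haystack, needle) pairs share the same
 * first byte, i.e. how often the full comparison would still have to run. */
static size_t count_first_byte_matches(size_t trials, uint64_t seed)
{
    uint64_t state = seed;
    size_t matches = 0;
    for (size_t i = 0; i < trials; i++) {
        uint64_t r = splitmix64(&state);
        unsigned char haystack0 = (unsigned char)(r >> 56);
        unsigned char needle0 = (unsigned char)(r >> 48);
        if (haystack0 == needle0) {
            matches++;
        }
    }
    return matches;
}
```

With 1,000,000 trials the count should land close to 1,000,000 / 256, roughly 3,900, so on random data the fast-fail branch skips the comparison about 99.6% of the time. On data where prefixes usually share their first byte, the same branch almost never short-circuits.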

@iluuu1994
Member

@jdarwood007 Which of the benchmarks are you using here? Can you try the one Niels posted? I don't think this issue is actionable. Unfortunately, both compilers and CPUs are unpredictable, and synthetic benchmarks are very good at amplifying this unpredictability. Making changes that help for all machines, all code bases and all input is likely impossible.

@jdarwood007
Author

@iluuu1994 I tried the one Niels posted and still see the slower results.
I am using a package-maintainer build rather than compiling it myself, which could cost some performance.
Could this be caused by some other library being outdated? I don't know what, if anything, the underlying C code depends on that could explain this.

@iluuu1994
Member

iluuu1994 commented May 2, 2025

I am using a package maintainer version and am not compiling it, which could result in some loss.

I tried both the packaged 8.4.6 version (from Nix) and a compiled master build, with similar results.

As mentioned, both strncmp and str_starts_with are implemented with memcmp. Performance can be weird and unpredictable, and change drastically between environments and data sets. I really don't know how we can help here, especially when our results are so drastically different.

@nielsdos
Member

nielsdos commented May 2, 2025

Also: try without xdebug 🙂 (i.e. extension unloaded)

@iluuu1994
Member

I think they did:

I removed Xdebug extension call, confirmed with php -i/php -m and ran the updated benchmark.


6 participants