Skip to content

Feature Request #28790 Add php.ini option to disable stat cache #5894

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 19 commits into from
Closed

Feature Request #28790 Add php.ini option to disable stat cache #5894

wants to merge 19 commits into from

Conversation

lyda
Copy link

@lyda lyda commented Jul 25, 2020

Provide an option to disable the stat cache.

This helps to remove a class of subtle, frustrating bugs that occur for certain types of applications. It would be better to remove stat caching entirely as the OS already does this in a more reliable way.

The default behaviour is the same as it currently is. You need to add disable_stat_cache = True to the ini file to disable the stat cache. By default this setting is False.

Provide an option to disable the stat cache.
@ghost
Copy link

ghost commented Jul 25, 2020

passed

lyda added 7 commits July 26, 2020 15:36
This runs the same php code without `disable_stat_cache` set and with
it set.
Still need to do the ones for the doc-en repo but not quite sure how
those work.  Suggestions welcome!
Windows path use backslashes as directory separators for some reason.
Hopefully this quoting will help.
Just recreate it in a way the shell won't see it.
@nikic
Copy link
Member

nikic commented Jul 27, 2020

I don't really have a good handle on how useful the stat cache is nowadays and what kind of workloads it is intended to help.

lyda added 3 commits July 27, 2020 10:29
This should reduce the level of escapes required.  Clarifies what is
actually being tested.
There are no boolean "disable_.*" options.  However there are a few
"enable.*" options that default to "On."
@lyda
Copy link
Author

lyda commented Aug 16, 2020

Any further thoughts on this?

Oh, and some notes to myself as I close out: how to run the test suite and original bug link.

@m50
Copy link

m50 commented Aug 18, 2020

I don't really have a good handle on how useful the stat cache is nowadays and what kind of workloads it is intended to help.

@nikic

The Stat Cache was, as far as I am aware, implemented to help with having to constantly query the filesystem for stats on files (such as perms, file size, etc), thus making it more efficient to query the same file multiple times in a single request. For example, if you check if it exists, then open it, then get it's size, then read it, the getting of it's size is already done when you checked if it exists (as far as I know). The problem is, if a file is modified (particularly by an outside program, or an exec call) between checking it's stats and opening it, this can cause the stats to be wrong, and cause errors.

As @lyda mentioned in the original bug link, modern OSes will actually cache this information and keep it up to date a lot better than PHP does, and so the performance impact is minimal to have it off, and (at least in our case, where we have to clear the stat cache regularly), it's actually an improvement over using PHP's build in caching.

Ideally having stat caching built into PHP would be removed, but being able to disable it with an INI flag would work as well.

I will mention, I am not the one working on the issues the stat cache caused in our company, just relaying what I've heard from my team.

@cmb69
Copy link
Member

cmb69 commented Aug 18, 2020

Maybe this topic should be discussed on the PHP internals mailing list?

@Girgias Girgias changed the title Fix bug #28790 Feature Request #28790 Add php.ini option to disable stat cache Nov 3, 2020
@lyda
Copy link
Author

lyda commented Sep 2, 2021

If this ever gets to a point of being accepted, this search should be a starting point for some other bugs to close due to this issue.

I've emailed the suggested list to see what needs to be done to get this PR accepted. I'm aware the merge is broken. A note to myself on how to fix the merge:

Run git grep 'sizeof('php_stream_statbuf in both master and disable_stat_cache - code has likely moved to ext/standard/filestat.c from main/streams/streams.c.

@lyda
Copy link
Author

lyda commented Sep 2, 2021

Oh, and the discussion is in this thread.

.phpt files shouldn't be indented.  Fixed bug28790.cache.phpt which
tests the functionality for this set of commits.
lyda and others added 2 commits September 3, 2021 14:15
@lyda
Copy link
Author

lyda commented Oct 20, 2021

This is still tagged with "waiting on author" and I'm unclear on what else I can do to move this over the line.

@cmb69
Copy link
Member

cmb69 commented Oct 20, 2021

Apparently, the discussion gathered no consensus. Not sure how to proceed; an RFC would certainly the best option, but would delay the feature (unless we consider it too late for PHP 8.1 anyway, what we presumably do).

@lyda
Copy link
Author

lyda commented Jan 18, 2022

Note to myself: this is the RFC process.

@iluuu1994
Copy link
Member

@lyda Any progress? 🙂

@lyda lyda requested review from iluuu1994 and bukka as code owners October 7, 2023 13:51
@billynoah
Copy link

For the love of all things, please approve this PR.

@lyda
Copy link
Author

lyda commented Dec 16, 2024

Sorry, someone else will have to pick this up. All of my development work is in Go and C these days. Little bit of python.

I find the response to this PR by the maintainers to be less than stellar. I jumped though the various hoops for over a year and there has been no progress. The current behaviour in PHP is unexpected, error prone and a broken implementation that the operating system already provides. It should removed entirely.

I'll leave this open, but someone else is going to have to carry it across the line. Best of luck.

BTW, really enjoying Go. We've moved from a PHP based identity service (800mb container, 500mb memory, needed 100+ instances to scale) to a Go based one (50mb container, 200mb memory, needed 12-18 instances to scale).

@cmb69
Copy link
Member

cmb69 commented Dec 16, 2024

[…] and a broken implementation that the operating system already provides.

Interesting! 3 years ago you proved the opposite.

@lyda
Copy link
Author

lyda commented Dec 16, 2024

Wow. You know, with that response it's clear that it is pointless keeping this PR open. No matter who works on it, it will not be accepted. Well done, you win.

@lyda lyda closed this Dec 16, 2024
@lyda lyda deleted the disable_stat_cache branch December 16, 2024 16:02
@bukka
Copy link
Member

bukka commented Dec 16, 2024

This was just waiting for RFC as there was no clear consensus. If the RFC was done, it would get accepted and merged ofc. So technically that feature was basically blocked by the author of this PR because it was kept open and no RFC done. :)

@bukka
Copy link
Member

bukka commented Dec 16, 2024

I'm joking, ofc, as anyone could have picked it up if there was enough interest and no feedback for so long. It just got forgotten...

@bukka
Copy link
Member

bukka commented Dec 16, 2024

What I wanted to say is that the RFC is the main thing for acceptance in this case.

@lyda
Copy link
Author

lyda commented Dec 16, 2024

What I wanted to say is that the RFC is the main thing for acceptance in this case.

Then make the RFC.

I'm pretty sure I know what's going to happen with it. The same thing that happened to this PR and the same thing that happened to the discussion. Spending time writing an RFC that will go nowhere just isn't high on my list of priorities.

This is implemented in like what, 10 lines of code? The rest is just whitespace changes, tests and comments in the config file. It's off by default so the current behaviour is unchanged. Requesting an RFC for that sort of change is essentially a polite way of saying "no, we don't want this."

Which is fine. But don't then say it wasn't merged because of me; that's just passive-aggressive gaslighting.

If you think I'm incorrect, you are free to prove it. As I said it's super short, you don't even have to deal with rebasing it. Just copy the changes from the diff. Make the RFC. See how you get on. Best of luck!

@billynoah
Copy link

Sorry this went sideways @lyda - I copied your work as you suggested above. #17178

@lyda
Copy link
Author

lyda commented Dec 16, 2024

Sorry this went sideways @lyda - I copied your work as you suggested above. #17178

Best of luck. The RFC gets discussed on the same mailing list that failed to get a consensus in discussion so... should be entertaining! But hey, if you get it merged and are ever in Galway, happy to buy you a pint.

And if you do get it in, this link yields a number of results and a chunk of them would find this option to be a solution. Not sure how you update/close the bugs, but on the off chance any of those reporters haven't retired yet, you'll make them happy.

@bukka
Copy link
Member

bukka commented Dec 16, 2024

So for the record I support this feature so if there was a passing RFC, I would merge it myself. There are few open related issues which I would like to get resolved:

I really don't see why it couldn't be disabled and also questioned its usability on Linux nowadays.

However if there is no clear consensus, I can't just merge this. If there's no consensus, then the only option is RFC. I'm busy to write the RFC myself so I would prefer if someone else does it but I would be happy to handle the merging once / if it gets accepted.

Also the fact that some people disagree does not mean that the feature gets rejected. There are often many people that agree but are just quiet.

@billynoah
Copy link

@bukka - I understand you are busy. It's just that it's something I have no experience with so would take research, time, and nice words. Maybe for someone who's been through the process before it will be quicker and easier to get done. If nothing materializes over the days to come, I will start with "The Mysterious PHP RFC Process and How You Can Change the Web" and see where it leads...

@lyda
Copy link
Author

lyda commented Dec 16, 2024

There was a discussion about this PR on the internals mailing list. It came to no consensus.

The discussion about the RFC about this PR would happen on the internals mailing list. It will come to __ consensus.

If I was 5 years and 3 months old, I might fall for that. Unfortunately I'm 53 so I don't fall for the ol' "more process yields better results" shtick. Pretty much just falling for ankles, knees and back reasons these days. Now those are tricksy.

But @billynoah seems young and hopeful, so maybe he'll have better luck. Ethically I should say it might be better if he just goes and learns Haskell or Go. Or whatever the kids today use - Zig? Rust? Pygyat?

@bukka
Copy link
Member

bukka commented Dec 16, 2024

@lyda If it's RFC it doesn't need to have consensus. You just need to announce it, then let the discussion last at least two weeks and then put it to vote no matter if there's consensus or not. The vote needs to last two weeks and it will decide about the acceptance.

@billynoah Yeah I can help with that. I don't mind to handle it if you write the text for it. I have got this repo https://github.com/bukka/php-rfc for my RFC's so if you create a PR with the text for this one, we can iron it out and then I can propose it. It mainly need some proposal, description and maybe some other section text. There are bunch of RFC's in https://wiki.php.net/rfc that you can use as an example.

@billynoah
Copy link

@lyda - thanks! That's really helpful. I will fit this in sometime soon and get back to you.

@lyda
Copy link
Author

lyda commented Dec 16, 2024

@billynoah: This is a good one to use as a template since it's proposing a similar thing - an optional ini setting that doesn't change the default behaviour but, if set, will cause php to behave differently.

There's a code example showing the issue - that bug list might be helpful with that. Several examples there. You can show some workarounds. Again, the bugs might have examples there. Make sure to note in the first section that stat-ing the same file two million times in a row might be slower without the stat cache.

Added bonus, that RFC is in the discussion phase and it was submitted in December so you can see how the voting process works.

@cmb69
Copy link
Member

cmb69 commented Dec 16, 2024

I had a closer look at https://bugs.php.net/72666, and it seems to me that no change relevant to file stats (regardless of whether been done by a PHP internal call or external change) ever updates/resets the state cache, as long as the identical file name is used.

Since there is no path normalization, it's trivial to work around that, though, e.g.

file_put_contents(__DIR__ . '/foo', "hello");
var_dump(filesize(__DIR__ . '/foo'));
file_put_contents(__DIR__ . '/foo', "hello world");
var_dump(filesize(__DIR__ . '/./foo'));

outputs

    int(5)
    int(11)

Given that file_exists(), is_readable(), is_writeable() and is_executable() don't use the stat cache at all, the only benefit it brings is to avoid multiple stat() calls in case code triggers multiple calls to other functions which call stat() under the hood (e.g. filemtime(), filesize()), instead of calling stat() directly. That makes the stat cache almost useless. Instead of trying to fix it, or to introduce an INI option to disable it, we should consider to deprecate it.

incomplete patch
 ext/standard/filestat.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/ext/standard/filestat.c b/ext/standard/filestat.c
index 0825fa0586..60d78c696e 100644
--- a/ext/standard/filestat.c
+++ b/ext/standard/filestat.c
@@ -803,6 +803,7 @@ PHPAPI void php_stat(zend_string *filename, int type, zval *return_value)
 			if (filename == BG(CurrentStatFile)
 			 || (BG(CurrentStatFile)
 			  && zend_string_equal_content(filename, BG(CurrentStatFile)))) {
+				php_error_docref(NULL, E_DEPRECATED, "The stat cache is deprecated; use stat() instead of multiple calls to stat functions");
 				stat_sb = &BG(ssb).sb;
 				break;
 			}

@billynoah
Copy link

@cmb69 - I agree but I believe that suggestion has been shot down in the past which is why add an INI option to begin with. If we could get rid of stat cache without breaking anything and with no detrimental affect on existing app performance, I would be all for it.

I also believe that this behaviour might have changed at some point since the documentation on file_exists() for example clearly states:

Note: The results of this function are cached. See clearstatcache() for more details.

And the same function is listed under the docs for clearstatcache(). Maybe it would also be good to update the php.net docs to not say wrong things.

In any event, my use case requires that both filesize() and filemtime() return the correct value, which it constantly does not. This causes me to continually call clearstatcache() in my code so that php can clear and then rebuild a cache which is useless to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.