Skip to content

php8ts crashes in php_clear_stat_cache() #18145

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jjdunn opened this issue Mar 25, 2025 · 29 comments
Closed

php8ts crashes in php_clear_stat_cache() #18145

jjdunn opened this issue Mar 25, 2025 · 29 comments

Comments

@jjdunn
Copy link

jjdunn commented Mar 25, 2025

Description

debug-analysis-1.txt

very sorry: I don't know how to reproduce this bug. but I can see the crash dump files on the web server.

using WinDbg I opened and analyzed one of the files. the analysis including stack-dump is in the attached txt file.

this is PHP 8.3.19 thread-safe running in Apache 2.4.63 on Windows Server 2019

PHP Version

PHP 8.3.19

Operating System

Windows Server 2019

@devnexen
Copy link
Member

cc @bukka, might have some idea whether it is related to last change in this area.

@bukka
Copy link
Member

bukka commented Mar 26, 2025

I have been looking into this and it seems like the new cache clearance is causing this. But after going couple of time through the code paths, I can't understand why this is crashing. The stack text is not that helpful as it's not proper stack trace so I can't really see the full path from it. But it looks like it's happening during sapi activate but the strange thing that all the new stream clearances happen only if EG(active) is true. But from what I see if EG(active) is true, then BG should always be initialized. It seems to me more like this is uncovering some other issue or I'm missing something. Maybe @nielsdos might have an idea as he was fixing things in TSRM in past.

The solution would be revert that stream part but I would like to first try to figure out what's going on.

@jjdunn Can you see some other issues in the logs (e.g. some memory limit or other fatal error around the same time)? Could you you also share your httpd.conf and tell us how often do you see those crashes compare to the other traffic?

Btw. there are some known issue with mod_event (assuming that's what you use) so it might be also related to this but not sure.

@cmb69
Copy link
Member

cmb69 commented Mar 26, 2025

@jjdunn, please also install the suitable debug pack (likely https://windows.php.net/downloads/releases/php-debug-pack-8.3.19-Win32-vs16-x64.zip), so we get a better stack backtrace.

@jjdunn
Copy link
Author

jjdunn commented Mar 26, 2025

Hi @bukka and @cmb69 - Thank you very much for investigating this problem !

there are some known issue with mod_event (assuming that's what you use)

we are not using mod_event

Could you you also share your httpd.conf

attached please find sanitized versions of httpd.conf and httpd-vhosts.conf, with only the host which is crashing included (suffix .txt added so this ticket system will allow upload)

httpd.conf.txt
httpd-vhosts.conf.txt

Can you see some other issues in the logs (e.g. some memory limit or other fatal error around the same time)?

there is nothing unusual in the apache Access logs
the apache Error log contains three lines for each crash:

[Wed Mar 26 07:27:00.943981 2025] [mpm_winnt:notice] [pid 2764:tid 144] AH00428: Parent: child process 3860 exited with status 3221225477 -- Restarting.
[Wed Mar 26 07:27:01.186187 2025] [mpm_winnt:notice] [pid 2764:tid 144] AH00455: Apache/2.4.63 (Win64) OpenSSL/3.3.2 mod_log_rotate/1.02 configured -- resuming normal operations
[Wed Mar 26 07:27:01.186187 2025] [mpm_winnt:notice] [pid 2764:tid 144] AH00456: Apache Lounge VS17 Server built: Feb 3 2025 10:10:51

there is nothing relevant in the logs generated by the PHP application

we have a page-based PHP / HTML / CSS app built using the Yii 1.1 framework; but there's an embedded single-page javascript app (no framework). The js app does server callbacks to get/save data. We just installed a new version of the js app (yesterday on the Test server, today on the Prod server). When the js app opens it fetches the data, and immediately tries to save. One of these operations is failing. It's unclear if there's a bug in the js app - it works fine on the lead developer's local environment; and works fine in my development environment; and now (weirdly) works fine in Test. (I reverted Prod to the prior version). We don't yet know if there's an application-code error; but in any case I would not expect PHP to crash even if there were.

@cmb69 wrote:

please also install the suitable debug pack.....

I can do this in my local dev environment, and on the Test server. but since we cannot reliably reproduce the problem, we may have to wait a while to see what happens...

I got the mini-dumps for the crashes on prod; but since you've explained there is insufficient information I won't send the WinDbg analysis unless you indicate this would be helpful. BTW I am not a systems-developer, just an application-land developer; so my ability to use WinDbg and similar tools is a bit limited. I can follow any instructions if you send them.

@jjdunn
Copy link
Author

jjdunn commented Mar 26, 2025

I downloaded the debug pack locally, unpacked, and added the unpacked-path to the Symbol Path in WinDbg. Is this right?

attached is the result of "analysis -v" in WinDbg, using the debug pack

does that provide additional information? If not please guide me how to use the debug pack - I'm sorry but I've never done it before...

2025-03-26-debug1.txt

@cmb69
Copy link
Member

cmb69 commented Mar 26, 2025

I downloaded the debug pack locally, unpacked, and added the unpacked-path to the Symbol Path in WinDbg. Is this right?

Yes, that should be correct. And at least it's better now:

CONTEXT:  (.ecxr)
rax=0000022a29028030 rbx=0000022a12c94d30 rcx=ffffffff0000000d
rdx=0000022a2f1e2da0 rsi=0000000000000000 rdi=0000000000000000
rip=00007ffa60172e8a rsp=000000c4738bab40 rbp=0000000000000000
 r8=0000022a33cd1590  r9=0000000000000009 r10=0000000000000000
r11=000000c4738baaf0 r12=0000000000000020 r13=0000000020800278
r14=000000c4738bad50 r15=0000000000004000
iopl=0         nv up ei ng nz na pe nc
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010282
php8ts!zend_string_release [inlined in php8ts!php_clear_stat_cache+0x5a]:
00007ffa`60172e8a 8b4104          mov     eax,dword ptr [rcx+4] ds:ffffffff`00000011=????????
Resetting default scope

So it crashes during a zend_string_relase() call.

However, the stack backtrace still doesn't contain line numbers. Not sure why (I rarely use release builds with debug symbols when debugging).

Let's see; maybe Jakub or Niels have an idea what might be wrong there.

@jjdunn
Copy link
Author

jjdunn commented Mar 26, 2025

looking at the logs on our Test server from yesterday, I see this in the Opcache log:

Tue Mar 25 09:43:14 2025 (8984): Warning Internal error: wrong size calculation: C:\web\test\app\models\Enrollments.php start=0x000001fb8ba3d040, end=0x000001fb8bb044f0, real=0x000001fb8bae9250

Tue Mar 25 09:43:20 2025 (1656): Warning Internal error: wrong size calculation: C:\web\test\app\controllers\SeatingChartController.php start=0x000001fb8d91fd80, end=0x000001fb8d9331b0, real=0x000001fb8d930660

Tue Mar 25 09:43:25 2025 (9300): Warning Internal error: wrong size calculation: C:\web\test\app\views\seatingChart\seatingChart.php start=0x000001fb8e6efa40, end=0x000001fb8e70fc28, real=0x000001fb8e709cb8

and here's one of the application crashes, just seconds later:

[Tue Mar 25 09:43:56.587748 2025] [mpm_winnt:notice] [pid 9384:tid 548] AH00428: Parent: child process 2984 exited with status 3221225477 -- Restarting.

the three files noted (models/Enrollments, SeatingChartController, /views/seatingChart/seatingChart) - are the files used by the single-page js app. the seatingChart php file launches the js code, which calls back to SeatingChartControllerto fetch/save the data.

unclear if this is relevant but sharing so you have all possible info

@cmb69
Copy link
Member

cmb69 commented Mar 26, 2025

Tue Mar 25 09:43:20 2025 (1656): Warning Internal error: wrong size calculation:

That looks like an OPcache corruption.

@bukka
Copy link
Member

bukka commented Mar 26, 2025

Thanks for the new stack trace, it makes more sense now and that path is now quite clear. I was just checking again this flow and specifically this path and it still doesn't make any sense to me. It seems that it's crashing during release of either BG(CurrentStatFile) or BG(CurrentLStatFile) but from all usage of those two globals, they always get set to NULL after freeing and they should be also NULL after initialization which happens during standard ext MINIT (in php_win32_core_globals_ctor way before sapi_activate is called) and also durint RINIT. The only other setting to those happens during php_stat where the filename is copied to the to them. It cannot also come from the previous request as it always gets freed during MSHUTDOWN.

The only other thing that comes to my mind is that there is some other heap corruption going on.

@bukka
Copy link
Member

bukka commented Mar 26, 2025

Just to add a bit more info.

It happens during following calls

php_request_startup -> sapi_activate -> sapi_read_post_data -> sapi_read_standard_form_data -> _php_stream_write -> php_stdiop_write (plain wrapper) -> php_clear_stat_cache

The sapi_activate is called after zend_activate (hence EG(active) is true) but before zend_module_activate (RINIT) so it relies on data being NULL after MINIT. From what I see, the TSRM tables get allocated during ts_allocate_id which is called in PHP_MINIT_FUNCTION(basic) but the actual ctor that sets everything to NULL is in ts_resource called from php_module_startup that is called by php_apache2_startup which is a SAPI startup function. So it should not happen after the first call. And for subsequent calls, it is freed and set to NULL during RSHUTDOWN. So I'm not sure how this can happen.

@nielsdos
Copy link
Member

Didn't deep dive into the logs yet, but since you mentioned PHP_MINIT_FUNCTION(basic)... Why does basic_globals_ctor use BG instead of the basic_globals_p variable? Then it can in some cases initialize the data from the wrong thread.

@nielsdos
Copy link
Member

but the actual ctor that sets everything to NULL is in ts_resource called from php_module_startup that is called by php_apache2_startup which is a SAPI startup function.

Where do you see this? I didn't quite understand this sentence. I only see the code from RINIT that sets it to NULL; but I don't see a GINIT that sets it to NULL.

@nielsdos
Copy link
Member

If you add a fprintf(stderr,"HIER %p\n", BG(CurrentStatFile)); in sapi_activate you get this:

=================================================================
==1952==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x122fae7a0148 at pc 0x7ffde70abf22 bp 0x008662bfed10 sp 0x008662bfed10
READ of size 8 at 0x122fae7a0148 thread T0
    #0 0x7ffde70abf21 in sapi_activate C:\php-sdk\phpdev\vs17\x64\php-src\main\SAPI.c:436
    #1 0x7ffde70ce34e in php_module_startup C:\php-sdk\phpdev\vs17\x64\php-src\main\main.c:2100
    #2 0x7ff69d6c3935 in php_cli_startup C:\php-sdk\phpdev\vs17\x64\php-src\sapi\cli\php_cli.c:410
    #3 0x7ff69d6c86af in main C:\php-sdk\phpdev\vs17\x64\php-src\sapi\cli\php_cli.c:1308
    #4 0x7ff69d6f0798 in invoke_main D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:78
    #5 0x7ff69d6f06e1 in __scrt_common_main_seh D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288
    #6 0x7ff69d6f059d in __scrt_common_main D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:330
    #7 0x7ff69d6f080d in mainCRTStartup D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_main.cpp:16
    #8 0x7ffe43837373  (C:\Windows\System32\KERNEL32.DLL+0x180017373)
    #9 0x7ffe441fcc90  (C:\Windows\SYSTEM32\ntdll.dll+0x18004cc90)

0x122fae7a0148 is located 8 bytes to the left of 16-byte region [0x122fae7a0150,0x122fae7a0160)
allocated by thread T0 here:
    #0 0x7ffdeb47788a in _realloc_dbg D:\a\_work\1\s\src\vctools\asan\llvm\compiler-rt\lib\asan\asan_malloc_win.cpp:318
    #1 0x7ffde71b270e in tsrm_update_active_threads C:\php-sdk\phpdev\vs17\x64\php-src\TSRM\TSRM.c:257
    #2 0x7ffde71b0051 in ts_allocate_id C:\php-sdk\phpdev\vs17\x64\php-src\TSRM\TSRM.c:306
    #3 0x7ffde71b4892 in tsrm_win32_startup C:\php-sdk\phpdev\vs17\x64\php-src\TSRM\tsrm_win32.c:92
    #4 0x7ffde70abd27 in sapi_startup C:\php-sdk\phpdev\vs17\x64\php-src\main\SAPI.c:85
    #5 0x7ff69d6c857a in main C:\php-sdk\phpdev\vs17\x64\php-src\sapi\cli\php_cli.c:1294
    #6 0x7ff69d6f0798 in invoke_main D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:78
    #7 0x7ff69d6f06e1 in __scrt_common_main_seh D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288
    #8 0x7ff69d6f059d in __scrt_common_main D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:330
    #9 0x7ff69d6f080d in mainCRTStartup D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_main.cpp:16
    #10 0x7ffe43837373  (C:\Windows\System32\KERNEL32.DLL+0x180017373)
    #11 0x7ffe441fcc90  (C:\Windows\SYSTEM32\ntdll.dll+0x18004cc90)

SO yeah, a path called from sapi_activate will access invalid memory most likely then.

@bukka
Copy link
Member

bukka commented Mar 26, 2025

Didn't deep dive into the logs yet, but since you mentioned PHP_MINIT_FUNCTION(basic)... Why does basic_globals_ctor use BG instead of the basic_globals_p variable?

Yeah that doesn't look right and it actually omits cleaning the stat cache globals as well.

Where do you see this? I didn't quite understand this sentence. I only see the code from RINIT that sets it to NULL; but I don't see a GINIT that sets it to NULL.

Ah I misread and thought that php_win32_core_globals_ctor is used but that's for php_win32_core_globals_id ...

Yeah so I think we just need to properly ctor the basic global. Will create a PR for that.

@nielsdos
Copy link
Member

nielsdos commented Mar 26, 2025

About my ASAN dump: I think the realloc code does not correctly update the fast offsets, I'll look into it.
Cause found: see below

@nielsdos
Copy link
Member

As usual, this issue is likely a combination of a few things

@nielsdos
Copy link
Member

Ah, basic_globals_id is 0 in sapi_activate, which is an invalid TSRM resource id. So that's why it creates an out of bounds access when trying to access fields from BG.

@nielsdos
Copy link
Member

nielsdos commented Mar 26, 2025

@bukka The root cause of this issue is simply that the BG globals are not initialized at the time sapi_activate is called, they are only initialized when the modules start up (i.e. the entirety of sapi_activate executes before zend_activate_modules). This causes the basic_globals_id to be 0 in ZTS and therefore cause an access to a garbage offset in memory in the TSRM storage. This then just reads some random pointer and causes memory corruption.

@bukka
Copy link
Member

bukka commented Mar 26, 2025

Hmm I'm not sure actually. The zend_activate_modules just calls request_startup_func (RINIT) but they are initialized in MINIT which is called in zend_startup_module_ex which is called from php_module_startup. What you see is actually the sapi_activate call happening in php_module_startup but at this stage EG(active) should be false so the clearing should not be called. You can try by adding that condition if (EG(active)) { printf ... }.

However it will be called later as well during the request startup. But as mentioned before, the basic_globals_ctor does not initialize stat globals which is what I think is causing the crash.

bukka added a commit to bukka/php-src that referenced this issue Mar 26, 2025
@bukka bukka linked a pull request Mar 26, 2025 that will close this issue
@nielsdos
Copy link
Member

Okay I see, so then it's a bit simpler

@bukka bukka closed this as completed in 2197a49 Mar 27, 2025
@jjdunn
Copy link
Author

jjdunn commented Mar 27, 2025

@bukka and @nielsdos thank you very much for your rapid response to this bug report! I am very impressed 👍

I am quite ignorant of the PHP release-cycle process; and cannot find any information online about expected timeframe for 8.3.20, which I imagine would include this fix. can you set any expectation, or tell me where to find this info?

also, just out of curiosity, was this problem caused by the fix for https://bugs.php.net/bug.php?id=72666, which was included in 8.3.19 ?

@cmb69
Copy link
Member

cmb69 commented Mar 27, 2025

Since PHP 8.3.20RC1 will be released today, it's unlikely that the fix will be included in 8.3.20. 8.3.21RC1 is to be expected in 4 weeks, 8.3.21 two weeks later.

@bukka
Copy link
Member

bukka commented Mar 27, 2025

Sorry put it incorrectly to 8.3.20 NEWS but it will be in 8.3.21 as we have already RC1 for 8.3.20. So ti will get released on 8th May.

Yes that fix from 8.3.19 caused the issue to happen even though the implementation was already incorrect - it just didn't result in any visible issue.

@bukka
Copy link
Member

bukka commented Mar 27, 2025

For the record the NEWS fix is following: fe49306

@jjdunn
Copy link
Author

jjdunn commented Mar 27, 2025

Thanks all for the updates.

I'm trying (without success) to isolate which part of our application code triggers this bug. It's quite inconsistent: the same action works multiple times, then fails. When it fails a) there is nothing in the apache access log; b) it never invokes the step-debugger; c) debug/trace logging in the application code never get output. Very difficult to figure out where it's failing.

I reverted to PHP 8.3.18 on my local dev system; and the bug is still there.
I also reverted to the prior version of our application code (before recent changes), and the bug is still there.
It's very puzzling to me why this issue is not happening on our production environment, which is also on 8.3.19

I don't have the skills/tools to build PHP myself.
Is there any way to get a windows build ( x64 TS) of PHP with this fix included, so I can test ??

@jjdunn
Copy link
Author

jjdunn commented Mar 27, 2025

p.s. I analyzed another recent dump file, and the stack-trace continues a little past php_clear_stat_cache(). There are calls to efree() and zend_mm_is_custom_heap(). Is this expected, or surprising?

EDIT: this is on 8.3.18, not 8.3.19

STACK_TEXT:  
000000c1`a283adc0 00007ffb`7577e52f     : 00007ffb`00000003 00007ffb`00000003 00007ffb`75a013c0 00000000`00000000 : ucrtbase!abort+0x4e
000000c1`a283adf0 00007ffb`752b64ab     : 00000000`00004000 00000000`00000000 00000000`00000000 00000280`1002de70 : php8ts!zend_mm_is_custom_heap+0x16f
000000c1`a283ae20 00007ffb`754b2f35     : 00000280`0ea4cce0 00000000`00000000 00000000`00000000 00000000`00000000 : php8ts!efree+0xcb
000000c1`a283ae50 00007ffb`758720ee     : 00000280`12c6c1c0 00000000`00004000 00000000`00004000 00007ffb`752b2366 : php8ts!php_clear_stat_cache+0xd5
000000c1`a283ae80 00007ffb`752eaa9e     : 00000280`12c6c1c0 00000000`00000002 00000000`00004000 00000000`00000001 : php8ts!php_stream_fopen_with_path+0xf1e
000000c1`a283aec0 00007ffb`752eaa1b     : 00000280`12c6c1c0 00000280`12c6c000 00000000`00004000 00000000`00004000 : php8ts!php_stream_write+0xae
000000c1`a283af20 00007ffb`75870941     : 00000280`12c6a000 00000280`12c6c000 00000000`00004000 00000280`12c04038 : php8ts!php_stream_write+0x2b
000000c1`a283af50 00007ffb`752eaa9e     : 00000280`12c6c000 00000000`00000000 00000000`00004000 00000000`00000000 : php8ts!php_stream_mode_from_str+0x411
000000c1`a283af90 00007ffb`752eaa1b     : 00000280`12c6c000 00000000`00004000 00000000`00000140 00000000`00000140 : php8ts!php_stream_write+0xae
000000c1`a283aff0 00007ffb`75859516     : 00000280`1002de70 00000000`00004000 00000000`00000140 00000000`00000000 : php8ts!php_stream_write+0x2b
000000c1`a283b020 00007ffb`758593a3     : 00000280`124bf110 00000280`12c04000 00000000`00000000 00000280`12c04022 : php8ts!sapi_read_standard_form_data+0xf6
000000c1`a283f080 00007ffb`752d0ec5     : 00000280`124bf110 00000000`00000020 00000280`1002de70 000000c1`a283f380 : php8ts!sapi_initialize_empty_request+0x183
000000c1`a283f0d0 00007ffb`752cfb66     : 00000280`1246fef8 000000c1`a283f380 00000280`1002de70 00000280`1246ed50 : php8ts!sapi_activate+0x265
000000c1`a283f110 00007ffb`ab953b2e     : 00000280`1246fef8 000000c1`a283f380 00000000`00000022 00000280`0eac4f68 : php8ts!php_request_startup+0x136
000000c1`a283f280 00007ffb`8910f395     : 00000000`00000002 00000000`00000002 00000280`1246fef8 00000000`00000000 : php8apache2_4+0x3b2e

@jjdunn
Copy link
Author

jjdunn commented Mar 27, 2025

Is there any way to get a windows build ( x64 TS) of PHP with this fix included, so I can test ??

I found this: https://github.com/shivammathur/php-builder-windows/releases/tag/php8.3 which appears to have builds of 8.3.21dev from within the past 14 hours, since the commit which fixed this bug.

I installed https://github.com/shivammathur/php-builder-windows/releases/download/php8.3/php-8.3.21-dev-Win32-vs16-x64.zip

however that version still crashes on my system. see attached debug info

debug_8.3.21_dev.txt

EDIT: sorry for the multiple missing-symbol-path problems in the debug analysis. Scroll to the bottom to see the one with correct symbols.

@cmb69
Copy link
Member

cmb69 commented Mar 27, 2025

I found this: https://github.com/shivammathur/php-builder-windows/releases/tag/php8.3 which appears to have builds of 8.3.21dev from within the past 14 hours, since the commit which fixed this bug.

According to https://github.com/shivammathur/php-builder-windows/releases/download/php8.3/ts-windows-vs16-x64.json, the build does not yet contain the fix. I suggest to try with the next nighly builds.

@jjdunn
Copy link
Author

jjdunn commented Mar 28, 2025

I suggest to try with the next nighly builds.

Done, and it seems to be fixed.

thanks again everyone for your prompt response to this issue!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants