Test failures with upcoming libxml2 2.14 #18009

Closed
nwellnhof opened this issue Mar 9, 2025 · 7 comments

@nwellnhof

Description

I just ran the php-src test suite against the libxml2 master branch from which the 2.14.0 release will be cut soon. This resulted in a few errors in HTML tests. All these errors are expected after making libxml2's HTML parser more HTML5-conformant.

Several non-standard warnings were removed:

================================================================================
/home/nik/src/php-src/ext/dom/tests/DOMDocument_loadHTMLfile_variation1.phpt
================================================================================

================================================================================
001- %r(PHP ){0,1}%rWarning: DOMDocument::loadHTMLFile(): Document is empty %s
001+ 

================================================================================

================================================================================
/home/nik/src/php-src/ext/dom/tests/bug78025.phpt
================================================================================
string(0) ""
================================================================================
001- Warning: DOMDocument::loadHTML(): htmlParseDocTypeDecl : no DOCTYPE name ! in Entity, line: 1 in %s on line %d
     string(0) ""

================================================================================

NUL bytes are handled properly:

================================================================================
/home/nik/src/php-src/ext/dom/tests/bug69679.phpt
================================================================================
<!DOCTYPE html>
<html><head><meta charset="UTF-8"></head><body>U+0000 <span>&#65533;</span></body></html>
================================================================================
     <!DOCTYPE html>
002- <html><head><meta charset="UTF-8"></head><body>U+0000 <span></span></body></html>
002+ <html><head><meta charset="UTF-8"></head><body>U+0000 <span>&#65533;</span></body></html>

================================================================================
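
The change in bug69679.phpt can be pictured with a small C sketch: each NUL byte in the input is emitted as U+FFFD (serialized above as `&#65533;`) instead of being dropped. This only mirrors what the expected output shows. `replace_nul_with_fffd` is a hypothetical helper written for illustration, not a libxml2 or php-src function, and the thread does not show how libxml2 implements this internally.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of libxml2 or php-src.
 * Copies `len` bytes from `in` to `out`, replacing every NUL byte with the
 * UTF-8 encoding of U+FFFD (0xEF 0xBF 0xBD).
 * Returns the number of bytes the result needs (without the terminator).
 * The result is written, NUL-terminated, only if it fits in `cap`. */
size_t replace_nul_with_fffd(const char *in, size_t len, char *out, size_t cap)
{
    static const char fffd[3] = { (char)0xEF, (char)0xBF, (char)0xBD };
    size_t needed = 0;
    size_t w = 0;
    size_t i;

    /* First pass: compute the output size. */
    for (i = 0; i < len; i++)
        needed += (in[i] == '\0') ? sizeof(fffd) : 1;

    if (out == NULL || needed + 1 > cap)
        return needed; /* caller can retry with a larger buffer */

    /* Second pass: copy, substituting U+FFFD for each NUL byte. */
    for (i = 0; i < len; i++) {
        if (in[i] == '\0') {
            memcpy(out + w, fffd, sizeof(fffd));
            w += sizeof(fffd);
        } else {
            out[w++] = in[i];
        }
    }
    out[w] = '\0';
    return needed;
}
```

The input length is passed explicitly because the data contains NUL bytes, so it cannot be treated as a C string.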

================================================================================
/home/nik/src/php-src/ext/dom/tests/bug80268_2.phpt
================================================================================
bool(false)
bool(false)
================================================================================
001- Warning: DOMDocument::loadHTML(): Char 0x0 out of allowed range in Entity, line: 1 in %s on line %d
     bool(false)
003- 
004- Warning: DOMDocument::loadHTMLFile(): Char 0x0 out of allowed range in %s on line %d
     bool(false)

================================================================================

Creation of non-standard "implied" <p> tags was removed:

================================================================================
/home/nik/src/php-src/ext/dom/tests/gh16535.phpt
================================================================================
Hierarchy Request Error
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>oU</body></html>
================================================================================
     Hierarchy Request Error
     <?xml version="1.0" standalone="yes"?>
     <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
004- <html><body><p>oU</p></body></html>
004+ <html><body>oU</body></html>

================================================================================

The tokenizer now conforms to HTML5:

================================================================================
/home/nik/src/php-src/ext/simplexml/tests/bug51615.phpt
================================================================================
object(SimpleXMLElement)#4 (3) {
  ["@attributes"]=>
  array(2) {
    ["title"]=>
    string(0) ""
    ["y""]=>
    string(0) ""
  }
  [0]=>
  string(1) "x"
  [1]=>
  string(1) "x"
}
string(0) ""
string(0) ""
================================================================================
001- Warning: DOMDocument::loadHTML(): error parsing attribute name in Entity, line: 1 in %s on line %d
002- 
003- Warning: DOMDocument::loadHTML(): error parsing attribute name in Entity, line: 1 in %s on line %d
     object(SimpleXMLElement)#4 (3) {
       ["@attributes"]=>
       array(2) {
         ["title"]=>
         string(0) ""
009-     ["y"]=>
006+     ["y""]=>
         string(0) ""
       }
       [0]=>
--

================================================================================
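
The new attribute name `y"` follows from the HTML5 "attribute name" tokenizer state: a name ends only at whitespace, `/`, `>`, `=` or end of input, while a stray `"`, `'` or `<` is a parse error that is still appended to the name. Below is a minimal C sketch of that rule. `scan_attr_name` is a hypothetical helper and the input `y">x` is an illustrative fragment, not the test's actual input. As simplifications, it treats a NUL byte as end of input and starts directly in the attribute name state.

```c
#include <stddef.h>

/* Hypothetical helper sketching the HTML5 "attribute name" state.
 * Reads an attribute name from `in` into `out` (at most cap - 1 bytes plus
 * a terminator) and returns the number of input bytes consumed.
 * Simplification: a NUL byte is treated as end of input. */
size_t scan_attr_name(const char *in, char *out, size_t cap)
{
    size_t i = 0;
    size_t w = 0;

    if (cap == 0)
        return 0;

    for (;; i++) {
        unsigned char c = (unsigned char)in[i];

        /* Only these characters (or end of input) terminate the name. */
        if (c == '\0' || c == '\t' || c == '\n' || c == '\f' || c == ' ' ||
            c == '/' || c == '>' || c == '=')
            break;

        /* ASCII upper-case letters are lower-cased. */
        if (c >= 'A' && c <= 'Z')
            c = (unsigned char)(c + ('a' - 'A'));

        /* Quotes and '<' are parse errors in HTML5 but remain part of the
         * name, so they are appended like any other character. */
        if (w + 1 < cap)
            out[w++] = (char)c;
    }
    out[w] = '\0';
    return i;
}
```

With the fragment `y">x`, the scan stops at `>` and the resulting name is `y"`, matching the key shown in the new expected output.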

PHP Version

8.5.0-dev

Operating System

No response

@nielsdos
Member

nielsdos commented Mar 9, 2025

Thanks for the heads up and for even triaging it!

@nielsdos
Member

It looks like GNOME/libxml2@1f5b537 broke two soap tests as well:

  • ext/soap/tests/bugs/bug38536.phpt
  • ext/soap/tests/bugs/bug44882.phpt

I haven't figured that out yet; I will commit the test fixes first.

@nielsdos
Member

nielsdos commented Mar 10, 2025

@nwellnhof Just reverting this part of the diff fixes the soap tests: GNOME/libxml2@1f5b537#diff-65f1a082f56ebd52b4eaabdbe54d6e516736c8d9186b8c82c8ca55ad3cac105eL4913-R4934 (you need to scroll down manually to see the highlighted area)
On the other hand, reverting it introduces an empty text node, which changes the serialization for this test: ext/soap/tests/bug68996.phpt

I don't know the details (yet) about this libxml code, and I don't have a reduced reproducer yet. Maybe you intuitively see the problem?

@nwellnhof
Author

nwellnhof commented Mar 11, 2025

The root cause is that soap_xmlParseFile first sets ctxt->keepBlanks to 0, and php_libxml_sanitize_parse_ctxt_options later changes ctxt->keepBlanks to 1, which confuses libxml2. I haven't tried to figure out where php_libxml_sanitize_parse_ctxt_options is called from, but changing parser options during parsing is a really bad idea.

Before GNOME/libxml2@1f5b537, libxml2 only looked at the keepBlanks flag once when beginning to parse, so this happened to work. I guess I'll have to revert parts of the commit.

@nwellnhof
Author

The analysis above is wrong. The "keepBlanks" behavior has always been enabled if the characters and ignorableWhitespace handlers are different, regardless of the option value. The commit above broke that, so this needs to be fixed in libxml2.
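
The rule described here can be sketched in C with mock types. This is not libxml2's code: `mock_sax`, `mock_ctxt` and `mock_checks_blanks` are hypothetical names that model only the statement above, namely that blank handling is active whenever the two handlers differ, whatever the option says. That it is also active when `keepBlanks` is 0 is an assumption added to make the predicate complete.

```c
#include <stddef.h>

/* Mock types for illustration only; these are NOT libxml2 structures. */
typedef void (*mock_text_handler)(void *user, const char *text, int len);

struct mock_sax {
    mock_text_handler characters;
    mock_text_handler ignorableWhitespace;
};

struct mock_ctxt {
    int keepBlanks;              /* 1 = keep blank text nodes, 0 = strip */
    const struct mock_sax *sax;
};

/* Returns 1 if the parser should classify whitespace-only text as blanks.
 * Assumption: this happens when keepBlanks is 0, OR when the two handlers
 * differ, regardless of the keepBlanks value. */
int mock_checks_blanks(const struct mock_ctxt *ctxt)
{
    return !ctxt->keepBlanks ||
           ctxt->sax->ignorableWhitespace != ctxt->sax->characters;
}

/* Two distinct no-op handlers so the pointer comparison can be exercised. */
void mock_on_characters(void *user, const char *text, int len)
{
    (void)user; (void)text; (void)len;
}

void mock_on_blanks(void *user, const char *text, int len)
{
    (void)user; (void)text; (void)len;
}
```

Under this model, a context with `keepBlanks` set to 1 but with distinct handlers still has blank handling enabled, which is the case the soap tests appear to rely on.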

@nwellnhof
Author

Should be fixed in libxml2 here: https://gitlab.gnome.org/GNOME/libxml2/-/commit/8696ebe182b9867cbded474e1a977664d3e0e144

I also added the PHP tests to the libxml2 CI tests: https://gitlab.gnome.org/GNOME/libxml2/-/commit/5338e43f1531f449f320e7be05ae69834be03d1a

Everything is green: https://gitlab.gnome.org/GNOME/libxml2/-/jobs/4855772

@nielsdos
Member

Thanks a lot Nick, works here too!
