HTMLParser handle_starttag replaces entity references in attribute value even without semicolon #69426
The documentation of HTMLParser.handle_starttag states "All entity references from html.entities are replaced in the attribute values." However, the replacement also happens when the text is an ampersand followed by an entity name without the trailing semicolon. For example, <a href="https://github.com/go?t=buy&currency=usd">foo</a> will produce "t=buy¤cy=usd" in the value of the href attribute, because "curren" is the entity name for the currency sign.
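A minimal reproduction of the report might look like the following. The `AttrCollector` class and the shortened `go?...` URL are illustrative names, not part of the original report. The result depends on the Python version: interpreters without the fix referenced in this issue are expected to give the mangled value, while fixed ones should leave the query string alone.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Hypothetical helper that records every (name, value) attribute pair."""

    def __init__(self):
        super().__init__()
        self.attrs_seen = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples, with values already unescaped
        self.attrs_seen.extend(attrs)


parser = AttrCollector()
parser.feed('<a href="go?t=buy&currency=usd">foo</a>')
parser.close()

href = dict(parser.attrs_seen)["href"]
# ascii() makes the currency sign visible as \xa4 if the value was mangled
print(ascii(href))
```

If the printed value contains `\xa4cy`, the parser treated `&curren` as a character reference even though no semicolon followed it.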
This seems indeed to be a bug. The relevant bit is at http://www.w3.org/TR/html5/syntax.html#consume-a-character-reference :
Off the top of my head, this paragraph is not implemented in HTMLParser (and it should be).
…in attribute values
…ities in attribute values (GH-95215) According to the HTML5 spec, named character references in attribute values should only be processed if they are not followed by an ASCII alphanumeric, or an equals sign. https://html.spec.whatwg.org/multipage/parsing.html#named-character-reference-state
…er entities in attribute values (pythonGH-95215) According to the HTML5 spec, named character references in attribute values should only be processed if they are not followed by an ASCII alphanumeric, or an equals sign. (cherry picked from commit 77b14a6) Co-authored-by: Sascha Ißbrücker <[email protected]> https://html.spec.whatwg.org/multipage/parsing.html#named-character-reference-state
…ter entities in attribute values (GH-95215) (GH-133704) According to the HTML5 spec, named character references in attribute values should only be processed if they are not followed by an ASCII alphanumeric, or an equals sign. (cherry picked from commit 77b14a6) https://html.spec.whatwg.org/multipage/parsing.html#named-character-reference-state Co-authored-by: Sascha Ißbrücker <[email protected]>
…ter entities in attribute values (GH-95215) (GH-133586) According to the HTML5 spec, named character references in attribute values should only be processed if they are not followed by an ASCII alphanumeric, or an equals sign. (cherry picked from commit 77b14a6) https://html.spec.whatwg.org/multipage/parsing.html#named-character-reference-state Co-authored-by: Sascha Ißbrücker <[email protected]>
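The rule quoted in these commit messages can be sketched roughly as below. This is not the code from the linked commits; `unescape_attr_sketch` and `_NAMED_REF` are hypothetical names, only named references are handled (numeric ones such as `&#38;` are left alone), and the sketch relies on the `html.entities.html5` table, which lists legacy names both with and without the trailing semicolon.

```python
import re
from html.entities import html5

# "&" + a name, optionally followed by ";" or "=". The name match is greedy,
# so the character after the name is never an ASCII alphanumeric.
_NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*)(;|=)?")


def _replace(match):
    name, terminator = match.group(1), match.group(2)
    if terminator == ";":
        # Properly terminated: replace only on an exact table match.
        return html5.get(name + ";", match.group(0))
    if terminator == "=":
        # Followed by an equals sign: leave the text as-is.
        return match.group(0)
    # No semicolon: only legacy names (stored without ";") are replaced.
    return html5.get(name, match.group(0))


def unescape_attr_sketch(value):
    """Hypothetical attribute-value unescaper following the quoted rule."""
    return _NAMED_REF.sub(_replace, value)


print(ascii(unescape_attr_sketch("go?t=buy&currency=usd")))
print(ascii(unescape_attr_sketch("a&amp;b")))
print(ascii(unescape_attr_sketch("x&curren y")))
```

Under these assumptions the query string from the report passes through unchanged, while `&amp;` and a bare `&curren` followed by a space are still replaced.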