Fix polynomial regular expression vulnerability in XML encoding detection #68
Problem
The `ENCODING_PATTERN` regex in `XmlReader.java` was vulnerable to ReDoS (Regular Expression Denial of Service) attacks due to excessive backtracking. The pattern used a greedy quantifier (`.*`) whose matching time can grow polynomially with input length when processing malicious XML input.
When the regex engine encounters XML with many characters between `<?xml` and `encoding`, it tries many possible lengths for the `.*` match, which can exhaust the CPU.
Solution
Changed the greedy quantifier to non-greedy (`.*?`) to prevent the excessive backtracking.
The non-greedy quantifier matches the minimum number of characters needed, avoiding that backtracking behavior while maintaining identical functional behavior.
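The difference between the two quantifiers can be sketched as follows. Both patterns below are hypothetical stand-ins with a similar shape; the actual `ENCODING_PATTERN` in `XmlReader.java` is not reproduced here, and the class name, constants, and `detect` helper are illustrative assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EncodingPatternSketch {
    // Hypothetical patterns for illustration only; not the real ENCODING_PATTERN.
    // Greedy: ".*" first consumes the rest of the line, then backtracks to find "encoding".
    static final Pattern GREEDY = Pattern.compile(
            "<\\?xml.*encoding\\s*=\\s*(\"[^\"]*\"|'[^']*')", Pattern.MULTILINE);
    // Non-greedy: ".*?" expands one character at a time until "encoding" matches.
    static final Pattern LAZY = Pattern.compile(
            "<\\?xml.*?encoding\\s*=\\s*(\"[^\"]*\"|'[^']*')", Pattern.MULTILINE);

    // Returns the declared encoding without its quotes, or null if none is found.
    static String detect(Pattern pattern, String prolog) {
        Matcher m = pattern.matcher(prolog);
        if (!m.find()) {
            return null;
        }
        String quoted = m.group(1);
        return quoted.substring(1, quoted.length() - 1);
    }

    public static void main(String[] args) {
        String prolog = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>";
        System.out.println(detect(GREEDY, prolog)); // prints UTF-8
        System.out.println(detect(LAZY, prolog));   // prints UTF-8
    }
}
```

For an ordinary declaration like this one, both variants extract the same value; the change only affects how the engine searches for the match.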
Changes
- Changed `.*` to `.*?` in `ENCODING_PATTERN` (line 600)
- Added `encodingPatternWithManyAttributes()` test to validate the fix handles edge cases with multiple attributes and whitespace variations
Impact
- `XmlStreamWriter.java` automatically benefits as it references `XmlReader.ENCODING_PATTERN`
Testing
Resolves the code scanning alert for polynomial regular expression used on uncontrolled data.
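A check along the lines of the `encodingPatternWithManyAttributes()` test might look roughly like the sketch below. It is written as a plain Java program rather than in the project's test framework, and the pattern, class name, and helper methods are assumptions made for illustration, not the code from this PR.

```java
import java.util.Objects;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ManyAttributesCheck {
    // Assumed stand-in for the fixed pattern; the real ENCODING_PATTERN may differ.
    static final Pattern ENCODING = Pattern.compile(
            "<\\?xml.*?encoding\\s*=\\s*(\"[^\"]*\"|'[^']*')", Pattern.MULTILINE);

    // Returns the declared encoding without its quotes, or null if none is found.
    static String encodingOf(String xml) {
        Matcher m = ENCODING.matcher(xml);
        if (!m.find()) {
            return null;
        }
        String quoted = m.group(1);
        return quoted.substring(1, quoted.length() - 1);
    }

    // Builds an XML declaration with `count` filler attributes before the last one.
    static String prologWithAttributes(int count, String lastAttribute) {
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\"");
        for (int i = 0; i < count; i++) {
            sb.append(" attr").append(i).append("=\"v\"");
        }
        return sb.append(' ').append(lastAttribute).append("?>").toString();
    }

    static void check(String expected, String actual) {
        if (!Objects.equals(expected, actual)) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        // Many attributes before the encoding declaration.
        check("UTF-8", encodingOf(prologWithAttributes(50, "encoding=\"UTF-8\"")));
        // Whitespace variations around '=' and single quotes.
        check("ISO-8859-1", encodingOf(prologWithAttributes(50, "encoding  =\t'ISO-8859-1'")));
        // No encoding declaration at all.
        check(null, encodingOf(prologWithAttributes(50, "standalone=\"yes\"")));
        System.out.println("all checks passed");
    }
}
```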
Fixes #50