Copilot AI commented Oct 10, 2025

Problem

The ENCODING_PATTERN regex in XmlReader.java was vulnerable to ReDoS (Regular Expression Denial of Service) attacks due to excessive backtracking. The pattern used a greedy quantifier (.*) that could cause polynomial (super-linear) running time when processing malicious XML input:

Pattern.compile("<\\?xml.*encoding[\\s]*=[\\s]*((?:\".[^\"]*\")|(?:'.[^']*'))", Pattern.MULTILINE);

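When a line contains many characters after <?xml, the engine may try every possible length of the .* match while looking for encoding, and it repeats that work from every <?xml occurrence on the line (. does not match line terminators, so the backtracking is bounded by the current line). The running time therefore grows polynomially with the input length, which can exhaust CPU on large crafted input.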
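A rough way to observe the polynomial growth is to count how many characters the engine reads on a crafted single line made of repeated <?xml prefixes with no encoding anywhere: for each <?xml it finds, .* runs to the end of the line and then backs off one position at a time. The sketch below is illustrative and not part of this PR; the class name, the counting CharSequence wrapper and the input shape are assumptions, the absolute counts depend on the JDK's java.util.regex implementation, and String.repeat needs Java 11 or later. Doubling k should roughly quadruple the number of reads.

```java
import java.util.regex.Pattern;

/** Illustrative measurement of backtracking work for the original (greedy) pattern. */
public class BacktrackingCost {

    // The original pattern from the code scanning alert (greedy .*).
    static final Pattern GREEDY = Pattern.compile(
            "<\\?xml.*encoding[\\s]*=[\\s]*((?:\".[^\"]*\")|(?:'.[^']*'))", Pattern.MULTILINE);

    /** CharSequence wrapper that counts how often the regex engine reads a character. */
    static final class CountingSequence implements CharSequence {
        private final String text;
        long reads;

        CountingSequence(String text) {
            this.text = text;
        }

        @Override
        public int length() {
            return text.length();
        }

        @Override
        public char charAt(int index) {
            reads++; // every character the matcher inspects goes through here
            return text.charAt(index);
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            return text.subSequence(start, end);
        }

        @Override
        public String toString() {
            return text;
        }
    }

    /** Hypothetical crafted input: one line of k copies of "<?xml" and no "encoding". */
    static String craftedInput(int k) {
        return "<?xml".repeat(k);
    }

    /** Number of character reads the matcher needs before find() gives up. */
    static long reads(Pattern pattern, int k) {
        CountingSequence seq = new CountingSequence(craftedInput(k));
        if (pattern.matcher(seq).find()) {
            throw new IllegalStateException("crafted input should not match");
        }
        return seq.reads;
    }

    public static void main(String[] args) {
        for (int k : new int[] {250, 500, 1000}) {
            System.out.println("k=" + k + " reads=" + reads(GREEDY, k));
        }
    }
}
```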

Solution

Changed the greedy quantifier to non-greedy (.*?) to prevent catastrophic backtracking:

Pattern.compile("<\\?xml.*?encoding[\\s]*=[\\s]*((?:\".[^\"]*\")|(?:'.[^']*'))", Pattern.MULTILINE);

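The non-greedy quantifier expands one character at a time and stops at the first encoding that is followed by = and a quoted value, instead of consuming the rest of the line and then backtracking from its end. For a typical declaration such as <?xml version="1.0" encoding="UTF-8"?> both patterns capture the same value; they differ only when a single line contains more than one encoding= occurrence, in which case the lazy pattern captures the first one rather than the last.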
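The two patterns capture the same value for a typical declaration but can differ when a line contains more than one encoding= occurrence: greedy .* backtracks from the end of the line and lands on the last occurrence, while lazy .*? stops at the first. The small comparison below shows this; the class and method names are illustrative and do not come from XmlReader, and the inputs are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Compares what the original (greedy) and the fixed (lazy) ENCODING_PATTERN capture. */
public class GreedyVsLazy {

    static final Pattern GREEDY = Pattern.compile(
            "<\\?xml.*encoding[\\s]*=[\\s]*((?:\".[^\"]*\")|(?:'.[^']*'))", Pattern.MULTILINE);

    static final Pattern LAZY = Pattern.compile(
            "<\\?xml.*?encoding[\\s]*=[\\s]*((?:\".[^\"]*\")|(?:'.[^']*'))", Pattern.MULTILINE);

    /** Returns group 1 (the encoding value, quotes included) or null if nothing matches. */
    static String capture(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String typical = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";
        // Declaration followed, on the same line, by an element that also has an encoding attribute.
        String twoOnOneLine = typical + "<root encoding='ISO-8859-1'/>";

        System.out.println(capture(GREEDY, typical));      // "UTF-8"
        System.out.println(capture(LAZY, typical));        // "UTF-8"
        System.out.println(capture(GREEDY, twoOnOneLine)); // 'ISO-8859-1' (last occurrence)
        System.out.println(capture(LAZY, twoOnOneLine));   // "UTF-8" (first occurrence)
    }
}
```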

Changes

  • XmlReader.java: Changed .* to .*? in ENCODING_PATTERN (line 600)
  • XmlStreamReaderTest.java: Added an encodingPatternWithManyAttributes() test to check that the pattern handles declarations with multiple attributes and whitespace variations
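The following is a hypothetical set of checks in the spirit of encodingPatternWithManyAttributes(); the actual test in XmlStreamReaderTest.java may use different inputs and assertions. It runs the fixed pattern against a declaration with several attributes, single quotes with spaces around =, tabs and a line break around =, and one input without an encoding attribute. Plain Java checks are used instead of a test framework so the sketch runs on its own.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical edge-case checks for the fixed (lazy) ENCODING_PATTERN. */
public class EncodingPatternEdgeCases {

    // The fixed pattern (lazy .*?), copied from the PR description.
    static final Pattern ENCODING_PATTERN = Pattern.compile(
            "<\\?xml.*?encoding[\\s]*=[\\s]*((?:\".[^\"]*\")|(?:'.[^']*'))", Pattern.MULTILINE);

    /** Returns the quoted value captured by group 1, or null when there is no match. */
    static String encodingOf(String declaration) {
        Matcher m = ENCODING_PATTERN.matcher(declaration);
        return m.find() ? m.group(1) : null;
    }

    static void check(String declaration, String expected) {
        String actual = encodingOf(declaration);
        boolean ok = expected == null ? actual == null : expected.equals(actual);
        if (!ok) {
            throw new AssertionError(declaration + " -> " + actual + ", expected " + expected);
        }
    }

    public static void main(String[] args) {
        // Multiple attributes in the declaration.
        check("<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>", "\"UTF-8\"");
        // Spaces around '=' and single quotes.
        check("<?xml version='1.0' encoding = 'ISO-8859-1' ?>", "'ISO-8859-1'");
        // Tabs and a line break around '=' are accepted because [\s]* matches them.
        check("<?xml version=\"1.0\"\tencoding\t=\n\"UTF-16\"?>", "\"UTF-16\"");
        // No encoding attribute at all: the pattern must not match.
        check("<?xml version=\"1.0\"?><root/>", null);
        System.out.println("all checks passed");
    }
}
```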

Impact

  • Fixes the security vulnerability reported by the code scanning alert
  • No behavioral change unless a single line contains more than one encoding= occurrence; all existing tests pass
  • XmlStreamWriter.java automatically benefits as it references XmlReader.ENCODING_PATTERN
  • Prevents potential DoS attacks via crafted XML input

Testing

  • ✅ All 217 tests pass (216 existing + 1 new)
  • ✅ Code formatting passes the Spotless check
  • ✅ Full Maven build and verification succeeds
  • ✅ New test validates various edge cases including multiple attributes and whitespace variations

Resolves the code scanning alert "Polynomial regular expression used on uncontrolled data".

Original prompt

This section details the original issue you should resolve

<issue_title>Fix code scanning alert - Polynomial regular expression used on uncontrolled data</issue_title>
<issue_description>

Tracking issue for:

Comments on the Issue (you are @copilot in this section)

@gnodet This one should be moved to [plexus-xml](https://github.com/codehaus-plexus/plexus-xml).

Fixes #50



Changed the regex pattern from .* to .*? to use non-greedy matching,
preventing catastrophic backtracking on malicious input.

Added test case to validate the fix with various edge cases.

Co-authored-by: slachiewicz <[email protected]>
Copilot AI changed the title from "[WIP] Fix code scanning alert for polynomial regex on uncontrolled data" to "Fix polynomial regular expression vulnerability in XML encoding detection" on Oct 10, 2025
Copilot AI requested a review from slachiewicz on October 10, 2025 20:43
Copilot finished work on behalf of slachiewicz on October 10, 2025 20:43
slachiewicz marked this pull request as ready for review on October 11, 2025 08:20

Development

Successfully merging this pull request may close these issues.

Fix code scanning alert - Polynomial regular expression used on uncontrolled data

2 participants