Summary
decodeHtmlEntities in actions/setup/js/sanitize_content_core.cjs (v0.68.3) handles decimal (e.g. &#173;) and hex (e.g. &#xAD;) numeric entities for invisible/formatting characters, but does not handle their named entity forms (&shy;, &zwj;, &zwnj;, &lrm;, &rlm;). Because hardenUnicodeText Step 3 operates on actual Unicode code points, not on &name; string literals, the named entity forms survive intact. neutralizeAllMentions then fails to match @&shy;victim because the character after @ is & (not [A-Za-z0-9]), so the mention passes through unsanitized. When GitHub renders the output, the entity decodes to an invisible character and the result appears as @victim to readers. This is a partial bypass of the fix applied in gh-aw#24154 (originating from the #1611 finding).
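The interaction between the three stages can be illustrated with a toy model. This is a minimal sketch, not the code in sanitize_content_core.cjs: decodeNumericEntities, stripInvisible, neutralizeMentions and toySanitize are hypothetical stand-ins, and the regexes are simplified assumptions chosen only to mirror the behavior described above.

```javascript
// Toy model of the pipeline described above. All four functions are
// hypothetical stand-ins, NOT the real ones from sanitize_content_core.cjs.

// Stand-in for decodeHtmlEntities: decodes numeric entities only.
function decodeNumericEntities(text) {
  return text
    .replace(/&#x([0-9a-f]{1,5});/gi, (_, hex) => String.fromCodePoint(parseInt(hex, 16)))
    .replace(/&#([0-9]{1,6});/g, (_, dec) => String.fromCodePoint(parseInt(dec, 10)));
}

// Stand-in for hardenUnicodeText Step 3: strips a few invisible code points.
function stripInvisible(text) {
  return text.replace(/[\u00AD\u200B-\u200F\u2060]/g, "");
}

// Stand-in for neutralizeAllMentions: wraps @name in backticks.
// The character right after "@" must be in [A-Za-z0-9] for a match.
function neutralizeMentions(text) {
  return text.replace(/@([A-Za-z0-9][A-Za-z0-9-]*)/g, "`@$1`");
}

function toySanitize(text) {
  return neutralizeMentions(stripInvisible(decodeNumericEntities(text)));
}

// Numeric form: decoded to U+00AD, stripped, then the mention is wrapped.
console.log(toySanitize("@&#173;victim say hi"));
// Named form: never decoded, so "@&" does not match and nothing changes.
console.log(toySanitize("@&shy;victim say hi"));
```

In this model the numeric input comes out with the mention wrapped in backticks, while the named input is returned byte for byte, which is the same asymmetry reported against the real sanitizer.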
Affected Area
Safe-outputs output sanitization boundary — sanitizeContent / sanitizeContentCore → decodeHtmlEntities (line ~975) → hardenUnicodeText Step 3 stripping regex. Affects all safe-output types where body/text fields have "sanitize": true in validation.json.
Reproduction Outline
1. Obtain sanitize_content_core.cjs at v0.68.3 (SHA 159c2fed045bdd850374b084fe92182c9e31b147237944f41aecd765d068e685).
2. Run:
   node -e "const {sanitizeContentCore} = require('./sanitize_content_core.cjs'); console.log(sanitizeContentCore('@\u00ADvictim say hi'));"
   → Output includes `@victim`: neutralized (numeric/direct-char form is fixed).
3. Run:
   node -e "const {sanitizeContentCore} = require('./sanitize_content_core.cjs'); console.log(sanitizeContentCore('@&shy;victim say hi'));"
   → Output is @&shy;victim say hi unchanged: bypassed.
4. Repeat step 3 substituting &zwj;, &zwnj;, &lrm;, &rlm;: all bypass.
5. Confirm root cause: decodeHtmlEntities source lists no entries for &shy;, &zwj;, &zwnj;, &lrm;, &rlm;, &wj;, or &ZeroWidthSpace;.
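The repeated runs in the outline above can be collapsed into one script. The following is a sketch under stated assumptions: findBypasses and NAMED_ENTITIES are hypothetical names introduced here, a "bypass" is defined simply as the entity string surviving in the output, and the real module is only loaded if it happens to be in the current directory.

```javascript
// Hypothetical helper: returns the entities that survive sanitization
// when placed between "@" and a username.
function findBypasses(sanitize, entities) {
  return entities.filter((entity) =>
    sanitize(`@${entity}victim say hi`).includes(entity)
  );
}

// Named forms taken from the reproduction steps above.
const NAMED_ENTITIES = ["&shy;", "&zwj;", "&zwnj;", "&lrm;", "&rlm;"];

// Load the real sanitizer if it is available next to this script.
let sanitize = null;
try {
  ({ sanitizeContentCore: sanitize } = require("./sanitize_content_core.cjs"));
} catch {
  // Module not present; findBypasses can still be used with any function.
}

if (typeof sanitize === "function") {
  // On v0.68.3 this is expected to list every entry, per the report.
  console.log("bypassing entities:", findBypasses(sanitize, NAMED_ENTITIES));
}
```

An empty array from findBypasses would indicate that none of the listed named forms reach the output intact.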
Observed Behavior
sanitizeContentCore('@&shy;victim say hi') returns the input unchanged. The named entity &shy; is not decoded by decodeHtmlEntities, survives Step 3 stripping, and defeats neutralizeAllMentions because & is not in [A-Za-z0-9].
Expected Behavior
sanitizeContentCore('@&shy;victim') should return `@victim`: the named entity should be decoded to its Unicode code point (U+00AD) before Step 3 strips it, and the resulting bare @victim should be neutralized like any other mention.
Security Relevance
The @mention neutralization guarantee documented at the safe-outputs reference page is violated for named HTML entity forms of invisible characters. An adversarial issue or PR body that causes the AI to emit @&shy;maintainer (achievable via prompt injection) will pass through the sanitizer and, after GitHub renders it, may trigger a real notification to @maintainer. The bypass is achievable with any safe-output type that routes content through sanitizeContent.
Suggested Fix
Extend decodeHtmlEntities (after the &amp; block) to map named invisible-char entities to their Unicode code points before Step 3 runs:
result = result.replace(/&shy;/gi, "\u00AD"); // soft hyphen
result = result.replace(/&zwj;/gi, "\u200D"); // zero-width joiner
result = result.replace(/&zwnj;/gi, "\u200C"); // zero-width non-joiner
result = result.replace(/&lrm;/gi, "\u200E"); // left-to-right mark
result = result.replace(/&rlm;/gi, "\u200F"); // right-to-left mark
result = result.replace(/&wj;/gi, "\u2060"); // word joiner
result = result.replace(/&ZeroWidthSpace;/gi, "\u200B"); // zero-width space
Also add regression tests asserting that sanitizeContentCore('@&shy;victim') and sanitizeContentCore('@&lrm;victim') produce neutralized output. A broader audit of HTML5 named character references for additional invisible/confusable characters is also warranted.
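If the audit grows the list, a table-driven form may be easier to extend than one replace call per entity. The sketch below is one possible shape, not the project's code: INVISIBLE_NAMED_ENTITIES and decodeInvisibleNamedEntities are hypothetical names, and the entries simply mirror the list above. Matching is case-insensitive to follow the /gi flags in the suggested fix, even though HTML named references are case-sensitive; over-matching is harmless here because the decoded characters are stripped anyway. Each name may be worth confirming against the HTML named character reference table; U+2060, for instance, appears there as &NoBreak;.

```javascript
// Hypothetical lookup table: lower-cased entity name -> invisible code point.
const INVISIBLE_NAMED_ENTITIES = {
  shy: "\u00AD", // soft hyphen
  zwj: "\u200D", // zero-width joiner
  zwnj: "\u200C", // zero-width non-joiner
  lrm: "\u200E", // left-to-right mark
  rlm: "\u200F", // right-to-left mark
  wj: "\u2060", // word joiner (name as given in the suggested fix)
  nobreak: "\u2060", // word joiner (HTML5 name)
  zerowidthspace: "\u200B", // zero-width space
};

// One alternation over all names, e.g. /&(shy|zwj|...);/gi
const INVISIBLE_ENTITY_PATTERN = new RegExp(
  `&(${Object.keys(INVISIBLE_NAMED_ENTITIES).join("|")});`,
  "gi"
);

// Replaces each listed named entity with its code point; other text,
// including unrelated entities such as &amp;, is left untouched.
function decodeInvisibleNamedEntities(text) {
  return text.replace(
    INVISIBLE_ENTITY_PATTERN,
    (_, name) => INVISIBLE_NAMED_ENTITIES[name.toLowerCase()]
  );
}

// After decoding, only raw code points remain between "@" and the name.
const decoded = decodeInvisibleNamedEntities("@&shy;victim and @&LRM;victim");
console.log(JSON.stringify(decoded));
```

Once decoded this way, the existing Step 3 stripping and mention neutralization would see the same input they already handle for the numeric forms.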
Additional Context
If the current named entity behavior is intentional (e.g., the sanitizer is not expected to handle HTML entity-encoded content), that assumption should be explicitly documented alongside the @mention neutralization guarantee in the safe-outputs reference, along with any upstream requirements that guarantee content arrives pre-decoded.
Original finding: https://github.com/githubnext/gh-aw-security/issues/2086
gh-aw version: v0.68.3