Skip to content

Conversation

@Yatharths11
Copy link

@Yatharths11 Yatharths11 commented Oct 23, 2025

Modified hdyc regex to accept spaces and percent-encoded spaces
For Issue #6419

:github => %r{\Ahttps?://(?:www\.)?github\.com/([a-zA-Z0-9_-]+)},
:gitlab => %r{\Ahttps?://(?:www\.)?gitlab\.com/([a-zA-Z0-9_-]+)},
:hdyc => %r{\Ahttps?://(?:www\.)?hdyc\.neis-one\.org/\?([a-zA-Z0-9_-]+)},
:hdyc => %r{\Ahttps?://(?:www\.)?hdyc\.neis-one\.org/\?([a-zA-Z0-9\s%20_-]+)},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to add 20 there as that's a character class so it's matching each character individually and digits are already allowed.

I'm not really keen to handling escaping like this anyway - it would probably be better to canonicalise the URLs before matching them so we don't have to handle both escaped and unescaped versions in the regular expressions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also do we want \s or just allowed? Are other forms of whitespace like tab really valid?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I observed for #6419 the view uses the text truncate class, can that be the reason for the issue ?
If it is, there will be no reason to modify the regex here and just fix the view.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question on what exact characters are allowed. HDYC references our display names, so anything valid on our website will be valid for HDYC. This seems to be the relevant declaration in our user validations:

validates :display_name, :if => proc { |u| u.display_name_changed? },
:characters => { :url_safe => true },
:whitespace => { :leading => false, :trailing => false },
:width => { :minimum => 3 }

Which in turn uses a custom validator that excludes these characters:

INVALID_CHARS = "\x00-\x08\x0b-\x0c\x0e-\x1f\x7f\ufffe\uffff"
INVALID_URL_CHARS = "/;.,?%#"

So perhaps those constants should be used in the regexp? Similar to what the validator does when it goes /[#{INVALID_CHARS}]/o.match?(value).

Copy link
Contributor

@pablobm pablobm Oct 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, generally on regular expressions: "character classes" (lists of characters surrounded by square brackets [...]) take each character individually, so %20 is %, 2, or 0 individually, but not the three together. The \s does work because \ is a escape character which is a special case. % is a escape character in URLs, but not in regular expressions.

This is a great website to test out regular expressions: https://rubular.com/

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, the changes made are valid right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well they don't actually break anything but they also don't really match % escape sequence and it definitely has redundant duplicate characters in the character class.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also note that the tests are failing, as %20 is not translated to as they expect.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, the changes made are valid right?

It would allow stuff like %22, %_20, and other invalid

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants