Skip to content

gh-105802: email._header_value_parser: prevent IndexError in get_obs_local_part #108133

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from

Conversation

fsc-eriker
Copy link
Contributor

@fsc-eriker fsc-eriker commented Aug 19, 2023

The parsing logic is supposed to raise an error if the header cannot be parsed, but I found a corner case where this didn't happen, and a different error ended up being raised, which caused Python to abort for some malformed messages.

Pardon any problems with this PR; this is my first one.

I have completed the CLA in 2021 (BPO user name eriker)

@fsc-eriker fsc-eriker requested a review from a team as a code owner August 19, 2023 11:23
@ghost
Copy link

ghost commented Aug 19, 2023

All commit authors signed the Contributor License Agreement.
CLA signed

@mbaldessari
Copy link

FWIW this fixed my local breakage with offlineimap and certain mails from microsoft:

ERROR: ERROR in syncfolder for acksyn folder shopping: Traceback (most recent call last):
  File "/usr/lib64/python3.12/email/_header_value_parser.py", line 2069, in get_msg_id
    token, value = get_dot_atom_text(value)
                   ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/email/_header_value_parser.py", line 1334, in get_dot_atom_text
    raise errors.HeaderParseError("expected atom at a start of "
email.errors.HeaderParseError: expected atom at a start of dot-atom-text but found '[32f63a3f11e84bc48eb79fd0302982f3-JFBVALKQOJXWILKCJQZFA7CJIFGVGU2QKJ6FGU2QKJCW2YLJNR6EK6DPKNWXI4A=@microsoft.com]>'

and it has not broken anything else in my current usage

@serhiy-storchaka serhiy-storchaka changed the title gh-105802 email._header_value_parser: prevent IndexError in get_obs_local_part gh-105802: email._header_value_parser: prevent IndexError in get_obs_local_part Jan 31, 2024
Comment on lines +2596 to +2597
'<', # sic
'<', # sic
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do these "sic" mean?

Comment on lines 1521 to 1523
or (obs_local_part[0].token_type=='cfws'
and len(obs_local_part) == 1
or obs_local_part[1].token_type=='dot')):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add tests for all these new cases. For example this condition is passed if there are more that 1 parts and the second part is a dot. Does it report a defect "Invalid leading '.' in local part" for "python.org"?

@erlend-aasland
Copy link
Contributor

@fsc-eriker, please sign the CLA; without that, we cannot merge the PR.

…ingle cwf

Don't add new error cases when refactoring; simply ensure that the code avoids an IndexError for the cases it attempts to handle
@zware
Copy link
Member

zware commented Feb 14, 2024

@fsc-eriker, please sign the CLA; without that, we cannot merge the PR.

The CLA was already signed (see edit history of @cpython-cla-bot's post), the issue with it now came from the history getting twisted up somehow. I'll see if I can straighten it out, but the PR may be tainted to the point of needing to be remade.

@zware zware force-pushed the fix-issue-gh-105802 branch from bac1f84 to 03efb0d Compare February 14, 2024 20:08
@zware
Copy link
Member

zware commented Feb 14, 2024

@fsc-eriker, I force-pushed to your branch to clean up the history, removing what looks like a rebase misfire. Please do git remote update && git checkout fix-issue-gh-105802 && git reset --hard 03efb0d before making any other changes to this PR. Note that the reset --hard will remove any changes that haven't already been pushed here. If you try another rebase (I'm assuming that's what got things off in the first place :)), make sure do it interactively (git rebase --interactive upstream/main) and only pick those commits that you yourself authored: picking any other commits will rewrite commits that already exist in the upstream main branch.

If this is still pinging everyone that was brought in by the old history, it may be better to go ahead and recreate this PR anyway :)

@fsc-eriker
Copy link
Contributor Author

fsc-eriker commented Feb 15, 2024

@zware Sorry for botching the rebase and thanks for fixing it.

The reason I attempted a rebase is that Github remarks that my branch is out of date with respect to main; should I simply ignore that and let whoever eventually merges this PR sort that out?

@hugovk
Copy link
Member

hugovk commented Feb 15, 2024

The reason I attempted a rebase is that Github remarks that my branch is out of date with respect to main; should I simply ignore that and let whoever eventually merges this PR sort that out?

Yeah, you can ignore that (unless there's something important you need from main, or there's a conflict).

All PRs are squashed to a single commit on merge, and rebases can make it harder to review new changes.

@terryjreedy
Copy link
Member

If you create a branch from up-to-date local main, then as soon as a merge to cpython/main happens, which is multiple times a day, it is out of date. If you update your branch with `git merge upstream/main' (the usual right way) and immediately push to your fork and create a PR, it will soon be out-of-date. This is routine. Only worry if it says there is a merge conflict. Almost never use rebase as it is easy to create problems and almost never needed.

A coredev looking to merge who notices that a PR is way out-of-date but still able to be merged may update the branch to rerun tests to make sure they still pass. This depends on what the patch does and how likely other changes could be a problem.

@serhiy-storchaka
Copy link
Member

Thank you for your contribution, @fsc-eriker, but I think that it is not a correct solution. Invalid Message-ID should not be truncated, even with adding defects. The parsing code should raise HeaderParseError, this will allow to preserve its literal value. See #117934.

@serhiy-storchaka
Copy link
Member

Since the initial version of your PR also raised an exception, I added you as the co-author of #117934.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants