Newline embedded in email RFC2047 encoding raises exception when parsed #114906
Comments
A crash is when the process stops without finishing or raising an exception. Stopping on uncaught exceptions like this is normal behavior. Whether this exception is a bug or a feature change request depends on what, if anything, the doc specifies. If you want to claim 'bug' please find and report a violated specification. Without reading the email doc, I can imagine that users are expected to catch such input errors and handle them however they want -- ignore the email, send a rejection to the sender if possible, or display with a modified header -- and that there are users depending on catching this exception.
Sorry if my wording is imprecise. I'll be happy to reclassify this as a feature request. The problem, as I see it, is that the current architecture allows for no user-level control over this behavior; there is no way to proceed with a message with this type of defect. My perception was that the mechanism to register defects was implemented precisely so that users of the library can decide for themselves how exactly to handle any specific email message defects. Subjectively, I see a lot of scenarios where users (often with limited knowledge of email) want to use Python to process some messages, and get a traceback which they have no way to avoid. A simple and common use case is to extract attachments in bulk; you don't really care if the address of the sender was valid, you just want to loop over all messages, and for each find whether it contains an attachment, and if so extract it.
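The bulk-extraction use case above could look roughly like the following. This is only a caller-side sketch, not something the library provides: `safe_header`, `extract_attachments`, the fallback string, and the sample `From:` header are all hypothetical, and it assumes the failure surfaces as a `ValueError` when the header is read.

```python
import email
from email import policy
from email.message import EmailMessage


def safe_header(msg, name, fallback="<unparseable>"):
    """Return a header as text, or `fallback` if reading it raises ValueError."""
    try:
        value = msg[name]
    except ValueError:
        # Assumption: the malformed header fails with ValueError on access.
        return fallback
    return "" if value is None else str(value)


def extract_attachments(raw_messages):
    """Yield (sender, filename, content) for every attachment found."""
    for raw in raw_messages:
        msg = email.message_from_bytes(raw, policy=policy.default)
        sender = safe_header(msg, "From")
        for part in msg.iter_attachments():
            yield sender, part.get_filename(), part.get_content()


# Build a well-formed sample message with one attachment, then prepend a
# hypothetical From: header whose encoded-word contains a newline (=0A).
sample = EmailMessage()
sample["Subject"] = "report"
sample.set_content("see attached")
sample.add_attachment(
    b"hello", maintype="application", subtype="octet-stream", filename="a.bin"
)
bad_from = b"From: =?utf-8?q?First_line=0ASecond_line?= <sender@example.com>\n"
raw = bad_from + sample.as_bytes()

results = list(extract_attachments([raw]))
for sender, filename, content in results:
    print(sender, filename, len(content))
```

The sketch only guards header reads that the caller makes explicitly, and it discards the header value entirely rather than keeping a decoded value with a registered defect, which is what the request here is about.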
I'd appreciate additional guidance on how to proceed with this. I can see three possibilities:
For all of these, an obvious complication is that they introduce a new convention, and then the rest of the code should be adapted to obey the same convention. To quickly estimate the size of the problem,
Bug report
Bug description:
I came across some messages where the sender had embedded a newline in the `From:` header's display string. This was (ostensibly) a legitimate sender, possibly hoping to get more real estate for their message in the recipient's inbox listing or something, or just making a configuration mistake.
Unfortunately, this crashes Python's `email` parser with `policy=default`. (The legacy parser works fine, simply because it doesn't attempt to unpeel RFC2047 encoding by default.)
While the error message is correct, it should probably not raise an exception; perhaps instead register a defect?
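A minimal sketch of the situation described, for anyone who wants to try it. The header below is a hypothetical stand-in, not the one from the affected messages, and `read_from` is an illustrative helper. It assumes the failure is a `ValueError`; the outcome under `policy.default` may differ between Python versions, so none is asserted here.

```python
import email
from email import policy

# Hypothetical message: the RFC 2047 encoded-word in From: decodes to a
# display name containing a line feed (=0A).
RAW = (
    "From: =?utf-8?q?First_line=0ASecond_line?= <sender@example.com>\n"
    "Subject: test\n"
    "\n"
    "body\n"
)


def read_from(raw, pol):
    """Return ("ok", value) or ("error", message) for the From: header."""
    try:
        msg = email.message_from_string(raw, policy=pol)
        return ("ok", str(msg["From"]))
    except ValueError as exc:
        return ("error", str(exc))


# compat32 hands back the raw header text without decoding the encoded-word.
legacy = read_from(RAW, policy.compat32)
# policy.default decodes the encoded-word while building the address header.
modern = read_from(RAW, policy.default)

print("compat32:", legacy)
print("default: ", modern)
```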
I have tested this on 3.11 out of the box and with the current sources from the cpython GitHub repo; but I would expect it to manifest on all versions of the modern `email` module and all platforms.
CPython versions tested on:
3.11
Operating systems tested on:
macOS