Email parser creates a message object that can't be flattened #76511
This is related to https://bugs.python.org/issue27321, but a different exception is raised for a different reason. The trigger is a defective spam message. I don't actually have the offending message from the wild, but the attached bad_email_2.eml illustrates the problem. The defect is that the message declares the content charset as us-ascii while the body contains non-ASCII bytes. When the message is parsed into an email.message.Message object and the object's as_string() method is called, UnicodeEncodeError is raised as follows:

>>> import email
>>> with open('bad_email_2.eml', 'rb') as fp:
...     msg = email.message_from_binary_file(fp)
...
>>> msg.as_string()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.5/email/message.py", line 159, in as_string
    g.flatten(self, unixfrom=unixfrom)
  File "/usr/lib/python3.5/email/generator.py", line 115, in flatten
    self._write(msg)
  File "/usr/lib/python3.5/email/generator.py", line 181, in _write
    self._dispatch(msg)
  File "/usr/lib/python3.5/email/generator.py", line 214, in _dispatch
    meth(msg)
  File "/usr/lib/python3.5/email/generator.py", line 243, in _handle_text
    msg.set_payload(payload, charset)
  File "/usr/lib/python3.5/email/message.py", line 316, in set_payload
    payload = payload.encode(charset.output_charset)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 31-33: ordinal not in range(128)
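
For readers without the attachment, the report can be approximated with an in-memory message. This is only a sketch: the `RAW` bytes are a hypothetical stand-in for bad_email_2.eml (not the real file), and `describe_flatten` is an illustrative helper name, not part of the email package. Whether `as_string()` raises depends on the Python version, so the helper reports the outcome instead of assuming it.

```python
import email

# Hypothetical stand-in for bad_email_2.eml: the header declares us-ascii,
# but the body carries the UTF-8 bytes of a euro sign.
RAW = (
    b"From: sender@example.com\n"
    b"To: rcpt@example.com\n"
    b"Subject: test\n"
    b"MIME-Version: 1.0\n"
    b'Content-Type: text/plain; charset="us-ascii"\n'
    b"\n"
    b"Body with non-ASCII bytes: \xe2\x82\xac\n"
)


def describe_flatten(raw):
    """Parse raw bytes and report how as_string() and as_bytes() behave."""
    msg = email.message_from_bytes(raw)
    try:
        msg.as_string()
        string_result = "ok"
    except UnicodeEncodeError:
        # The failure described in this report on affected Python versions.
        string_result = "UnicodeEncodeError"
    # Check whether binary serialization keeps the original body bytes.
    body_preserved = b"\xe2\x82\xac" in msg.as_bytes()
    return string_result, body_preserved


if __name__ == "__main__":
    print(describe_flatten(RAW))
```

On an affected interpreter the first element should be "UnicodeEncodeError"; on versions that include a fix it should be "ok".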
What would you like to see happen in that situation? Should we use errors=replace like we do for headers? (That seems reasonable to me.) Note that it can be re-serialized as binary.
Yes. I think errors=replace is a good solution. In Mailman, we have our own mailman.email.message.Message class which is a subclass of email.message.Message and what we do to work around this and bpo-27321 is override as_string() with:

def as_string(self):
    # Work around for https://bugs.python.org/issue27321 and
    # https://bugs.python.org/issue32330.
    try:
        value = email.message.Message.as_string(self)
    except (KeyError, UnicodeEncodeError):
        value = email.message.Message.as_bytes(self).decode(
            'ascii', 'replace')
    return value
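
The override above can be tried end to end with a small self-contained sketch. `SafeMessage` and the sample bytes are hypothetical names and data chosen for illustration, not Mailman's actual class. The only library features relied on are the `_class` argument of `email.message_from_bytes()` and the existing `as_string()` / `as_bytes()` methods.

```python
import email
import email.message


class SafeMessage(email.message.Message):
    """Hypothetical subclass mirroring the as_string() override quoted above."""

    def as_string(self, unixfrom=False):
        try:
            return super().as_string(unixfrom=unixfrom)
        except (KeyError, UnicodeEncodeError):
            # Fall back to the binary form; bytes that are not ASCII
            # are decoded as U+FFFD by the 'replace' error handler.
            return self.as_bytes(unixfrom=unixfrom).decode("ascii", "replace")


# Illustrative defective message: declared us-ascii, non-ASCII body bytes.
RAW = (
    b"Subject: test\n"
    b"MIME-Version: 1.0\n"
    b'Content-Type: text/plain; charset="us-ascii"\n'
    b"\n"
    b"Body with non-ASCII bytes: \xe2\x82\xac\n"
)

# _class tells the parser which class to instantiate for the message.
msg = email.message_from_bytes(RAW, _class=SafeMessage)
text = msg.as_string()

if __name__ == "__main__":
    print(text)
```

The fallback loses the original non-ASCII characters, so it suits display or logging better than storage; where fidelity matters, `as_bytes()` is probably the safer form to keep.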
I do wonder where you are using the string version of messages :) I actually thought I'd already done this (errors=replace), but obviously not. I don't have time now to work on a patch for this, and the patch in the other issue hasn't been updated to reflect the review I did :(
There are probably some places where we could use bytes, but one of the problem areas is where we save the content of a message held for moderation.
If we use the replace error handler here,
…ith ASCII charset
I think that #116125 is a more correct solution.
…ith ASCII charset (pythonGH-116125) (cherry picked from commit f97f25e) Co-authored-by: Serhiy Storchaka <[email protected]>
…with ASCII charset (GH-116125) (GH-116364) (cherry picked from commit f97f25e) Co-authored-by: Serhiy Storchaka <[email protected]>
…with ASCII charset (GH-116125) (GH-116365) (cherry picked from commit f97f25e) Co-authored-by: Serhiy Storchaka <[email protected]>
…ith ASCII charset (pythonGH-116125)
- bump python to 3.11.8 to resolve security vulnerabilities
- bump gosu to 1.17 to resolve security vulnerabilities
- bump Django to 5.0.4
- bump self-hosted docker image to _bookworm_ (Debian 12)

We [previously tried bumping to Python 3.11.9](#69468), but ran into an odd unicode decoding error in getsentry/getsentry#13760 within our tests. See python/cpython#76511. Python 3.11.8 works.

---------

Co-authored-by: getsantry[bot] <66042841+getsantry[bot]@users.noreply.github.com>