Skip to content

traceback.extract_stack raises exception if source filename is pyc and exists on disk #92336

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
huwcbjones opened this issue May 5, 2022 · 11 comments
Labels
type-bug An unexpected behavior, bug, or error

Comments

@huwcbjones
Copy link

Bug report
If you compile a python file with the cfile as a pyc and it exists on disk, then if you extract the stack when this module is in the stack, linecache will try to load the file as utf-8 and explode.

Simple proof of concept below:

import py_compile
import sys
import traceback
from tempfile import NamedTemporaryFile

with NamedTemporaryFile(suffix=".py") as tmp_f:
    tmp_f.write(b"""\
def call(func):
    return func()
""")
    tmp_f.flush()
    py_compile.compile(tmp_f.name, dfile="/tmp/crash.pyc", cfile="/tmp/crash.pyc")

sys.path.insert(0, "/tmp")
import crash
crash.call(traceback.print_stack)

results in a traceback of

Traceback (most recent call last):
  File "/home/huw/poc.py", line 16, in 
    crash.call(traceback.print_stack)
  File "/tmp/crash.pyc", line 2, in call
  File "/usr/lib/python3.9/traceback.py", line 190, in print_stack
    print_list(extract_stack(f, limit=limit), file=file)
  File "/usr/lib/python3.9/traceback.py", line 211, in extract_stack
    stack = StackSummary.extract(walk_stack(f), limit=limit)
  File "/usr/lib/python3.9/traceback.py", line 366, in extract
    f.line
  File "/usr/lib/python3.9/traceback.py", line 288, in line
    self._line = linecache.getline(self.filename, self.lineno).strip()
  File "/usr/lib/python3.9/linecache.py", line 30, in getline
    lines = getlines(filename, module_globals)
  File "/usr/lib/python3.9/linecache.py", line 46, in getlines
    return updatecache(filename, module_globals)
  File "/usr/lib/python3.9/linecache.py", line 137, in updatecache
    lines = fp.readlines()
  File "/usr/lib/python3.9/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd3 in position 9: invalid continuation byte
$ hd /tmp/crash.pyc
00000000  61 0d 0d 0a 00 00 00 00  3e d3 73 62 22 00 00 00  |a.......>.sb"...|
                                      ^ position 9

Your environment
macOS 11.6.1 (x64): 3.7.4, 3.8.1, 3.9.1, 3.10.0
Debian Buster (x64): 3.7.3, 3.9.2
Debian Bullseye (x64): 3.9.2

@huwcbjones huwcbjones added the type-bug An unexpected behavior, bug, or error label May 5, 2022
@iritkatriel iritkatriel changed the title traceback.extract_stack explodes if source filename is pyc and exists on disk traceback.extract_stack raises exception if source filename is pyc and exists on disk May 5, 2022
@gpshead
Copy link
Member

gpshead commented May 8, 2022

From the py_compile.compile docs: "If dfile is specified, it is used as the name of the source file in error messages instead of file."

dfile= in your example code is pointing to a .pyc file when it is supposed to point to a .py source file if specified. I'm not sure this is a bug, the generated crash.pyc was constructed to point to a non-.py-source file as its source.

@huwcbjones
Copy link
Author

it is supposed to point to a .py source file

The pycompile docs do not say this. From the sentence you quoted (and the rest of the docs) there are no requirements on what the format of dfile must be. It just says "source file".

The docs also clearly state "used for error messages". In no way shape or form would I then expect the error handling path to open the file, I expected the dfile value to be shoved straight into the traceback.

From my PoV, there are two ways to address this:

  1. fix linecache to cope with bad input the same way it handles FileNotFound.
  2. update the docs to explain the caveats of setting dfile to something which may exist on disk and contain non-utf8 decodable content.

@iritkatriel
Copy link
Member

there are no requirements on what the format of dfile must be. It just says "source file".

A pyc is not a source file, it's a compiled bytecode file.

The docs also clearly state "used for error messages". In no way shape or form would I then expect the error handling path to open the file, I expected the dfile value to be shoved straight into the traceback.

The source file is used to put source code lines in the traceback. Obviously a pyc won't help with that.

fix linecache to cope with bad input the same way it handles FileNotFound.

It allows the UnicodeDecodeError to propagate to the caller. Seem pretty clear to me.

I don't think there is anything to fix here.

@iritkatriel iritkatriel closed this as not planned Won't fix, can't repro, duplicate, stale Jun 28, 2022
@huwcbjones
Copy link
Author

I don't think there is anything to fix here.

Throw an exception if source file has a .pyc suffix?
Update to the docs to clear this up maybe?

@iritkatriel
Copy link
Member

Throw an exception if source file has a .pyc suffix?

It did throw an exception.

Update to the docs to clear this up maybe?

The docs are clear that it should be a source file.

@huwcbjones
Copy link
Author

It did throw an exception.

Sure, but if you'd like to spend an entire day in the bowels of django's template loader figuring out why on earth traceback is exploding in a ball of flames for unbeknownst reasons, then be my guest.
But, for your sanity (and the lack thereof of mine), I would strongly recommend against doing so.
It took a fair amount of printf debugging to figure out that the original exception was raised in traceback due to the aforementioned UnicodeDecodeError which was then being caught by django under the assumption that a perfectly valid template was not utf-8 encoded before blowing the entire stack up.

All because of a single mention of "source file" in the docs that did not highlight how paramount it was not point also have this point at a pyc.

So, I think you can understand why I'm pretty insistent that something should be done rather than "Meh, don't do that. Docs are perfect and the exception message is really helpful".

@iritkatriel
Copy link
Member

I understand that you are upset because you had a hard time debugging this. I'd suggest you think about how unit testing would have made it easier to find this problem (it's not an intermittent issue - a single test covering this function call would have caught it).

Are you saying that you read the docs, misunderstood them, and then wrote your code with this bug in it based on that misunderstanding?

@huwcbjones
Copy link
Author

I'm not upset, just frustrated because I know how easy it was to cause something that took a long time to diagnose. I don't want someone else to do the same and I believe that there is something that can be done to prevent it from happening again.

Unittests would not have helped me, because all 12k of them passed. The compilation runs as part of our build system, so it was only when the database migrations were being run in the appliance chroot that django then exploded whilst loading the template engine. In the chroot/appliance we only have the binary files, however their location on disk is the same as the dfile when they were compiled (important for later). I couldn't reproduce it in my dev environment because of the difference in file system layout.

My major gripe is that in the docs the only information about dfile is

If dfile is specified, it is used as the name of the source file in error messages instead of file.

name of the source file was the key bit of information I gleaned from reading that sentence a few times.
But had I known that traceback generation would read the source file (dfile parameter) from disk (if it existed) to get the error location, I wouldn't have opened the issue. The error location information is still present/correct when the source file doesn't exist. However, when the "sourcefile" exists but isn't valid, instead of falling back to the "not found" path, it allows a UnicodeDecodeError to make it's way back up the stack?

@iritkatriel
Copy link
Member

Fair enough, we can state that the source file is read.

@iritkatriel iritkatriel reopened this Jun 28, 2022
@iritkatriel
Copy link
Member

Actually you are right about linecache as well: the doc for getline says "This function will never raise an exception — it will return '' on errors".

So it should swallow the UnicodeDecodeError from fp.readlines().

miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jun 30, 2022
…ding errors (pythonGH-94410)

(cherry picked from commit 21cbdae)

Co-authored-by: Irit Katriel <[email protected]>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jun 30, 2022
…ding errors (pythonGH-94410)

(cherry picked from commit 21cbdae)

Co-authored-by: Irit Katriel <[email protected]>
miss-islington added a commit that referenced this issue Jun 30, 2022
…rrors (GH-94410)

(cherry picked from commit 21cbdae)

Co-authored-by: Irit Katriel <[email protected]>
miss-islington added a commit that referenced this issue Jun 30, 2022
…rrors (GH-94410)

(cherry picked from commit 21cbdae)

Co-authored-by: Irit Katriel <[email protected]>
@iritkatriel
Copy link
Member

Thank you @huwcbjones . The documentation change and the linecache bugfix have been pushed to all versions from 3.10.

gvanrossum pushed a commit to gvanrossum/cpython that referenced this issue Jun 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

3 participants