Skip to content

os.path.ismounts: further situations it doesn't detect correctly #96328

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
calestyo opened this issue Aug 27, 2022 · 14 comments
Open

os.path.ismounts: further situations it doesn't detect correctly #96328

calestyo opened this issue Aug 27, 2022 · 14 comments
Labels
docs Documentation in the Doc dir type-bug An unexpected behavior, bug, or error

Comments

@calestyo
Copy link
Contributor

Bug report

I had a look at os.path.ismount():

cpython/Lib/posixpath.py

Lines 186 to 216 in 43a6dea

def ismount(path):
"""Test whether a path is a mount point"""
try:
s1 = os.lstat(path)
except (OSError, ValueError):
# It doesn't exist -- so not a mount point. :-)
return False
else:
# A symlink can never be a mount point
if stat.S_ISLNK(s1.st_mode):
return False
if isinstance(path, bytes):
parent = join(path, b'..')
else:
parent = join(path, '..')
parent = realpath(parent)
try:
s2 = os.lstat(parent)
except (OSError, ValueError):
return False
dev1 = s1.st_dev
dev2 = s2.st_dev
if dev1 != dev2:
return True # path/.. on a different device as path
ino1 = s1.st_ino
ino2 = s2.st_ino
if ino1 == ino2:
return True # path/.. is the same i-node as path
return False

And it may have some issues:

  1. The documentation already says, that it isn't capable of detecting bind-mounts of some filesystem on the very same filesystem. While it's good that it mentions this, the results of ismount() is IMO still simply wrong in these cases, cause it is a mountpoint.
  2. The same problem in the detection algorithm is not only limited to bind-mounts, but also to any other mount points, where the path’s st_dev equals that of its parent. For example, btrfs subvolumes. This is however not documented.

Looking a the code, it effectively does: lstat(), check whether it’s a symlink, lstat() the parent, check whether the device numbers are different (then it's a mountpoint), check whether the inodes are the same.

Now I thought about the latter, and what it is meant for.

The only case I could think of at first is /, where /.. is the same inode. [Is there any other?]

  1. This however already requires, that not only the inodes are the same, but also the device numbers. which is given because of the order of these checks, however, I'd suggest to comment that accordingly.
  2. If the above were the only case, wouldn't it be faster to simply check the pathname for being "/" or b"/". Would perhaps require some realpath() though, to also catch things like "/../.." and so on.
  3. However, I found that / is not the only case: Within a chroot, the chroot's / may (or may not) be a mountpoint - the problem is just, the current algorithm doesn't detect this either. Instead, it wrongly claims that a chroot’s "/" would be a mountpoint, even when it isn't.

Not sure what one could do about all that… /etc/mtab is dead... /proc/mounts may not be available on all POSIX systems and even on Linux it's nowadays rather superseded by /proc/self/mountinfo.

Your environment

  • CPython versions tested on: 3.10.6
  • Operating system and architecture: Debian sid

Cheers,
Chris.

@calestyo calestyo added the type-bug An unexpected behavior, bug, or error label Aug 27, 2022
@calestyo
Copy link
Contributor Author

A further bug in ismount() seems to be e.g.:

ismount("/root/foo")

assuming that /root/ isn't accessible by the calling user.

The function reports False, however, it may very well be a mountpoint, just the user cannot see.

@eryksun
Copy link
Contributor

eryksun commented Aug 28, 2022

The function reports False, however, it may very well be a mountpoint, just the user cannot see.

If direct access is denied, then exists(), lexists(), islink(), isdir(), and isfile() also return false. They depend on either stat() or lstat().

The point about btfrs subvolumes is a duplicate of #81520. Returning true for btrfs subvolumes might be useful. The fact that rename() to a another subvolume fails with EXDEV (i.e. invalid cross-device link) makes it a peculiar case if it's not a mount point. Either way, it would help to call attention to this in the docs.

According to POSIX, a mount point is "[e]ither the system root directory or a directory for which the st_dev field of structure stat differs from that of its parent directory". This is all that posixpath.ismount() implements. In Linux, a bind mount point may not satisfy the requirement for the parent to be on a different device (unless bindfs is used). That said, I tested a rename() across a bind mount point, and it does fail with EXDEV even when the source and destination are on the same device.

Regarding chroot(), returning true for the root directory in the current process possibly fits the POSIX definition of a mount point, depending on the interpretation of "system" in "system root directory". It could be the system as seen by a process. OTOH, POSIX doesn't even define chroot(), so a literal interpretation of "system root directory" in the spec would be the real root directory. How would one detect the chroot() case in a cross-platform way? One can't rely on a particular inode number for the root directory in a filesystem. When does this case matter?

@eryksun eryksun added the docs Documentation in the Doc dir label Aug 28, 2022
@calestyo
Copy link
Contributor Author

If direct access is denied, then exists(), lexists(), islink(), isdir(), and isfile() also return false. They depend on either stat() or lstat().

Aren't these results then also bogus?! If one does e.g. exists() (where the behaviour is however documented, at least, however for the others, e.g. isdir() it's not really) one wants to know whether it exists or not - not that it doesn’t when one actually just hadn't the permissions to check.

Same für btrfs and #81520 ... having it documented is better than nothing, but it would still not match the semantics of what one could reasonably expect from functions named isdir(), exists() or ismount()

I'm not so sure whether strictly following the POSIX definition is appropriate here. That is probably decades old and back then there weren't concepts like subvolumes or bind mounts, or at least POSIX didn't really care too much about them.

It does at least not match what an OS may nowadays consider a mountpoint.

Imagine you write a tool, that e.g. deletes a file tree and wants to check before whether anything is a mounpoiunt (cause if so, it couldn't delete that)… this would already fail with the current ismount().
Or imagine one wants to implement a true -xdev option and needs to check whether a dir is a mounpoint (though admittedly, find, as of now, also fails on this).

I don't think there's a real portable way (which is because POSIX doesn't really standardise these things too deeply)... the inode number doesn't work, not even under Linux (again btrfs, e.g.).

Use case… well I don't know… perhaps something like: if some dir is a mountpoint, one want's to set some fs properties or maybe remount,ro it… so it should work when / is truly the mountpoint but not e.g. in a chroot, unless the chroot is in its own filesystem.

Maybe one could add non-portable ways of determining (e.g. using /proc/self/mountinfo)… problem is though, whether this can be made fast… and fall back to the current algo if there is none for the current platform.
One could then also add a strict parameter that lets the function fail in cases where it cannot be said for sure.

@calestyo
Copy link
Contributor Author

calestyo commented Aug 28, 2022

Some more things, which are however not possible bugs:

First:

cpython/Lib/posixpath.py

Lines 194 to 196 in e860e52

# A symlink can never be a mount point
if stat.S_ISLNK(s1.st_mode):
return False

Edit: this part is bogus see the 2nd comment after this one AFAIU, this is in principle not necessary, and even without it, the code should never consider a symlink to be a mountpoint. `lstat()` is used everywhere, so the symlink’s device ID should always be different from it's parent directory's.

If that piece of code is however just for speed up (i.e. to save the realpath() and the stat() on the parent directory - which would be a good thing), then why not simply checking for S_ISDIR? No file type other than directory should ever be able to be a directory.
(I could provide a patch if desired).

@calestyo
Copy link
Contributor Author

calestyo commented Aug 28, 2022

Second:

parent = realpath(parent)

I already tried to mail the two authors of the commit 750018b (issue #46718) which introduced that.

The call to realpath() makes the execution of ismount() considerably slower. When I iterate over a big test dir tree of mine, with some Python code that has an -xdev option where I then need to check every dir whether it's a mountpoint (and if so, not traverse it), I get for example the following results:

  • The tree has 1039511 files of which 67881 are directories.
  • With the realpath() one gets:
$ strace --follow-forks --summary-only ./traverse /usr /home >/dev/null
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 82,55    5,850196           3   1731502        45 newfstatat
  9,10    0,644943           4    136251           getdents64

without:

$ strace --follow-forks --summary-only ./traverse /usr /home >/dev/null
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 76,37    4,131570           3   1175372        45 newfstatat
 12,35    0,668089           4    136251           getdents64

That's an extra of 556130 newfstatat()s because of the realpath(). I assume it calls a stat for each path and every path component, and if I count the path components via:

$ find /usr /home -type d | tr -c -d '/' | wc -c

the numbers match (off a few, because stuff gets written/deleted in /home).

Even after a lot of thinking, I don't quite understand why it would be needed.

The bug #46718 mentions describes it as fixing it for the case when there's some directory where the user doesn't have permissions, but wouldn't a better solution for that be to simply not catch the exception in:

cpython/Lib/posixpath.py

Lines 205 to 206 in e860e52

except (OSError, ValueError):
return False

?

realpath() resolves symlinks, so I tried to construct a directory structure, where some symlinks would trick ismount() without the realpath() into a false positive or false negative, but failed to do so:

Consider e.g.

test/        a directory (dev 28)
test/mnt/    an actual mountpoint (dev 78)
test/mnt/dd  symlink to ..
test/m2/     a directory (not a mountpoint)

I copy&pasted ismount() with the call to realpath() removed, and from
within test/ ran:

>>> ismount("mnt/dd/mnt") gives True
>>> ismount("mnt/dd/m2")  gives False

just as it should.

Could it be, that something changed and the realpath() is no longer
needed?

Looking at the lstat() for the parent path, gives the following:

>>> os.stat("mnt/dd/mnt/..", follow_symlinks=False)
os.stat_result(st_mode=16877, st_ino=11603350, st_dev=28, st_nlink=1, st_uid=1000, st_gid=1000, st_size=28, st_atime=1661051997, st_mtime=1661646137, st_ctime=1661646137)
>>> os.stat("mnt/dd/m2/..", follow_symlinks=False)
os.stat_result(st_mode=16877, st_ino=11603350, st_dev=28, st_nlink=1, st_uid=1000, st_gid=1000, st_size=28, st_atime=1661051997, st_mtime=1661646137, st_ctime=1661646137)

=> both on dev 28 (that's the device of test/)
so it seems as if os.stat() would already resolve the symlinks

Whereas:

>>> os.stat("mnt/dd", follow_symlinks=False)
os.stat_result(st_mode=41471, st_ino=3, st_dev=78, st_nlink=1, st_uid=1000, st_gid=1000, st_size=2, st_atime=1661646604, st_mtime=1661646604, st_ctime=1661646604)

=> the symlink to .. within the mountpoint, dev 78

>>> os.stat("mnt/dd/mnt/", follow_symlinks=False)
os.stat_result(st_mode=17407, st_ino=1, st_dev=78, st_nlink=2, st_uid=0, st_gid=0, st_size=60, st_atime=1661646604, st_mtime=1661646604, st_ctime=1661646604)
>>> os.stat("mnt/", follow_symlinks=False)
os.stat_result(st_mode=17407, st_ino=1, st_dev=78, st_nlink=2, st_uid=0, st_gid=0, st_size=60, st_atime=1661646604, st_mtime=1661646604, st_ctime=1661646604)

=> both, the mountpoint, dev 78

>>> os.stat("mnt/dd/m2/", follow_symlinks=False)
os.stat_result(st_mode=16877, st_ino=11908701, st_dev=28, st_nlink=1, st_uid=1000, st_gid=1000, st_size=0, st_atime=1661646137, st_mtime=1661646137, st_ctime=1661646137)

=> again, the plain dir outside the mountpoint, dev 28

So Do I miss anything here, what the call of realpath() is really good for? Or could it be dropped?

@calestyo
Copy link
Contributor Author

The point above in #96328 (comment) is bogus.... well partially.

The check for not-a-symlink is needed, though I still think one could perhaps improve it to check for the pathname being a true directory (as no other file types can be mountpoints, I'd guess).

An example where the function wouldn't work without the not-a-symlink check is:

/                (dev 28)
test/            a directory (dev 28)
test/mnt/        an actual mountpoint (dev 78)
test/mnt/foo/    directory (dev 78)
test/mnt/foo/rl  symlink (dev 78) to /

then an ismount("mnt/foo/rl") without the not-a-symlink check would give True, because the lstat() on mnt/foo/rl would give the symlink dev 78, whereas on mnt/foo/rl/.. it would give /.

So together with #96328 (comment) I'd say that it seems stat() resolves all path components except the last (which is resolved or not, depending on follow_symlinks.

Does that make sense?

@calestyo
Copy link
Contributor Author

I might have also found a (questionable?) reason for the realpath(). It seems it doesn't really have much to do with symbolic links and their resolution, but (as it was stated in the original bug) with permissions.

Consider e.g. a directory /foo/no-perms/, where no-perms is not accessible, then os.stat("/foo/no-perms/", follow_symlinks=False) succeeds, but os.stat("/foo/no-perms/..", follow_symlinks=False) fails with a permission error (I'd say: just as it should).

The same with the stat(1) tool.

With os.path.realpath():

>>> os.path.realpath("/foo/no-perms")
'/foo/no-perms'
>>> os.path.realpath("/foo/no-perms/..")
'/foo/'
>>> os.path.realpath("/foo/no-perms/../..")
'/'
>>> os.path.realpath("/foo/no-perms/test/..")
'/foo/no-perms'
>>> os.path.realpath("/foo/no-perms/test/bar/..")
'/foo/no-perms/test'
>>> os.path.realpath("/foo/no-perms/test/bar/../..")
'/foo/no-perms'

The seems IMO, at least the 3 latter, because Python cannot know what test and bar look like (they may be symlinks, or not exist at all).

E.g. when bar is a symlink to /mnt, then as normal user:

>>> os.path.realpath("/foo/no-perms/test/bar/..")
'/foo/no-perms/test'

which is clearly wrong, but as root:

>>> os.path.realpath("/foo/no-perms/test/bar/..")
'/'

Likewise, the realpath(1) tool (which I guess simply calls realpath(3) C function?), as normal user, gives:

$ realpath /foo/no-perms
/foo/no-perms
$ realpath /foo/no-perms/..
/foo
$ realpath /foo/no-perms/../..
/
$ realpath /foo/no-perms/test/..
realpath: /foo/no-perms/test/..: Permission denied
$ realpath /foo/no-perms/test/bar/..
realpath: /foo/no-perms/test/bar/..: Permission denied
$ realpath /foo/no-perms/test/bar/../..
realpath: /foo/no-perms/test/bar/../..: Permission denied

Which I think is correct, unlike what Python does.

To simply collapsing .. should perhaps only work right "after" the inaccessible directory, and only when that is a inaccessible directory, not an inaccessible symlink (should that even be possible on some platforms).

But now back to ismount() … not sure whether this time I understood everything, but it seems that the call to realpath(), with - as said above - is IMO quite expensive, helps only in one single case, namely no-perms/.., in which no-perms is a mountpoint candidate.

It should not help if the mountpoint candidate is somewhere below no-perms/ or either because Python-realpath()’s results seem strange in these cases, or because it still couldn't ´stat()anything **below**no-perms/`.

Now arguably, the first case is not so uncommon, but still the solution seems extremely expensive, especially because of the many stat()s it may cause (on each component of the path, AFAIU).

@calestyo
Copy link
Contributor Author

Maybe one could catch the PermissionError: [Errno 13] Permission denied:… and only then use realpath() … and that perhaps better with strict=True?

@calestyo
Copy link
Contributor Author

calestyo commented Aug 30, 2022

I further thought about how one could get rid of the expensive realpath()

Assuming we remove the realpath()... (but leave the check for non-symlink):

path parent can the parent path name cause break-out of the true parent?
…/d1/d2 …/d1/d2/.. OK, break out of d1 not possible
…/d1/s2 …/d1/s2/.. OK, break-out of d1 possible (via s2), but non-symlink-check catches it
…/s1/s2 …/s1/s2/.. OK, s1 points to some dx1, which contains s2; break-out of that dx1 possible (via s2), but non-symlink-check catches
…/s1/d2 …/s1/d2/.. OK, if s->dx1, then d2, is really in dx1/ (not in s1, which is symlink); …/s1/d2/.. would give …/dx1, which is the true parent of d2

After the no-symlink-check, construct the parent path as removing the last pathname component, not using /...

The break-out cases from above should still be handled, as they're checked before.

Now for the case of d2 not being accessible by the user (which is what the realpath() was, AFAIU, previously used for), we assume d2 respectively the target of s2 is not accessible by the user.
Still a problem? No, because we don't have non-accessible/.. anymore in the lstat(), so that wont't fail.

But do we still get the true parent?

path parent
…/d1/d2 …/d1/d2 OK, d1 is the true parent of d2
…/d1/s2 …/d1/s2 OK, anyway False, because of the non-symlink-check,... but even without: d1 is the true parent of s2
…/s1/s2 …/s1/s2 OK, anyway False, because of the non-symlink-check
…/s1/d2 …/s1/d2 if we do follow s1 (i.e. stat() not lstat() the parent), even if it first points to another symlink, then we eventually get some dx1, which should however be the true parent directory of d2

(Whether we have a trailing / after the parent pathname, shouldn't matter, I think. If we keep a trailing / than we could probably use lstat(), because the symlinks would be still dereferenced, but I think that would make the code less readable.

Now @pitrou wrote in #46718 (comment):

But it may change the function result if the argument is a symlink to something (directory or mount point) on another filesystem.

But that's anyway already ruled out because of the non-symlink-check.

And in #46718 (comment) he said:

Another problem of using os.path.dirname is that for /A/B, it will return /A, and if /A is itself a symlink to a mount point itself, /A and /A/B will have different st_dev values... which will be wrongly interpreted as meaning that /A/B is a mount point.

I understand this as follows in my terms:

We check ismount("/s1/d2") (=/A/B/), s1 points (eventually) to dx1, which is a mountpoint, which however has to truly contain d2.

So if d2 is not a mountpoint (in which case we'd of course want the different device ID), why should it have another st_dev than dx1 (it's a true child of it)? It might have another st_dev that s1 (the symlink), but when we don't lstat() the parent, but stat() it, that should be fine.


TBH, even with the `path/..` way for getting the parent **and** using `lstat()`, I don't understand @pitrou’s above case (at least not the way as he wrote).
What would IMO happen is rather:
If `/A` is used as parent pathname and points to a mountpoint, then the `lstat()` would still get the device ID of the symlink. This may or may not be the same device ID as where `/A` points to and which is the true parent. So regardless of whether `/A/B` is a mountpont or not, the result might be right or wrong.
But back to my proposal:

Are the other cases ok (i.e. do we stat the right thing?), if we do stat() (for the parent only) instead of lstat()?

path parent
…/d1/d2 …/d1/d2 OK, d1 in the parent path is anyway a directory, so no following
…/d1/s2 …/d1/s2 OK, d1 in the parent path is anyway a directory, so no following
…/s1/s2 …/s1/s2 OK, anyway already False out because of the non-symlink-check
…/s1/d2 …/s1/d2 OK, we deliberately want s1 be followed, as this must be the true parent of d2

Not sure if I missed anything or messed up my case, but AFAIU, we should be able to drop the realpath(), if:

  • we keep the non-symlink-check (or even better, use: stat.S_ISDIR
  • we generating the parent path, by striping of the last pathname component (care must be taken if it has only one component... that would need to yield . then, I guess)
  • use a (last component-)symlink following stat() rather than lstat() but for the parent only

Any thoughts?

@calestyo
Copy link
Contributor Author

@eryksun

Regarding chroot(), returning true for the root directory in the current process possibly fits the POSIX definition of a mount point, depending on the interpretation of "system" in "system root directory". It could be the system as seen by a process. OTOH, POSIX doesn't even define chroot(), so a literal interpretation of "system root directory" in the spec would be the real root directory. How would one detect the chroot() case in a cross-platform way? One can't rely on a particular inode number for the root directory in a filesystem. When does this case matter?

btw: I followed that a bit up, and while you're right with chroot() the standard seem still to differ between the system root directory and a process’s root directory.

So I'd say that the POSIX definition really means the true system root dir, and not any chroot’s root dir. And thus strictly (and practically) speaking, the check comparing the inode numbers is wrong.

@calestyo
Copy link
Contributor Author

calestyo commented Aug 31, 2022

More reason why using path/.. may at least in theory be problematic:
AFAIU, POSIX does in principle not mandate for .. to exist.

It says in 4.13 Pathname Resolution:

As a special case, in the root directory, dot-dot may refer to the root directory itself.

"may"

And in 3.144 Empty Directory:

A directory that contains, at most, directory entries for dot and dot-dot

"at most", IMO indicates that it could also lack them.
However, 4.13 Pathname Resolution says:

The special filename dot shall refer to the directory specified by its predecessor. The special filename dot-dot shall refer to the parent directory of its predecessor directory.

Not sure if on can interpret the "shall" in the sense of "it must be there" or just "if it's there, then it shall do.."

Of course, in practise this probably never happens unless there's some filesystem bug (which I've however actually already seen, i.e. directories where .. was "accidentally" missing).

@calestyo
Copy link
Contributor Author

calestyo commented Aug 31, 2022

Also, os.path.realpath() does more pretty strange thing, e.g. a path …/dir/regular-file/.. where dir and regular-file do exist as such, results in …/dir, even with strict=True, but that seems utterly wrong, as regular-file has no .. directory entry.

For …/dir/non-existent/.. this only works with strict=False.

Thought, AFAICS, this strange property of os.path.realpath() is not need for os.ismount("…/dir/regular-file") to give the correct False… I think that would also without work by already catching the exception.

@calestyo
Copy link
Contributor Author

@eryksun Any expert ideas on my follow-up comments?

@mxmlnkn
Copy link

mxmlnkn commented Mar 29, 2024

Another situation that ismount does not detect correctly is when the FUSE-providing process gets uncleanly killed or crashes.

python3 -m pip install ratarmount
mkdir folder
ratarmount folder mounted
kill -9 $( pgrep ratarmount | tail -1 )
ls -lad mounted  # ls: cannot access 'mounted': Transport endpoint is not connected
mount | tail -1  # FuseMount on /tmp/mounted type fuse (rw,nosuid,nodev,relatime)
python3 -c 'import os; print(os.path.ismount("mounted"))'  # False
mountpoint -q mounted; echo $?  # 1

It seems like the mountpoint command has the same problem and without -q it prints the same error as ls. The mount point is still visible in mount, so it should be possible to detect it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs Documentation in the Doc dir type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

3 participants