gh-78502: Add a trackfd parameter to mmap.mmap() #25425

Merged
merged 16 commits into from
Jan 16, 2024
24 changes: 19 additions & 5 deletions Doc/library/mmap.rst
Expand Up @@ -48,7 +48,7 @@ update the underlying file.

To map anonymous memory, -1 should be passed as the fileno along with the length.

.. class:: mmap(fileno, length, tagname=None, access=ACCESS_DEFAULT[, offset])
.. class:: mmap(fileno, length, tagname=None, access=ACCESS_DEFAULT, offset=0)

**(Windows version)** Maps *length* bytes from the file specified by the
file handle *fileno*, and creates a mmap object. If *length* is larger
Expand All @@ -71,7 +71,8 @@ To map anonymous memory, -1 should be passed as the fileno along with the length

.. audit-event:: mmap.__new__ fileno,length,access,offset mmap.mmap

.. class:: mmap(fileno, length, flags=MAP_SHARED, prot=PROT_WRITE|PROT_READ, access=ACCESS_DEFAULT[, offset])
.. class:: mmap(fileno, length, flags=MAP_SHARED, prot=PROT_WRITE|PROT_READ, \
access=ACCESS_DEFAULT, offset=0, *, trackfd=True)
:noindex:

**(Unix version)** Maps *length* bytes from the file specified by the file
Expand Down Expand Up @@ -102,10 +103,20 @@ To map anonymous memory, -1 should be passed as the fileno along with the length
defaults to 0. *offset* must be a multiple of :const:`ALLOCATIONGRANULARITY`
which is equal to :const:`PAGESIZE` on Unix systems.

If *trackfd* is ``False``, the file descriptor specified by *fileno* will
Member

I'd love to have some idea of why I might want to use this parameter. Right now it only describes the downsides.

Contributor

On Windows, the internally duplicated handle probably references an open that lacks delete access. It thus prevents deleting the file, even if the mapped section otherwise allows it (e.g. the section is mapped readonly). For example:

>>> import mmap, os
>>> f = open('spam.txt')
>>> m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
>>> f.close()
>>> os.remove('spam.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'spam.txt'

>>> # I manually closed the internal handle via Process Explorer.
>>> os.remove('spam.txt')
>>> m[:]
b'spam'

Member

That sounds like a reason to at least add the argument for all platforms, which I'm generally in favour of anyway. It can have more appropriate semantics on Windows if needed (i.e. "doesn't hold an extra HANDLE" rather than "FD").

It's probably actually pretty useful to be able to immediately delete the file but keep the mapping open (which will keep the file on disk on Windows at least, so you can't reuse the name while it's in use). And it looks like the mapping doesn't lock out deletes, so I guess it'll work as intended.

I'm not going to hold up this PR for it though. All I'll say is that if we ever do add that option, it should be trackfd=False to "activate" it, for consistency between platforms.

Contributor

> That sounds like a reason to at least add the argument for all platforms, which I'm generally in favour of anyway. It can have more appropriate semantics on Windows if needed (i.e. "doesn't hold an extra HANDLE" rather than "FD").

I think trackfd would be fine on Windows. The fileno parameter is a C file descriptor, not a native OS handle.

> It's probably actually pretty useful to be able to immediately delete the file but keep the mapping open (which will keep the file on disk on Windows at least, so you can't reuse the name while it's in use). And it looks like the mapping doesn't lock out deletes, so I guess it'll work as intended.

NTFS supports POSIX delete, in which a deleted file gets renamed to a reserved system directory until all references to the file object have been closed. That includes the internal pointer reference to a file object that's held by the memory manager for the mapped section. The internal file reference doesn't count toward the file's share mode, i.e. a memory-mapped file can be deleted even if the source open didn't share delete access. Actually, I just checked that the delete is allowed nowadays even if the mapped section has write access to the file, so my assumption was wrong that it would only work for a readonly mapping.

You can observe this in Process Explorer. Switch the lower-pane view to DLLs (file- and pagefile-backed memory mappings), and add the name and path columns to the view. You'll see that the backing file gets moved to the "\$Extend\$Deleted" system directory on the volume after the file is 'deleted'.

not be duplicated, and the resulting :class:`!mmap` object will not
be associated with the map's underlying file.
This means that the :meth:`~mmap.mmap.size` and :meth:`~mmap.mmap.resize`
methods will fail.
This mode is useful to limit the number of open file descriptors.
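
A minimal sketch of the behaviour described above, assuming Python 3.13 or later on Unix. The scratch file, its size, and the variable names are illustrative only. The `TypeError` fallback covers interpreters where the keyword is not accepted, and the `except` clause around `size()` also catches `ValueError` in case a given version reports the failure differently from the `OSError` that this PR's test expects.

```python
import mmap
import os
import tempfile

# Create a small scratch file to map (name and size are arbitrary).
fd, path = tempfile.mkstemp()
os.write(fd, b"a" * 64)
os.close(fd)

with open(path, "r+b") as f:
    try:
        # Ask mmap not to duplicate f's descriptor.
        m = mmap.mmap(f.fileno(), 64, trackfd=False)
        untracked = True
    except TypeError:
        # The keyword is not accepted here; fall back to the default.
        m = mmap.mmap(f.fileno(), 64)
        untracked = False

# f is closed at this point, but the mapping itself remains usable.
m[0] = ord("X")
first_bytes = m[:4]

# size() needs the underlying file, so it is expected to fail
# when the descriptor was not tracked.
size_failed = False
try:
    m.size()
except (OSError, ValueError):
    size_failed = True

m.close()
os.unlink(path)
```

With the default `trackfd=True`, the same `size()` call succeeds, at the cost of one extra open descriptor per mapping.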

To ensure validity of the created memory mapping the file specified
by the descriptor *fileno* is internally automatically synchronized
with the physical backing store on macOS.

.. versionchanged:: 3.13
The *trackfd* parameter was added.

This example shows a simple way of using :class:`~mmap.mmap`::

import mmap
Expand Down Expand Up @@ -254,9 +265,12 @@ To map anonymous memory, -1 should be passed as the fileno along with the length

.. method:: resize(newsize)

Resizes the map and the underlying file, if any. If the mmap was created
with :const:`ACCESS_READ` or :const:`ACCESS_COPY`, resizing the map will
raise a :exc:`TypeError` exception.
Resizes the map and the underlying file, if any.

Resizing a map created with *access* of :const:`ACCESS_READ` or
:const:`ACCESS_COPY` will raise a :exc:`TypeError` exception.
Resizing a map created with *trackfd* set to ``False``
will raise a :exc:`ValueError` exception.
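
The two failure modes can be told apart as in the following sketch. It assumes a scratch file created just for the demonstration. The *trackfd* half only runs where the keyword is accepted (Python 3.13+ on Unix, per this PR) and is skipped otherwise.

```python
import mmap
import os
import tempfile

# Scratch file used only for this demonstration.
fd, path = tempfile.mkstemp()
os.write(fd, b"a" * 64)
os.close(fd)

# Case 1: a read-only map refuses to resize with TypeError.
read_only_error = None
with open(path, "rb") as f:
    ro = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
try:
    ro.resize(128)
except TypeError as exc:
    read_only_error = exc
finally:
    ro.close()

# Case 2: a map created with trackfd=False refuses with ValueError.
untracked_error = None
with open(path, "r+b") as f:
    try:
        m = mmap.mmap(f.fileno(), 64, trackfd=False)
    except TypeError:
        m = None  # keyword not accepted on this interpreter
if m is not None:
    try:
        m.resize(128)
    except ValueError as exc:
        untracked_error = exc
    finally:
        m.close()

os.unlink(path)
```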

**On Windows**: Resizing the map will raise an :exc:`OSError` if there are other
maps against the same named file. Resizing an anonymous map (ie against the
Expand Down
3 changes: 3 additions & 0 deletions Doc/whatsnew/3.13.rst
Expand Up @@ -254,6 +254,9 @@ mmap
that can be used where it requires a file-like object with seekable and
the :meth:`~mmap.mmap.seek` method return the new absolute position.
(Contributed by Donghee Na and Sylvie Liberman in :gh:`111835`.)
* :class:`mmap.mmap` now has a *trackfd* parameter on Unix; if it is ``False``,
the file descriptor specified by *fileno* will not be duplicated.
(Contributed by Zackery Spytz and Petr Viktorin in :gh:`78502`.)

opcode
------
Expand Down
57 changes: 57 additions & 0 deletions Lib/test/test_mmap.py
Expand Up @@ -4,6 +4,7 @@
from test.support.import_helper import import_module
from test.support.os_helper import TESTFN, unlink
import unittest
import errno
import os
import re
import itertools
Expand Down Expand Up @@ -266,6 +267,62 @@ def test_access_parameter(self):
self.assertRaises(TypeError, m.write_byte, 0)
m.close()

@unittest.skipIf(os.name == 'nt', 'trackfd not present on Windows')
def test_trackfd_parameter(self):
size = 64
with open(TESTFN, "wb") as f:
f.write(b"a"*size)
for close_original_fd in True, False:
with self.subTest(close_original_fd=close_original_fd):
with open(TESTFN, "r+b") as f:
with mmap.mmap(f.fileno(), size, trackfd=False) as m:
if close_original_fd:
f.close()
self.assertEqual(len(m), size)
with self.assertRaises(OSError) as err_cm:
m.size()
self.assertEqual(err_cm.exception.errno, errno.EBADF)
with self.assertRaises(ValueError):
m.resize(size * 2)
with self.assertRaises(ValueError):
m.resize(size // 2)
self.assertEqual(m.closed, False)

# Smoke-test other API
m.write_byte(ord('X'))
m[2] = ord('Y')
m.flush()
with open(TESTFN, "rb") as f:
self.assertEqual(f.read(4), b'XaYa')
self.assertEqual(m.tell(), 1)
m.seek(0)
self.assertEqual(m.tell(), 0)
self.assertEqual(m.read_byte(), ord('X'))

self.assertEqual(m.closed, True)
self.assertEqual(os.stat(TESTFN).st_size, size)

@unittest.skipIf(os.name == 'nt', 'trackfd not present on Windows')
def test_trackfd_neg1(self):
size = 64
with mmap.mmap(-1, size, trackfd=False) as m:
with self.assertRaises(OSError):
m.size()
with self.assertRaises(ValueError):
m.resize(size // 2)
self.assertEqual(len(m), size)
m[0] = ord('a')
self.assertEqual(m[0], ord('a'))

@unittest.skipIf(os.name != 'nt', 'trackfd only fails on Windows')
def test_no_trackfd_parameter_on_windows(self):
# 'trackfd' is an invalid keyword argument for this function
size = 64
with self.assertRaises(TypeError):
mmap.mmap(-1, size, trackfd=True)
with self.assertRaises(TypeError):
mmap.mmap(-1, size, trackfd=False)

def test_bad_file_desc(self):
# Try opening a bad file descriptor...
self.assertRaises(OSError, mmap.mmap, -2, 4096)
Expand Down
@@ -0,0 +1,2 @@
:class:`mmap.mmap` now has a *trackfd* parameter on Unix; if it is
``False``, the file descriptor specified by *fileno* will not be duplicated.
26 changes: 20 additions & 6 deletions Modules/mmapmodule.c
Expand Up @@ -121,6 +121,7 @@ typedef struct {

#ifdef UNIX
int fd;
_Bool trackfd;
#endif

PyObject *weakreflist;
Expand Down Expand Up @@ -397,6 +398,13 @@ is_resizeable(mmap_object *self)
"mmap can't resize with extant buffers exported.");
return 0;
}
#ifdef UNIX
if (!self->trackfd) {
PyErr_SetString(PyExc_ValueError,
"mmap can't resize with trackfd=False.");
return 0;
}
#endif
if ((self->access == ACCESS_WRITE) || (self->access == ACCESS_DEFAULT))
return 1;
PyErr_Format(PyExc_TypeError,
Expand Down Expand Up @@ -1158,7 +1166,7 @@ is 0, the maximum length of the map is the current size of the file,\n\
except that if the file is empty Windows raises an exception (you cannot\n\
create an empty mapping on Windows).\n\
\n\
Unix: mmap(fileno, length[, flags[, prot[, access[, offset]]]])\n\
Unix: mmap(fileno, length[, flags[, prot[, access[, offset[, trackfd]]]]])\n\
\n\
Maps length bytes from the file specified by the file descriptor fileno,\n\
and returns a mmap object. If length is 0, the maximum length of the map\n\
Expand Down Expand Up @@ -1225,15 +1233,17 @@ new_mmap_object(PyTypeObject *type, PyObject *args, PyObject *kwdict)
off_t offset = 0;
int fd, flags = MAP_SHARED, prot = PROT_WRITE | PROT_READ;
int devzero = -1;
int access = (int)ACCESS_DEFAULT;
int access = (int)ACCESS_DEFAULT, trackfd = 1;
static char *keywords[] = {"fileno", "length",
"flags", "prot",
"access", "offset", NULL};
"access", "offset", "trackfd", NULL};

if (!PyArg_ParseTupleAndKeywords(args, kwdict, "in|iii" _Py_PARSE_OFF_T, keywords,
if (!PyArg_ParseTupleAndKeywords(args, kwdict,
"in|iii" _Py_PARSE_OFF_T "$p", keywords,
&fd, &map_size, &flags, &prot,
&access, &offset))
&access, &offset, &trackfd)) {
return NULL;
}
if (map_size < 0) {
PyErr_SetString(PyExc_OverflowError,
"memory mapped length must be positive");
Expand Down Expand Up @@ -1329,6 +1339,7 @@ new_mmap_object(PyTypeObject *type, PyObject *args, PyObject *kwdict)
m_obj->weakreflist = NULL;
m_obj->exports = 0;
m_obj->offset = offset;
m_obj->trackfd = trackfd;
if (fd == -1) {
m_obj->fd = -1;
/* Assume the caller wants to map anonymous memory.
Expand All @@ -1354,13 +1365,16 @@ new_mmap_object(PyTypeObject *type, PyObject *args, PyObject *kwdict)
}
#endif
}
else {
else if (trackfd) {
m_obj->fd = _Py_dup(fd);
if (m_obj->fd == -1) {
Py_DECREF(m_obj);
return NULL;
}
}
else {
m_obj->fd = -1;
}

Py_BEGIN_ALLOW_THREADS
m_obj->data = mmap(NULL, map_size, prot, flags, fd, offset);
Expand Down
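
The three-way branch in `new_mmap_object` above can be modelled in Python as follows. This is only an illustrative model, not the module's implementation: `choose_tracked_fd` is a hypothetical helper, and `os.dup` stands in for the C-level `_Py_dup`.

```python
import os
import tempfile


def choose_tracked_fd(fd, trackfd):
    """Return the descriptor the mmap object would keep, or -1 for none."""
    if fd == -1:
        # Anonymous memory: there is no file to track.
        return -1
    if trackfd:
        # Default: keep a private duplicate of the caller's descriptor.
        return os.dup(fd)
    # trackfd=False: keep nothing, so size() and resize() cannot work.
    return -1


fd, path = tempfile.mkstemp()
try:
    tracked = choose_tracked_fd(fd, True)
    same_file = os.path.sameopenfile(fd, tracked)
    distinct_descriptor = tracked != fd
    os.close(tracked)

    untracked = choose_tracked_fd(fd, False)
    anonymous = choose_tracked_fd(-1, True)
finally:
    os.close(fd)
    os.unlink(path)
```

In every case the mapping itself is created from the caller's original descriptor; the branch only decides whether the object keeps a descriptor of its own afterwards.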