Bug report
When writing a large amount of data (2.5 GB uncompressed, 250 MB compressed) with the gzip library, the CRC that gets written seems to be off. Smaller amounts of data work fine. Reading the file back with gzip.open raises gzip.BadGzipFile; any other program likewise reports the file as corrupt. When the CRC check is circumvented, the file unzips fine.
```python
import ejson
import gzip

# users.json is about 2.5 GB
with open('users.json', 'r', encoding='utf-8') as file:
    contents = ejson.loads(file.read())

# The resulting file is about 250 MB, which seems right, and it
# decompresses fine when the CRC check is suppressed.
with gzip.open('users_compressed.json.gz', 'w') as file:
    file.write(ejson.dumps(contents).encode('utf-8'))
```
When opening the newly written file, I get the following: gzip.BadGzipFile: CRC check failed.
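For context, by "suppressing the CRC check" I mean skipping the gzip trailer entirely and inflating the raw deflate stream. A minimal sketch of what I did (the helper name is made up, and it only handles a single-member gzip file):

```python
import struct
import zlib

def read_gzip_ignoring_crc(path):
    """Hypothetical recovery helper: hand-parse the gzip header (RFC 1952)
    and inflate the raw deflate stream with negative wbits, so zlib never
    looks at the CRC32/ISIZE trailer."""
    with open(path, 'rb') as f:
        data = f.read()
    if data[:2] != b'\x1f\x8b' or data[2] != 8:
        raise ValueError('not a gzip/deflate file')
    flags = data[3]
    pos = 10                                  # fixed-size part of the header
    if flags & 4:                             # FEXTRA: 2-byte length + payload
        xlen, = struct.unpack_from('<H', data, pos)
        pos += 2 + xlen
    if flags & 8:                             # FNAME: zero-terminated
        pos = data.index(b'\x00', pos) + 1
    if flags & 16:                            # FCOMMENT: zero-terminated
        pos = data.index(b'\x00', pos) + 1
    if flags & 2:                             # FHCRC: 2-byte header CRC
        pos += 2
    d = zlib.decompressobj(wbits=-zlib.MAX_WBITS)  # raw deflate, no trailer check
    return d.decompress(data[pos:])
```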
Writing only half of the data at a time works, as the following shows:
```python
import ejson
import gzip

with open('users.json', 'r', encoding='utf-8') as file:
    contents = ejson.loads(file.read())

# Produces a file of about 250 MB that has a bad CRC
with gzip.open('users_compressed.json.gz', 'w') as file:
    file.write(ejson.dumps(dict(list(contents.items()))).encode('utf-8'))

# Produces a file of about 125 MB (second half) that opens fine
with gzip.open('users_compressed_1.json.gz', 'w') as file:
    file.write(ejson.dumps(dict(list(contents.items())[len(contents)//2:])).encode('utf-8'))

# Produces a file of about 125 MB (first half) that opens fine
with gzip.open('users_compressed_2.json.gz', 'w') as file:
    file.write(ejson.dumps(dict(list(contents.items())[:len(contents)//2])).encode('utf-8'))
```
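If the failure is tied to handing one huge buffer to a single write() call, splitting the write into smaller chunks might be a workaround, since the CRC would then be updated incrementally over small buffers. A sketch (untested, chunk size chosen arbitrarily, contents as loaded above):

```python
import ejson
import gzip

# Untested workaround sketch: feed the payload to GzipFile.write() in
# smaller chunks instead of one 2.5 GB buffer. 256 MiB is an arbitrary
# choice, well below the sizes that fail for me.
payload = ejson.dumps(contents).encode('utf-8')
chunk_size = 256 * 1024 * 1024

with gzip.open('users_compressed_chunked.json.gz', 'w') as file:
    for start in range(0, len(payload), chunk_size):
        file.write(payload[start:start + chunk_size])
```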
Environment
Python 3.10.8 on an M1 Mac (macOS 12.6)
I'm not sure how to debug this much further; the best idea I have is the CRC comparison sketched below. Has anyone had similar problems?
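The idea would be to compare zlib.crc32 computed over the whole payload in one call against an incrementally computed CRC, and both against the value stored in the file's trailer (the last 8 bytes of a gzip file are the CRC32 and the uncompressed size mod 2**32, per RFC 1952). This assumes ejson.dumps is deterministic, so the payload matches what was originally written:

```python
import struct
import zlib

import ejson

# Debugging sketch: if the one-shot CRC differs from the chunked CRC, the
# bug is in zlib.crc32 over large buffers; if both differ from the trailer,
# the bug is in what gzip writes out.
with open('users.json', 'r', encoding='utf-8') as file:
    payload = ejson.dumps(ejson.loads(file.read())).encode('utf-8')

crc_single = zlib.crc32(payload)            # one call over ~2.5 GB
crc_chunked = 0
step = 64 * 1024 * 1024                     # 64 MiB increments
for start in range(0, len(payload), step):
    crc_chunked = zlib.crc32(payload[start:start + step], crc_chunked)

with open('users_compressed.json.gz', 'rb') as f:
    f.seek(-8, 2)                           # gzip trailer: CRC32 + ISIZE
    stored_crc, stored_size = struct.unpack('<II', f.read(8))

print('one-shot CRC :', hex(crc_single))
print('chunked CRC  :', hex(crc_chunked))
print('trailer CRC  :', hex(stored_crc))
print('trailer size :', stored_size, 'vs', len(payload) % 2**32)
```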