I've worked with decompressing and reading files on the fly in memory with the bz2
library. However, i've read through the documentation and can't seem to just simply decompress the file to create a brand new file on the file system with the decompressed data without memory storage. Sure, you could read line by line using BZ2Decompressor then write that to a file, but that would be insanely slow. (Decompressing massive files, 50GB+). Is there some method or library I have overlooked to achieve the same functionality as the terminal command bz2 -d myfile.ext.bz2
in python without using a hacky solution involving a subprocess to call that terminal command?
Example why bz2 is so slow:
Decompressing that file via bz2 -d: 104seconds
Analytics on a decompressed file(just involves reading line by line): 183seconds
with open(file_src) as x:
for l in x:
Decompressing on the file and using analytics: Over 600 seconds (This time should be max 104+183)
if file_src.endswith(".bz2"):
bz_file = bz2.BZ2File(file_src)
for l in bz_file: