As the title says, I'm downloading a bz2 file that contains a folder with a lot of text files.
My first version decompressed everything in memory, but although the archive is only 90 MB, it expands to 60 files of 750 MB each. The computer goes boom! Obviously it can't handle more than 40 GB of RAM XD
So the problem is that the files are too big to keep in memory all at the same time. For now I'm using this code, which works but is far too slow:
import os
import tarfile

import requests

response = requests.get('https://fooweb.com/barfile.bz2')
# Save the downloaded archive to disk:
compress_filepath = '{0}/files/sources/{1}'.format(zsets.BASE_DIR, check_time)
with open(compress_filepath, 'wb') as local_file:
    local_file.write(response.content)
# Extract the files into a folder:
extract_folder = compress_filepath + '_ext'
with tarfile.open(compress_filepath, "r:bz2") as tar:
    tar.extractall(extract_folder)
# Process one file at a time:
for filename in os.listdir(extract_folder):
    filepath = '{0}/{1}'.format(extract_folder, filename)
    # Iterate over the open file so only one line is held in memory at a time
    with open(filepath, 'r') as file:
        for line in file:
            some_processing(line)
Is there a way to do this without dumping everything to disk, decompressing and reading only one file from the .bz2 at a time?
Thank you very much in advance for your time. I hope somebody knows how to help me with this.
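One direction that might fit the question above is tarfile's stream mode ("r|bz2"), which reads the archive front to back and hands out one member at a time. The following is only a minimal sketch under assumptions: the helper names iter_tar_lines and process_remote_archive are hypothetical, the text files are assumed to be UTF-8, and passing response.raw from requests as the file object assumes the server sends the archive bytes without an extra Content-Encoding.

```python
import tarfile


def iter_tar_lines(fileobj, encoding="utf-8"):
    """Yield (member_name, line) pairs from a bz2-compressed tar stream.

    Hypothetical helper: fileobj only needs a read() method.
    """
    # "r|bz2" treats the archive as a forward-only stream, so the archive is
    # decompressed block by block instead of being loaded or extracted whole.
    with tarfile.open(fileobj=fileobj, mode="r|bz2") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip the folder entry and other non-regular members
            member_file = tar.extractfile(member)
            # Iterating the member reads it line by line (as bytes).
            for raw_line in member_file:
                yield member.name, raw_line.decode(encoding)


def process_remote_archive(url, handle_line):
    """Hypothetical wrapper: stream the download straight into the tar reader."""
    import requests  # third-party; imported here so the helper above has no such dependency

    # stream=True keeps requests from buffering the whole body in memory.
    with requests.get(url, stream=True) as response:
        response.raise_for_status()
        for _name, line in iter_tar_lines(response.raw):
            handle_line(line)
```

With the names from the question, the call would look like process_remote_archive('https://fooweb.com/barfile.bz2', some_processing). In stream mode each member has to be read while the loop is on it, because the reader cannot seek backwards; that constraint is what keeps memory use bounded to roughly one decompression buffer plus one line.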