I have recently started learning Python and deep learning. My teacher sent me a pkl file containing the data I need. The file is 9.6 GB, and my machine has only 16 GB of memory. When I try to load the whole file with pickle.load(open('data.pkl', 'rb')), my computer crashes :(

Next, I tried reading the pkl file through a buffer, but my computer crashed again :( Here is the buffered-read code:

import pickle
import gc

block_size = 512 * 1024 * 1024  # 512 MB per read
data = b''
count_num = 0
with open('../data.pkl', 'rb') as f:
    while True:
        buffer = f.read(block_size)
        if not buffer:
            break
        count_num += 1
        # Appending keeps every block in memory, so data grows to the full file size.
        data += buffer
        print("read " + str(count_num * 512) + " MB")
        gc.collect()
print("finish")
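A note on the buffered read above: concatenating the blocks still leaves all 9.6 GB in data, and unpickling would then need further memory for the rebuilt objects, so reading in blocks does not lower the peak. One thing that can be done cheaply is to look at the start of the opcode stream with pickletools.genops, which parses opcodes without rebuilding the objects. The sketch below is only an illustration: peek_opcodes and the small in-memory sample are hypothetical stand-ins, not part of the original code, and a single very large opcode argument would still be read in full.

```python
import io
import itertools
import pickle
import pickletools


def peek_opcodes(fileobj, limit=20):
    """Return the names of the first `limit` pickle opcodes in fileobj.

    genops only parses the opcode stream; it does not rebuild the objects.
    """
    ops = itertools.islice(pickletools.genops(fileobj), limit)
    return [opcode.name for opcode, arg, pos in ops]


if __name__ == "__main__":
    # Small in-memory stand-in for the real file (hypothetical sample data).
    payload = pickle.dumps({"ids": [1, 2, 3], "label": "demo"}, protocol=4)
    print(peek_opcodes(io.BytesIO(payload)))
```

For the real file, the same function could be called on open('../data.pkl', 'rb') to get a rough idea of the top-level structure (for example, whether it is one big list or dict) before deciding how to re-save it.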

After that, I tried splitting the large file into small files, but I can't load the small files. I get "UnpicklingError: pickle data was truncated" and "UnpicklingError: A load persistent id instruction was encountered, but no persistent_load function was specified." Here is the splitting code:

import pickle
import gc

block_size = 10 * 1024 * 1024  # 10 MB per output file
count_num = 0
with open('../data.pkl', 'rb') as f:
    while True:
        buffer = f.read(block_size)
        if not buffer:
            break
        count_num += 1
        print("read " + str(count_num * 10) + " MB")
        # Each output file holds one raw 10 MB slice of the original, pickled as a bytes object.
        with open("data/wiki-data-statement-" + str(count_num) + ".pkl", "wb") as fw:
            pickle.dump(buffer, fw)
        print("split block " + str(count_num))
        gc.collect()
print("finish")
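The errors from the split files can be reproduced on a small scale. A pickle is a single opcode stream that ends with a STOP opcode, so an arbitrary byte slice of it is not a valid pickle by itself; only the slices joined back in order are. The following is a minimal sketch using a small list as a hypothetical stand-in for the real data; split_bytes and try_unpickle are illustrative helpers, not part of the original code.

```python
import pickle


def split_bytes(blob, block_size):
    """Cut a bytes object into consecutive slices of at most block_size bytes."""
    return [blob[i:i + block_size] for i in range(0, len(blob), block_size)]


def try_unpickle(chunk):
    """Return (True, obj) if chunk unpickles, else (False, the exception)."""
    try:
        return True, pickle.loads(chunk)
    except (pickle.UnpicklingError, EOFError) as exc:
        return False, exc


if __name__ == "__main__":
    original = list(range(1000))  # hypothetical stand-in for the real data
    blob = pickle.dumps(original, protocol=4)
    chunks = split_bytes(blob, len(blob) // 2 + 1)

    # The first slice alone is an incomplete opcode stream and fails to load.
    ok, result = try_unpickle(chunks[0])
    print(ok, type(result).__name__, result)

    # Joining all slices in order restores a loadable pickle.
    print(pickle.loads(b"".join(chunks)) == original)
```

Joining the slices again brings back the original memory requirement, which is why splitting at the byte level does not help with a file that is too large to load.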

How can I solve this problem? Suggestions for other tools that can handle this task would also be appreciated. Thanks.

grepgrok
  • Have you tried using Google Colab? – QuestionHaver99 Apr 29 '23 at 03:10
  • You can't split a pickle. The format doesn't permit that. What operating system are you on? If you have a 64-bit Python, this should be possible, even with only 16GB of RAM. Having said that, pickle is a stupid way to distribute large data sets, in part because of this issue. It's all or nothing. – Tim Roberts Apr 29 '23 at 03:16
  • This question might have some insight: https://stackoverflow.com/questions/26394768/pickle-file-too-large-to-load – mipadi Apr 29 '23 at 04:12
  • I have solved this problem. In the end, I rented a server with 128 GB of memory for one hour and used it to split the large file into small files. Thanks to everyone above :) – coding rookie May 19 '23 at 10:25
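The last comment does not say how the file was re-saved on the larger machine. One possible approach, sketched below under the assumption that the loaded object is a sequence of independent records, is to write each record as its own pickle, back to back in one file, and later read them one at a time. The names dump_records and iter_records and the sample records are hypothetical.

```python
import os
import pickle
import tempfile


def dump_records(records, path):
    """Write each record as its own pickle, back to back in one file."""
    with open(path, "wb") as f:
        for record in records:
            pickle.dump(record, f, protocol=pickle.HIGHEST_PROTOCOL)


def iter_records(path):
    """Yield records one at a time; only the current record is unpickled."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                # Raised when no further pickle follows in the file.
                return


if __name__ == "__main__":
    # Hypothetical sample data standing in for the real records.
    records = [{"id": i, "text": "row %d" % i} for i in range(5)]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "records.pkl")
        dump_records(records, path)
        for record in iter_records(path):
            print(record["id"])
```

With a layout like this, the reader's peak memory depends on the largest single record and on whatever the consuming code keeps, not on the total file size. It only works if the data can be broken into records before it is written; an existing single-object pickle still has to be loaded in full once.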

0 Answers