I have a file that is more than 7 GB in size and contains almost 70 million lines. I want to read the file line by line, convert each line into a list of tokens, append that list to a previously defined list, and finally save the combined list to a file. Here is what I have written:
import pickle
from nltk.tokenize import WordPunctTokenizer

word_punctuation_tokenizer = WordPunctTokenizer()

corpus = []
for line in open('file.txt'):
    new = line.strip()
    new = word_punctuation_tokenizer.tokenize(new)
    corpus.append(new)

with open("newfile.txt", "wb") as fp:  # pickle the whole corpus at once
    pickle.dump(corpus, fp)
However, the list seems to get very large, and after reading about 5 million lines the program stops responding. What should I do?
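One idea I had is to stop accumulating everything in corpus and instead write each tokenized line out as soon as it is produced, so memory use stays roughly constant. Here is a minimal sketch of what I mean, assuming the same NLTK WordPunctTokenizer as above and writing one JSON list per line (the output name newfile.jsonl is just a placeholder). Would something like this be the right approach?

import json
from nltk.tokenize import WordPunctTokenizer

word_punctuation_tokenizer = WordPunctTokenizer()

# Stream the input: tokenize each line and write it out immediately,
# instead of building a ~70-million-element list in memory.
with open('file.txt') as src, open('newfile.jsonl', 'w') as dst:
    for line in src:
        tokens = word_punctuation_tokenizer.tokenize(line.strip())
        dst.write(json.dumps(tokens) + '\n')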