I'm working with a fairly large text dataset (5.4 million short texts) and trying to perform sentiment analysis on them with 16 GB of RAM.
I keep running out of memory whenever I try to build the language model:
# Build the language-model data first; the classifier data reuses its vocab
data_lm = text_data_from_csv(DATASET_PATH, data_func=lm_data, chunksize=4000)
# Out of memory here, before the second call is ever reached
data_clas = text_data_from_csv(DATASET_PATH, data_func=classifier_data, vocab=data_lm.train_ds.vocab, chunksize=500)
I've experimented with different chunksize values, but memory usage keeps climbing over time and eventually hits a memory error.
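For reference, this is roughly how I've been watching memory while trying different chunk sizes (the psutil monitoring and the specific chunksize values are just for illustration; the fastai calls are the same as above):

import psutil

proc = psutil.Process()  # current Python process

for chunksize in (10_000, 4_000, 1_000, 500):
    print(f"chunksize={chunksize}, start RSS={proc.memory_info().rss / 1e9:.1f} GB")
    try:
        data_lm = text_data_from_csv(DATASET_PATH, data_func=lm_data, chunksize=chunksize)
        print(f"  finished, RSS={proc.memory_info().rss / 1e9:.1f} GB")
    except MemoryError:
        print(f"  MemoryError, RSS={proc.memory_info().rss / 1e9:.1f} GB")

Regardless of the chunk size, the resident memory just grows until the process dies.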
Is there any way to work around this?