I need to train some GloVe models to compare them with word2vec and fastText output. GloVe is implemented in C, and I can't read C code. The GitHub repository is here.
The training corpus needs to be formatted into a single text file. For me, this would be well over 100 GB, far too big to fit in memory. Before I spend time constructing such a file, I'd be grateful if someone could tell me whether the GloVe implementation tries to read the whole file into memory or streams it from disk.
If the former, then GloVe's current implementation wouldn't be compatible with my data (I think). If the latter, I'd have at it.
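
To be concrete about the distinction I mean, here is a minimal Python sketch. It is not taken from GloVe and says nothing about how GloVe actually behaves; the function names `count_tokens_streaming` and `count_tokens_in_memory` are hypothetical, and token counting is just a stand-in for whatever work the tool does on the corpus.

```python
from collections import Counter


def count_tokens_streaming(path):
    """Count whitespace-separated tokens, reading one line at a time.

    Memory use is bounded by the longest line plus the vocabulary,
    not by the file size. Caveat: a corpus with no newlines at all
    would still be read as one giant line.
    """
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:  # the file object yields lines lazily
            counts.update(line.split())
    return counts


def count_tokens_in_memory(path):
    """Count tokens after loading the entire file into memory first."""
    with open(path, encoding="utf-8") as f:
        text = f.read()  # the whole corpus is held in RAM here
    return Counter(text.split())
```

Both functions return the same counts; only the first would be usable on a file much larger than RAM. My question is which of these two patterns the GloVe tools follow.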