I am using Gensim's LdaMulticore to perform LDA. I have around 28M small documents (around 100 characters each).
I set the workers argument to 20, but top shows only 4 processes in use. There are some discussions suggesting that the bottleneck may be slow corpus reading, for example the question "gensim LdaMulticore not multiprocessing?" and https://github.com/piskvorky/gensim/issues/288
But both of those use MmCorpus, whereas my corpus is held entirely in memory. My machine has a large amount of RAM (250 GB), and the loaded corpus takes around 40 GB. Even so, LdaMulticore uses just 4 processes. I created the corpus as:
corpus = [dictionary.doc2bow(text) for text in texts]
What could be the limiting factor here?