I have gone through the other threads where it's specified that in LDA the memory usage is proportional to numberOfTerms * numberOfTopics. In my case I have two datasets. In dataset A I have 250K documents and around 500K terms, and I can easily run with ~500 topics. But in dataset B I have around 2 million documents and 500K terms (we got here after some filtering), yet I can only run up to 50 topics; above that it throws a memory exception.
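For a rough sense of scale, here is a back-of-the-envelope estimate based on that proportionality claim, assuming the dominant structure is a dense terms x topics matrix of 8-byte floats (the exact constant and the number of copies kept are assumptions on my part, not something I've verified in the implementation):

```python
# Rough estimate: memory ~ num_terms * num_topics * 8 bytes (one float64 per entry).
num_terms = 500_000

for num_topics in (50, 500):
    gb = num_terms * num_topics * 8 / 1024**3
    print(f"{num_topics} topics -> ~{gb:.1f} GB per topic-term matrix")

# 50 topics  -> ~0.2 GB
# 500 topics -> ~1.9 GB
```

By this estimate alone, 500 topics on 500K terms should fit comfortably, which is why the failure on dataset B is confusing me.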
So I just want to understand: if only the number of terms and topics matters for memory, why is the number of documents causing this problem, and is there any quick workaround that can avoid it?
Note: I know the corpus can be wrapped as an iterable, as described in memory-efficient-lda-training-using-gensim-library, but let's assume I have already loaded the corpus into memory, because of other restrictions that require keeping the input data in a different format so it can be run on different platforms for different algorithms. The point is that I can run it for a smaller number of topics even after loading the whole corpus into memory, so is there any workaround that would let it run with more topics? For example, I thought adjusting chunksize might help, but that didn't work.
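For reference, this is roughly what I'm running (a minimal sketch; `docs`, the dictionary construction, and the exact parameter values are placeholders, not my real pipeline code):

```python
from gensim import corpora, models

# docs: list of tokenized documents, already loaded in memory
dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]  # whole corpus kept in memory

lda = models.LdaModel(
    corpus=corpus,
    id2word=dictionary,
    num_topics=500,   # works on dataset A, fails above ~50 topics on dataset B
    chunksize=1000,   # tried lowering this from the default 2000; it didn't help
    passes=1,
)
```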