I am currently working with 9600 documents and applying gensim LDA. The training step seems to take forever to produce a model. I've also tried the multicore version, but it doesn't seem to work either. I have been running it for almost 3 days and still cannot get the LDA model. I've checked some properties of my data and my code. I read the question "gensim LdaMulticore not multiprocessing?", but I still don't understand the solution.
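```python
from gensim import corpora

# `corpus` here is the bag-of-words corpus built earlier (not shown);
# save it in Matrix Market format, then stream it back from disk
corpora.MmCorpus.serialize('corpus_whole.mm', corpus)
corpus = corpora.MmCorpus('corpus_whole.mm')
dictionary = corpora.Dictionary.load('dictionary_whole.dict')

print(dictionary.num_pos)  # total number of word tokens processed
# 12796870
print(corpus)
# MmCorpus(5275227 documents, 44 features, 11446976 non-zero entries)
```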
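The summary line printed by `MmCorpus` can be double-checked independently. Below is a minimal sketch that streams any bag-of-words corpus (an iterable of `(term_id, count)` lists, such as the loaded `MmCorpus`) and recomputes the three numbers. The names `corpus_stats` and `BowDoc` are hypothetical helpers, not gensim API, and "features" is counted as the largest term id plus one, which is an assumed convention for comparing against the `MmCorpus` summary. If the document count comes out far from the 9600 documents mentioned above, the corpus may not have been built as one token list per document.

```python
from typing import Iterable, List, Tuple

# One document as a list of (term_id, count) pairs (hypothetical alias)
BowDoc = List[Tuple[int, int]]


def corpus_stats(corpus: Iterable[BowDoc]) -> Tuple[int, int, int]:
    """Return (documents, features, non-zero entries) for a bag-of-words corpus.

    'features' is taken as the largest term id + 1 (assumed convention).
    """
    num_docs = 0
    max_id = -1
    nnz = 0
    for doc in corpus:
        num_docs += 1
        nnz += len(doc)  # each (term_id, count) pair is one non-zero entry
        for term_id, _count in doc:
            if term_id > max_id:
                max_id = term_id
    return num_docs, max_id + 1, nnz


# Tiny toy corpus: three documents, the last one empty
toy = [[(0, 2), (3, 1)], [(1, 1)], []]
print(corpus_stats(toy))  # (3, 4, 3)
```

On the real data this would be `corpus_stats(corpus)` after loading `corpus_whole.mm`; it makes one full pass over the file, so it will take a while on millions of documents.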
```python
from gensim import models

# LDA model training: single-process version
lda = models.LdaModel(corpus, num_topics=45, id2word=dictionary,
                      update_every=5, chunksize=10000, passes=100)

# Multicore version with 3 worker processes
ldamulti = models.LdaMulticore(corpus, num_topics=45, id2word=dictionary,
                               chunksize=10000, passes=100, workers=3)
```
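To see how much work these settings imply, it helps to count chunk updates. Each pass streams the whole corpus in mini-batches of `chunksize` documents, so the total is ceil(documents / chunksize) × passes. The sketch below uses hypothetical helper names; `seconds_per_chunk` is a value you would have to measure yourself, for example by timing consecutive progress lines in gensim's INFO-level log. It compares the 5,275,227 documents reported by `MmCorpus` with the 9600 documents expected.

```python
def chunks_per_pass(num_docs: int, chunksize: int) -> int:
    """Number of mini-batches needed to stream the corpus once (ceiling division)."""
    return (num_docs + chunksize - 1) // chunksize


def total_chunks(num_docs: int, chunksize: int, passes: int) -> int:
    """Total mini-batches processed over all training passes."""
    return chunks_per_pass(num_docs, chunksize) * passes


def estimated_hours(num_docs: int, chunksize: int, passes: int,
                    seconds_per_chunk: float) -> float:
    """Rough wall-clock estimate; seconds_per_chunk must be measured, not guessed."""
    return total_chunks(num_docs, chunksize, passes) * seconds_per_chunk / 3600


# Corpus as reported by MmCorpus vs. the 9600 documents expected
print(total_chunks(5_275_227, 10_000, 100))  # 52800
print(total_chunks(9_600, 10_000, 100))      # 100
```

With these settings the reported corpus implies 528 times as many chunk updates as a 9600-document corpus would, so the document count is probably worth verifying before tuning `workers`.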
Here is the output I used to check my BLAS configuration; I am not sure I installed the proper one. One thing I struggled with: I cannot use the apt-get command to install packages on my Mac. I've installed Xcode, but it still gives me an error.
```
$ python -c 'import scipy; scipy.show_config()'
lapack_mkl_info:
    libraries = ['mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'iomp5', 'pthread']
    library_dirs = ['/Users/misun/anaconda/lib']
    include_dirs = ['/Users/misun/anaconda/include']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
lapack_opt_info:
    libraries = ['mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'iomp5', 'pthread']
    library_dirs = ['/Users/misun/anaconda/lib']
    include_dirs = ['/Users/misun/anaconda/include']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
blas_opt_info:
    libraries = ['mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'iomp5', 'pthread']
    library_dirs = ['/Users/misun/anaconda/lib']
    include_dirs = ['/Users/misun/anaconda/include']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
blas_mkl_info:
    libraries = ['mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'iomp5', 'pthread']
    library_dirs = ['/Users/misun/anaconda/lib']
    include_dirs = ['/Users/misun/anaconda/include']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
```
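The report above covers SciPy only. gensim's LDA also does much of its array math through NumPy, so it may be worth confirming that NumPy was linked against MKL as well. A minimal sketch, assuming `numpy.show_config()` prints its report to standard output (the default behaviour, though the report format differs between NumPy releases). `numpy_blas_report` and `mentions_mkl` are hypothetical helper names, and the substring check is only a rough heuristic.

```python
import contextlib
import io

import numpy as np


def numpy_blas_report() -> str:
    """Capture the build/BLAS report that numpy.show_config() prints."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        np.show_config()
    return buf.getvalue()


def mentions_mkl(report: str) -> bool:
    """Rough heuristic: MKL-linked builds usually mention 'mkl' in the report."""
    return "mkl" in report.lower()


report = numpy_blas_report()
print(report)
print("NumPy appears to use MKL:", mentions_mkl(report))
```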
I have a poor understanding of how to use ShardedCorpus in Python with my dictionary and corpora, so any help would be appreciated! I haven't slept for 3 days trying to figure out this problem! Thanks!