I was running gensim's LdaMulticore for topic modelling in Python. I tried to understand the meaning of the parameters of LdaMulticore and found a page that provides some explanation of how they are used. As a non-expert, I have some difficulty understanding these intuitively. I also looked at some other materials on the site, but I think this page gives relatively complete explanations of every parameter.
This page
1. chunksize
Number of documents to be used in each training chunk.
-> Does this mean it determines how many documents are analyzed (trained) at once?
Does changing the chunksize value generate significantly different outcomes, or does it only affect the running time?
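For reference, here is a minimal sketch of the kind of call I mean (the toy documents, topic count, and chunksize value are just placeholders):

```python
from gensim.corpora import Dictionary
from gensim.models import LdaMulticore

# Toy documents purely for illustration; real input would be much larger.
docs = [["topic", "model", "gensim"],
        ["python", "topic", "training"],
        ["gensim", "python", "lda"]]
dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

# chunksize = how many documents go into one training batch.
lda = LdaMulticore(corpus=corpus, id2word=dictionary,
                   num_topics=2, chunksize=2000)
```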
2. alpha, eta, decay
-> I kept reading the explanations but couldn't understand these at all.
Could someone give me an intuitive explanation of what these are about and when I would need to adjust them?
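In case it helps to see them in context, this is how I understand they would be passed; the values below are made up and I don't know what sensible ones look like:

```python
import numpy as np
from gensim.corpora import Dictionary
from gensim.models import LdaMulticore

docs = [["topic", "model", "gensim"],
        ["python", "topic", "training"],
        ["gensim", "python", "lda"]]
dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

num_topics = 2
lda = LdaMulticore(
    corpus=corpus, id2word=dictionary, num_topics=num_topics,
    alpha=np.full(num_topics, 0.1),  # one value per topic (placeholder)
    eta=0.01,                        # placeholder value
    decay=0.5,                       # the documented default
)
```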
3. iterations
Maximum number of iterations through the corpus when inferring the topic distribution of a corpus.
-> It seems that Python goes over the entire corpus n times when I set it to n. So the higher the number, the more the data is analyzed, but it takes longer to run.
4. random_state
Either a randomState object or a seed to generate one. Useful for reproducibility.
-> I've seen people set this by putting in an arbitrary number, but what is random_state actually about?
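For example, I have seen both of these forms and I'm not sure what the number actually controls (the seed value 100 is arbitrary):

```python
import numpy as np
from gensim.corpora import Dictionary
from gensim.models import LdaMulticore

docs = [["topic", "model", "gensim"],
        ["python", "topic", "training"],
        ["gensim", "python", "lda"]]
dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

# Passing a plain integer seed...
lda_a = LdaMulticore(corpus=corpus, id2word=dictionary, num_topics=2,
                     random_state=100)

# ...or an explicit numpy RandomState object.
lda_b = LdaMulticore(corpus=corpus, id2word=dictionary, num_topics=2,
                     random_state=np.random.RandomState(100))
```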