In Spark 2.0.1 (PySpark), I want to train an LDA model with the online optimizer. Does this optimizer make it possible to update the model periodically, for example once a day? I'm not sure I understand what "online" means here and what it implies. Does it mean that:
A) I have to load the entire corpus, and the model learns from it in mini-batches (and may therefore be faster than its EM counterpart)?
B) I can submit a fraction of the corpus to the learner to get a first model, then submit another fraction later and get an updated version of that first model?
Thanks for clarifying
EDIT: to be specific, what I do is:
from pyspark.ml.clustering import LDA
lda = LDA(k=nclusters, seed=1, optimizer="online")
ldaModel = lda.fit(mydf.select([mydf["id"], mydf["features"]]))
Once ldaModel is fitted, can I update it with a new DataFrame? I would expect so, since that is essentially what the online optimizer does: at each iteration it samples the corpus and updates the model against that subset, doesn't it?
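To make the two interpretations concrete, here is a toy sketch in plain Python of how I picture the online update. It is not Spark's implementation: the "model" is a single number standing in for the topic parameters, and `step_size`, `online_update` and `fit_online` are hypothetical helpers I made up for illustration. The default values are my assumption of what the `learningOffset`, `learningDecay` and `subsamplingRate` params of `LDA` default to.

```python
import random


def step_size(t, tau0=1024.0, kappa=0.51):
    """Decaying weight rho_t = (tau0 + t) ** -kappa given to mini-batch t."""
    return (tau0 + t) ** (-kappa)


def online_update(state, t, batch, tau0=1024.0, kappa=0.51):
    """Blend the current estimate with the estimate from one mini-batch."""
    rho = step_size(t, tau0, kappa)
    batch_estimate = sum(batch) / len(batch)
    return (1.0 - rho) * state + rho * batch_estimate


def fit_online(corpus, state=0.0, start_iter=0, max_iter=20,
               subsampling_rate=0.05, seed=1):
    """Run max_iter updates, each on a random fraction of `corpus`.

    Passing `state` and `start_iter` from a previous call resumes training,
    which is what interpretation B would require from the real API.
    """
    rng = random.Random(seed)
    batch_size = max(1, int(subsampling_rate * len(corpus)))
    t = start_iter
    for _ in range(max_iter):
        t += 1
        batch = rng.sample(corpus, batch_size)
        state = online_update(state, t, batch, )
    return state, t


# Toy "documents": plain numbers instead of word-count vectors.
day1 = [float(x) for x in range(100)]
day2 = [float(x) for x in range(100, 200)]

# Interpretation A: one fit over the full corpus, sampled in mini-batches.
state_a, t_a = fit_online(day1 + day2)

# Interpretation B: fit on day 1, then resume from that state with day 2.
state_b, t_b = fit_online(day1)
state_b, t_b = fit_online(day2, state=state_b, start_iter=t_b)
```

In this sketch B only works because `fit_online` accepts the previous state and iteration counter. My question is whether `lda.fit` (or the fitted `ldaModel`) exposes anything equivalent.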