I'm brand new to Julia, so forgive any ignorance.
I'm hoping to use Doc2Vec from Python's gensim module in Julia. However, I am running into an issue where the field names of the Python TaggedDocument object do not survive PyCall's automatic conversion when the object is assigned to a variable in Julia.
This seems to be a known issue, but I can't clearly see how to implement the solution: https://github.com/JuliaPy/PyCall.jl/issues/175
# import modules
using PyCall
gensim = pyimport("gensim")
Doc2Vec = pyimport("gensim.models.doc2vec")
# create simple "taggedDocuments"
a = Doc2Vec.TaggedDocument(words=gensim.utils.simple_preprocess("This is some text"), tags = ["tag01"])
# setup the Doc2Vec model
model = Doc2Vec.Doc2Vec(size= 100, min_count = 1, dm = 1)
# use the "taggedDocument" to populate the vocab attributes.
model.build_vocab(a)
# This results in - AttributeError("'tuple' object has no attribute 'words'")
# one idea I had was to try to re-add the names to the Julia object
tnames = (:words, :tags);
c = (;zip(tnames, a)...)
# However, when these get passed back into Python the names are lost again
model.build_vocab(c)
# and again - AttributeError("'tuple' object has no attribute 'words'")
My current assumption is that if I can force the output of Doc2Vec.TaggedDocument() to not be automatically converted, and instead be stored as a PyObject, then the names shouldn't be lost. This seems like something that should be simple to do with PyCall, but reading the Types section here hasn't helped: https://github.com/JuliaPy/PyCall.jl. Does anyone have a potential solution?
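To make the idea concrete, here is a minimal sketch of what I have in mind. It assumes that calling the constructor through PyCall's `pycall` with an explicit `PyObject` return type is enough to suppress the conversion, and that `build_vocab` wants an iterable of documents rather than a single one. The `vector_size` keyword is also an assumption about the installed gensim version (my snippet above uses `size`). I have not confirmed that this fixes the error.

```julia
using PyCall

gensim  = pyimport("gensim")
Doc2Vec = pyimport("gensim.models.doc2vec")

words = gensim.utils.simple_preprocess("This is some text")

# Request a raw PyObject as the return type so that the Python
# namedtuple is not converted into a plain Julia tuple.
doc = pycall(Doc2Vec.TaggedDocument, PyObject; words = words, tags = ["tag01"])

# Assumption: `vector_size` is the keyword accepted by the installed gensim;
# older releases used `size` instead.
model = Doc2Vec.Doc2Vec(vector_size = 100, min_count = 1, dm = 1)

# Assumption: build_vocab expects an iterable of documents, so wrap the
# single document in a vector (passed to Python as a list of PyObjects).
model.build_vocab([doc])
```

If this works, `doc.words` and `doc.tags` should still be reachable from Julia, because `doc` stays a wrapper around the original Python object.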
Thanks in advance.