deeplearning4j: cannot use an existing Word2Vec dutchembeddings

Question

I tried to load the dutchembeddings (Word2Vec format) with dl4j, but calling loadStaticModel throws an exception: "Unable to guess input file format".

WordVectorSerializer.loadStaticModel(new File(WORD_VECTORS_PATH));

https://github.com/clips/dutchembeddings (I downloaded the wikipedia 160 tar.gz)

How can I get the dutchembeddings in Word2Vec format working with dl4j?

Stacktrace

Loading word vectors and creating DataSetIterators
o.d.m.e.l.WordVectorSerializer - Trying DL4j format...
o.d.m.e.l.WordVectorSerializer - Trying CSVReader...
o.d.m.e.l.WordVectorSerializer - Trying BinaryReader...
Exception in thread "main" java.lang.RuntimeException: Unable to guess input file format
    at org.deeplearning4j.models.embeddings.loader.WordVectorSerializer.loadStaticModel(WordVectorSerializer.java:2646)
    at org.deeplearning4j.examples.convolution.sentenceclassification.CnnDutchSentenceClassification.main(CnnDutchSentenceClassification.java:122)

Process finished with exit code 1

Could you post a complete stack trace? What format is this model in? — Adam Gibson, Oct 09 '17 at 02:56
Hi Adam, I added the stacktrace. The format, according to [the dutchembeddings site](https://github.com/clips/dutchembeddings): "The embeddings are currently provided in .txt files which contain vectors in word2vec format, which is structured as follows: The first line contains the size of the vectors and the vocabulary size, separated by a space. Ex: 320 50000. Each line thereafter contains the vector data for a single word, and is presented as a string delimited by spaces. Ex: hond 0.2 -0.542 0.253 etc." — Johan Vogelzang, Oct 09 '17 at 19:29
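To help answer the "what format is this model in?" question, here is a minimal sketch in plain Java (no dl4j involved) that reads the first two lines of a text file and reports what it finds, based only on the layout quoted from the dutchembeddings site. The class name `Word2VecTextCheck`, the `Summary` holder and the `inspect` method are hypothetical names made up for this illustration. The sketch does not assume which of the two header numbers is the vector size; it just reports both so they can be compared with the number of values on the first vector line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Word2VecTextCheck {

    /** Hypothetical holder for what was found on the first two lines. */
    static final class Summary {
        final long headerFirst;            // first integer on the header line
        final long headerSecond;           // second integer on the header line
        final String firstWord;            // first token of the first vector line
        final int valuesOnFirstVectorLine; // number of numeric tokens after the word

        Summary(long headerFirst, long headerSecond, String firstWord, int values) {
            this.headerFirst = headerFirst;
            this.headerSecond = headerSecond;
            this.firstWord = firstWord;
            this.valuesOnFirstVectorLine = values;
        }
    }

    /** Parses a header line ("<int> <int>") and one vector line ("word v1 v2 ..."). */
    static Summary inspect(String headerLine, String firstVectorLine) {
        String[] header = headerLine.trim().split("\\s+");
        if (header.length != 2) {
            throw new IllegalArgumentException("Expected two integers on the header line: " + headerLine);
        }
        long first = Long.parseLong(header[0]);
        long second = Long.parseLong(header[1]);

        String[] parts = firstVectorLine.trim().split("\\s+");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Expected a word followed by numbers: " + firstVectorLine);
        }
        // Every token after the word must parse as a number, otherwise this throws.
        for (int i = 1; i < parts.length; i++) {
            Double.parseDouble(parts[i]);
        }
        return new Summary(first, second, parts[0], parts.length - 1);
    }

    public static void main(String[] args) throws IOException {
        String headerLine = "3 2";                      // in-memory sample, not real data
        String vectorLine = "hond 0.2 -0.542 0.253";
        if (args.length > 0) {
            // Read the first two lines of the (already extracted) text file.
            try (BufferedReader reader =
                     Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
                headerLine = reader.readLine();
                vectorLine = reader.readLine();
            }
            if (headerLine == null || vectorLine == null) {
                throw new IOException("File has fewer than two lines: " + args[0]);
            }
        }
        Summary s = inspect(headerLine, vectorLine);
        System.out.println("header: " + s.headerFirst + " " + s.headerSecond
            + ", first word: " + s.firstWord
            + ", values on first vector line: " + s.valuesOnFirstVectorLine);
    }
}
```

If the check fails on the header line, or the number of values matches neither header number, the file is probably not in the plain-text layout described above. It may also be worth ruling out that WORD_VECTORS_PATH still points at the downloaded tar.gz rather than the extracted .txt file; that is only a guess, since the question does not show the path.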

