
I found the GoogleNews-vectors-negative300.bin model, but it only covers English words. Is there a pretrained Polish word2vec model I can use to find similar words?

I have already tried the cc.pl.300.bin and NKJP-PodkorpusMilionowy files...

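    import java.io.File;
    import org.deeplearning4j.models.embeddings.loader.WordVectorSerializer;
    import org.deeplearning4j.models.word2vec.Word2Vec;
    public Word2Vec getWord2Vec() {
        File gModel = new File("C:/Users/user/Desktop/GoogleNews-vectors-negative300.bin.gz");
        return WordVectorSerializer.readWord2VecModel(gModel);
    }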
TobiSH
Fakinoo
  • Just to clarify: you are using DL4J, right? You can of course always try to train your own model :-) – TobiSH Nov 08 '19 at 13:52
  • Yes, but I'm looking for something ready-made :D – Fakinoo Nov 08 '19 at 13:54
  • Does your "yes" refer to the DL4J question? – TobiSH Nov 08 '19 at 13:55
  • Maybe the people at https://datascience.stackexchange.com/ can help you. – TobiSH Nov 08 '19 at 13:57
  • The problem you hit with `cc.pl.300.bin` may have been that the first file I can find with that name came from FastText training, whose native format holds more data than a plain word-vector list. But the same page that provides that file also provides 'text' format files, ending in `.vec`, that might work better with your library's reading function. See: https://fasttext.cc/docs/en/pretrained-vectors.html – gojomo Nov 08 '19 at 14:04

1 Answer


The file...

https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.pl.vec

...as linked from...

https://fasttext.cc/docs/en/pretrained-vectors.html

...may work for you, if your library loads the simple 'text' format for exchanging word-vectors. (It's not in the Facebook FastText-specific binary format, as your cc.pl.300.bin file was.)
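To illustrate what the 'text' format looks like, here is a rough plain-Java sketch that reads it and ranks similar words by cosine similarity. It assumes the usual layout of such files: a header line with the word count and vector size, then one line per word followed by its numbers. The class name `VecTextFormat`, its methods, and the tiny two-dimensional sample vectors are all made up for illustration; this is not how DL4J or FastText load the file internally.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of DL4J or FastText.
public class VecTextFormat {

    // Reads the assumed text layout: a "<count> <dimensions>" header,
    // then "<word> <v1> <v2> ..." on every following line.
    static Map<String, float[]> parse(BufferedReader reader) throws IOException {
        String header = reader.readLine();
        if (header == null) throw new IOException("empty input");
        int dim = Integer.parseInt(header.trim().split(" ")[1]);
        Map<String, float[]> vectors = new LinkedHashMap<>();
        String line;
        while ((line = reader.readLine()) != null) {
            String[] parts = line.trim().split(" ");
            if (parts.length != dim + 1) continue; // skip malformed lines
            float[] v = new float[dim];
            for (int i = 0; i < dim; i++) v[i] = Float.parseFloat(parts[i + 1]);
            vectors.put(parts[0], v);
        }
        return vectors;
    }

    // Cosine similarity between two vectors of equal length.
    static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Returns up to n words most similar to the given word (empty if unknown).
    static List<String> nearest(Map<String, float[]> vectors, String word, int n) {
        float[] target = vectors.get(word);
        List<String> candidates = new ArrayList<>();
        if (target == null) return candidates;
        for (String w : vectors.keySet()) if (!w.equals(word)) candidates.add(w);
        candidates.sort(Comparator.comparingDouble((String w) -> -cosine(target, vectors.get(w))));
        return new ArrayList<>(candidates.subList(0, Math.min(n, candidates.size())));
    }

    public static void main(String[] args) throws IOException {
        // Toy vectors invented for this example, not taken from wiki.pl.vec.
        String sample = "3 2\nkot 1.0 0.0\npies 0.9 0.1\ndom 0.0 1.0\n";
        Map<String, float[]> vectors = parse(new BufferedReader(new StringReader(sample)));
        System.out.println(nearest(vectors, "kot", 1)); // prints [pies]
    }
}
```

For the real file you would pass a UTF-8 reader over the downloaded `wiki.pl.vec` instead of the in-memory sample. Holding every vector in a map is memory-hungry for a full vocabulary, so a library loader is likely the better choice in practice.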

gojomo