
I want to classify two types of sentences: statements and questions. For this I need an already trained word2vec model, so that I can pass sentences through it and receive a 2D array for each sentence, e.g.:

[[~300 items], [~300 items], [~300 items], ...]

"300" is the approximate length of a word vector.

How can I do that in Keras? Which library is better to use?

Dmytro Nalyvaiko

1 Answer

What I advise is to use an Embedding layer and set its weights:

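from keras.layers import Input, Embedding

# seq_len, vocabulary_size and your_w2v_matrix come from your own data
inputs = Input(shape=(seq_len,))
embedding = Embedding(input_dim=vocabulary_size,
    output_dim=300, weights=[your_w2v_matrix])(inputs)
...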

Here you can find a really similar question.
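
The `your_w2v_matrix` passed to the layer is a 2D array with one row per vocabulary word. Below is a minimal sketch of building it, assuming the trained vectors are available as a word-to-vector mapping. `toy_vectors`, `build_embedding_matrix`, and the 4-dimensional vectors are hypothetical stand-ins (real word2vec vectors would have ~300 dimensions), and reserving row 0 for padding is a common convention rather than a requirement:

```python
import numpy as np

# Hypothetical toy vectors standing in for a trained word2vec model
# (real vectors would have ~300 dimensions, not 4).
EMBEDDING_DIM = 4
toy_vectors = {
    "is": np.array([0.1, 0.2, 0.3, 0.4]),
    "this": np.array([0.5, 0.6, 0.7, 0.8]),
    "a": np.array([0.9, 1.0, 1.1, 1.2]),
    "question": np.array([1.3, 1.4, 1.5, 1.6]),
}


def build_embedding_matrix(word_vectors, dim):
    """Return (word_index, matrix); row 0 is kept as all zeros for padding."""
    # Sort the words so that the index assignment is reproducible.
    word_index = {word: i + 1 for i, word in enumerate(sorted(word_vectors))}
    matrix = np.zeros((len(word_index) + 1, dim), dtype="float32")
    for word, row in word_index.items():
        matrix[row] = word_vectors[word]
    return word_index, matrix


word_index, your_w2v_matrix = build_embedding_matrix(toy_vectors, EMBEDDING_DIM)
vocabulary_size = your_w2v_matrix.shape[0]  # number of words + 1 padding row
```

With this layout, `vocabulary_size` and `your_w2v_matrix` would be the values passed as `input_dim` and `weights`, and `word_index` is what maps each word of a sentence to its row.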

Marcin Możejko
  • that question's answer refers to the post http://ben.bolte.cc/blog/2016/keras-gensim-embeddings.html , but that post is incomprehensible – Dmytro Nalyvaiko Mar 12 '17 at 14:08
  • The `Embedding` layer shown in my answer should also solve your problem. – Marcin Możejko Mar 12 '17 at 14:11
  • can you show the part of the code that transforms sentences into arrays of word2vec vectors? – Dmytro Nalyvaiko Mar 12 '17 at 14:16
  • Sure, if you specify the format of your sentences and w2v matrix. – Marcin Możejko Mar 12 '17 at 15:12
  • I don't have any format. There are simply 2 arrays: the first contains statements, the second contains questions. I don't even have a training sample yet, so these are "abstract" sentences, and I guess it doesn't matter what format they are in. As for the w2v matrix, I don't actually know what it is; for me the w2v transformation is a black box. So, can you help me in this case? – Dmytro Nalyvaiko Mar 12 '17 at 17:33
  • In what format are these questions and statements? Are they `numpy.array`s? Or lists of strings? – Marcin Możejko Mar 12 '17 at 19:44
  • it is not important, but let's suppose they are lists of strings, e.g.: statements = ["some st 1", "some st 2", "some st 3", ...] questions = ["some qw 1", "some qw 2", "some qw 3", ...] – Dmytro Nalyvaiko Mar 12 '17 at 20:22
  • I have the impression that the format of `keras.layers.Embedding` with `weights` is deprecated if you check this (https://keras.io/layers/embeddings/) and this (https://github.com/tensorflow/tensorflow/issues/14392) – Outcast Oct 04 '18 at 17:07
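
A minimal sketch of the transformation asked about in the comments, assuming the sentences are lists of strings as in the example there. `build_word_index`, `encode`, the whitespace tokenization, and `seq_len = 4` are hypothetical choices for illustration. With a real word2vec model, the word index would have to match the row order of the weight matrix given to the `Embedding` layer:

```python
statements = ["some st 1", "some st 2"]
questions = ["some qw 1", "some qw 2", "some qw 3"]


def build_word_index(sentences):
    """Map each distinct lowercase token to an integer; 0 is reserved for padding."""
    index = {}
    for sentence in sentences:
        for token in sentence.lower().split():
            index.setdefault(token, len(index) + 1)
    return index


def encode(sentence, word_index, seq_len):
    """Turn a sentence into exactly seq_len indices.

    Unknown words are dropped, long sentences are truncated,
    and short ones are right-padded with 0.
    """
    ids = [word_index[t] for t in sentence.lower().split() if t in word_index]
    ids = ids[:seq_len]
    return ids + [0] * (seq_len - len(ids))


sentences = statements + questions
labels = [0] * len(statements) + [1] * len(questions)  # 0 = statement, 1 = question
word_index = build_word_index(sentences)
seq_len = 4
encoded = [encode(s, word_index, seq_len) for s in sentences]
```

Each row of `encoded` is the integer sequence for one sentence. An embedding layer looks up one word vector per index, which yields the 2D array per sentence described in the question.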