I have two lists of sentences. The first list contains questions, the second contains statements.
A small example:
1st list:
[
"What are cultures that are associated with core values?",
"How do bumblebees fly?",
"It is possible that avocado seed can be a tea?",
...
]
2nd list:
[
"The population was 388 at the 2010 census.",
"Grevillea helmsiae is a tree which is endemic to Queensland in Australia.",
"He played youth football for Tynecastle Boys Club.",
...
]
I want to write a program that can classify these two types of sentences. For this, I can create a neural network and train it on my two lists. I guess this should be a recurrent neural network.
I have transformed each sentence into an array of word2vec vectors, and now I want to set up a Keras recurrent neural network with LSTM layers. But I don't know how to do that correctly. Can you write a Keras model for this problem?
UPDATE
The form of the above sentences after transforming them with word2vec is like this:
[
[vector_of_floats_for_word_"what", vector_of_floats_for_word_"are", vector_of_floats_for_word_"cultures", vector_of_floats_for_word_"that", ...],
[vector_of_floats_for_word_"how", vector_of_floats_for_word_"do", vector_of_floats_for_word_"bumblebees", ...]
]
and so on. Each vector has 300 dimensions.
Here is my model:
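To make the representation concrete, here is a minimal sketch of the transformation, using a made-up toy vocabulary in place of a real word2vec model (in practice this would be something like a gensim `KeyedVectors` object; the names `toy_w2v` and `sentence_to_vectors` are illustrative, not from the original code):

```python
import numpy as np

# Toy stand-in for a real word2vec model: every known word maps to a
# 300-dimensional float vector. Vocabulary and vectors are made up
# purely for illustration.
DIM = 300
rng = np.random.default_rng(0)
toy_w2v = {w: rng.standard_normal(DIM) for w in
           ["what", "are", "cultures", "how", "do", "bumblebees", "fly"]}

def sentence_to_vectors(sentence, w2v):
    """Turn a sentence into a list of word vectors, skipping unknown words."""
    return [w2v[w] for w in sentence.lower().split() if w in w2v]

vecs = sentence_to_vectors("How do bumblebees fly", toy_w2v)
print(len(vecs), vecs[0].shape)  # 4 words, each a 300-d vector
```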
from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.optimizers import RMSprop

X = []
Y = []
for i in range(1000):
    X.append(questions_vectors[i])
    Y.append([1, 0])  # one-hot label for "question"
    X.append(statements_vectors[i])
    Y.append([0, 1])  # one-hot label for "statement"

model = Sequential()
model.add(LSTM(128, input_shape=(2000, None, 300)))
model.add(Dense(2, activation='softmax'))
model.compile(loss='binary_crossentropy', optimizer=RMSprop(lr=0.01))
There you can see the magic numbers 2000 and 300: 2000 is 1000 questions + 1000 statements, and 300 is the word-vector length.
But I'm sure that my model is wrong, and I'm also getting this error:
ValueError: Input 0 is incompatible with layer lstm_1: expected ndim=3, found ndim=4
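For context on the error: a Keras LSTM expects a 3-D batch of shape `(samples, timesteps, features)`, and `input_shape` excludes the batch dimension, so passing `(2000, None, 300)` makes Keras expect 4-D input. The dataset size (2000) should not appear in `input_shape` at all; what is needed is a common sequence length, which variable-length sentences can get via zero-padding. Below is a minimal NumPy sketch of that padding step under these assumptions (`MAXLEN` and the toy sentences are illustrative):

```python
import numpy as np

DIM = 300     # word-vector length
MAXLEN = 10   # chosen for illustration; in practice, the longest sentence

# Two toy "sentences" of different lengths (4 and 7 words).
sentences = [np.ones((4, DIM)), np.ones((7, DIM))]

def pad(seqs, maxlen, dim):
    """Zero-pad (or truncate) sequences to a common length."""
    out = np.zeros((len(seqs), maxlen, dim), dtype="float32")
    for i, s in enumerate(seqs):
        n = min(len(s), maxlen)
        out[i, :n] = s[:n]
    return out

X = pad(sentences, MAXLEN, DIM)
print(X.shape)  # (2, 10, 300) -- the ndim=3 shape the LSTM expects
# The matching Keras layer would then be declared as:
#   model.add(LSTM(128, input_shape=(MAXLEN, DIM)))
```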