I will try to explain my problem as clearly as possible.
So, I am trying to learn product information from reviews using a GRU. I have about a million reviews, each converted to a 300-dimensional vector. These review vectors are grouped by product, so my training set contains all of those million reviews with reviews belonging to the same product appearing one after another.
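Roughly, the training data is laid out like this (a minimal sketch; review_vectors, product_ids and sentiment_labels are placeholder names for my actual arrays):

import numpy

# Placeholder arrays (assumed shapes):
#   review_vectors:   (n_reviews, 300)  - one 300-d vector per review
#   product_ids:      (n_reviews,)      - product id of each review
#   sentiment_labels: (n_reviews,)      - integer class in {0, 1, 2}
# Stable sort by product id so reviews of the same product are contiguous
order = numpy.argsort(product_ids, kind='mergesort')
X_train = review_vectors[order]
encoded_Y_train = sentiment_labels[order]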
Below is my GRU model:
import numpy
from keras.utils import np_utils
from keras.models import Sequential
from keras.layers import GRU, TimeDistributed, Dense, Activation

# Reshape each review vector into a length-1 sequence: (samples, 1, 300)
X_train = numpy.reshape(X_train, (X_train.shape[0], 1, X_train.shape[1]))
# One-hot encode the 3 classes and reshape to (samples, 1, 3)
Y_train = np_utils.to_categorical(encoded_Y_train, nb_classes=3)
Y_train = numpy.reshape(Y_train, (Y_train.shape[0], 1, Y_train.shape[1]))

model = Sequential()
model.add(GRU(input_dim=doc_vec_dim, output_dim=product_embedding_dim, name='GRU',
              dropout_W=input_dropout, return_sequences=True))
model.add(TimeDistributed(Dense(input_dim=product_embedding_dim, output_dim=3)))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adagrad',
              class_mode="categorical", metrics=['accuracy'])
model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=15, validation_split=test_sample,
          shuffle=False, callbacks=[LoggingCallback(logger)])
This GRU is expected to capture product information, e.g. that a particular product mostly gets positive reviews or vice versa. But after retrieving the 128-dimensional output vector of every unique product using the following code:
from keras.models import Model

# Build a model that outputs the GRU layer's activations (the product embedding)
gru_layer_model = Model(input=gru_model.input,
                        output=gru_model.get_layer('GRU').output)
layer_output = gru_layer_model.predict(X_predict)
where X_predict is the 300-dimensional vector of a unique product (reshaped to the model's input shape), I see no accuracy improvement when I concatenate this GRU output to the product's original vector and classify with an SVM.
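For reference, the concatenation + SVM step looks roughly like this (a minimal sketch; product_vectors, gru_outputs and labels are placeholder names for my actual arrays):

import numpy
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# Placeholder arrays (assumed shapes):
#   product_vectors: (n_products, 300)  - original 300-d vectors
#   gru_outputs:     (n_products, 128)  - GRU layer output per product
#   labels:          (n_products,)      - target class per product
features = numpy.concatenate([product_vectors, gru_outputs], axis=1)  # (n_products, 428)
X_tr, X_te, y_tr, y_te = train_test_split(features, labels, test_size=0.2, random_state=0)
clf = SVC(kernel='rbf')
clf.fit(X_tr, y_tr)
print('accuracy:', clf.score(X_te, y_te))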
Here, the time-step is 1. Is that the problem? If so, how should I approach it?