
I will try to explain my problem as clearly as possible.

So, I am trying to learn product information from reviews using a GRU. I have about a million reviews, each converted to a 300-dimensional paragraph vector. These review vectors are grouped by product, so in my training set all reviews belonging to the same product appear one after another.

Below is my GRU model (Keras 1.x API):

    import numpy
    from keras.models import Sequential
    from keras.layers import GRU, Dense, Activation, TimeDistributed
    from keras.utils import np_utils

    # Reshape each 300-dim review vector into a length-1 sequence: (n_reviews, 1, 300)
    X_train = numpy.reshape(X_train, (X_train.shape[0], 1, X_train.shape[1]))
    Y_train = np_utils.to_categorical(encoded_Y_train, nb_classes=3)
    Y_train = numpy.reshape(Y_train, (Y_train.shape[0], 1, Y_train.shape[1]))

    model = Sequential()
    model.add(GRU(input_dim=doc_vec_dim, output_dim=product_embedding_dim, name='GRU',
                  dropout_W=input_dropout, return_sequences=True))
    model.add(TimeDistributed(Dense(input_dim=product_embedding_dim, output_dim=3)))
    model.add(Activation('softmax'))
    # class_mode was dropped from compile() in Keras 1.x, so it is omitted here
    model.compile(loss='categorical_crossentropy', optimizer='adagrad', metrics=['accuracy'])
    model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=15,
              validation_split=test_sample, shuffle=False,
              callbacks=[LoggingCallback(logger)])  # LoggingCallback is a custom callback

This GRU is expected to capture product information, such as that a particular product mostly gets positive reviews or vice versa. But after retrieving the 128-dimensional output vectors of all unique products using the following code:

    from keras.models import Model

    # gru_model is the trained Sequential model from above
    gru_layer_model = Model(input=gru_model.input,
                            output=gru_model.get_layer('GRU').output)

    layer_output = gru_layer_model.predict(X_predict)

where X_predict is the 300-dimensional vector of a unique product, I am not getting any accuracy improvement when I concatenate this output to its original vector and classify using an SVM.
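For reference, here is a minimal sketch of that concatenate-and-classify step; `original_vectors` and `labels` are placeholder names for my actual data, not variables from the code above:

    from sklearn.svm import SVC

    # layer_output has shape (n_products, 1, 128) because return_sequences=True,
    # so flatten the time axis before concatenating
    gru_features = layer_output.reshape(layer_output.shape[0], -1)

    # original_vectors: (n_products, 300) paragraph vectors (placeholder)
    combined = numpy.hstack([original_vectors, gru_features])  # (n_products, 428)

    clf = SVC()                # default RBF kernel
    clf.fit(combined, labels)  # labels: one sentiment class per product (placeholder)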

Here, the time-step is 1. Is that the problem? If so, how should I approach this?

ntstha
  • If the time step is one, why are you using recurrent layers? There is no sequence, so there is no need for a GRU; it will basically act as a Dense layer. Can I ask what the 300 values represent? What is the interpretation of the input values? – Nassim Ben Mar 13 '17 at 22:26
  • @NassimBen Well, the 300-dimensional values are the paragraph-vector representations of the reviews. So I would need to make a sequence of all reviews of a particular product. But what about products with variable numbers of reviews? Should I pad them with 0 vectors? I have also tried batch_size=1 with stateful=True, resetting the hidden state after a particular product ends, but I still got no improvement. – ntstha Mar 13 '17 at 22:45
  • Indeed, if you need the reviews to be treated together, you can put them in sequences and pad with vectors of 0's (a sketch of this follows below). But doing sentiment analysis on learned paragraph vectors doesn't make sense to me; you didn't train the vectors to contain sentiment features, so that information may already be lost by the time you get to this step of the learning. – Nassim Ben Mar 14 '17 at 06:04
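A minimal sketch of the padding approach suggested in the comments, written against the same Keras 1.x-era API as the question; `reviews_by_product` and `Y` are illustrative placeholders, not variables from the question:

    import numpy
    from keras.models import Sequential
    from keras.layers import Masking, GRU, Dense, Activation

    # reviews_by_product: list with one (n_reviews_i, 300) array per product (placeholder)
    max_reviews = max(len(r) for r in reviews_by_product)
    X = numpy.zeros((len(reviews_by_product), max_reviews, 300))
    for i, reviews in enumerate(reviews_by_product):
        X[i, :len(reviews)] = reviews  # rows beyond len(reviews) stay as zero padding

    model = Sequential()
    # Masking makes the GRU skip the all-zero padded time steps
    model.add(Masking(mask_value=0.0, input_shape=(max_reviews, 300)))
    model.add(GRU(output_dim=128, return_sequences=False))  # one 128-dim summary per product
    model.add(Dense(output_dim=3))
    model.add(Activation('softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adagrad', metrics=['accuracy'])

    # Y: (n_products, 3) one-hot labels, one per product (placeholder)
    model.fit(X, Y, batch_size=32, nb_epoch=15)

With return_sequences=False the GRU emits a single vector per product, so the targets become one label per product and the TimeDistributed wrapper is no longer needed.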

0 Answers