I read the following blog post and tried to implement it via Keras: https://andriymulyar.com/blog/bert-document-classification
Now, I'm quite new to Keras and I don't understand how to use a "seq2seq neural network" (an LSTM) to condense a sequence of sub-chunks (sentences) into a single global context vector (document vector).
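If I understand the blog post correctly, the condensing step would be something like this minimal sketch (the layer size of 50 is just a placeholder I picked):

from tensorflow.keras.layers import Input, LSTM
from tensorflow.keras.models import Model

doc_input = Input(shape=(100, 500))  # one document = 100 sentence vectors of size 500
doc_vector = LSTM(50)(doc_input)     # return_sequences=False (the default) -> only the final hidden state
encoder = Model(inputs=doc_input, outputs=doc_vector)
encoder.summary()                    # output shape: (None, 50)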
For example: I have 10 documents consisting of 100 sentences each, and each sentence is represented by a 1x500 vector. So the array would look like this:

import numpy as np

X = np.array(Matrix).reshape(10, 100, 500)  # 10 documents, each a sequence of 100 sentence vectors with 500 features
So I know I want to train my network and then take the LSTM's last hidden state, because that is what represents my document vector / global context vector (I sketch the extraction at the end of this post).
However, the hardest part for me is the target (output) vector. Do I just enumerate my documents:
y = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]  # one label per document, 0-based so to_categorical produces 10 columns
y = np.array(y)
or do I have to use one-hot-encoded output vectors:
from tensorflow.keras.utils import to_categorical

yy = to_categorical(y)
or something else entirely?
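From what I can tell from the Keras docs, integer labels pair with sparse_categorical_crossentropy while one-hot labels pair with categorical_crossentropy, so as a sanity check on the shapes:

print(y.shape)   # (10,)    -> integer labels, for sparse_categorical_crossentropy
print(yy.shape)  # (10, 10) -> one-hot labels, for categorical_crossentropy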
As far as I understand, the final model should look something like this:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(50, input_shape=(100, 500)))  # final hidden state -> document vector
model.add(Dense(10, activation='softmax'))   # one unit per class; Dense(1) would not match the one-hot targets
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
model.fit(X, yy, epochs=100, validation_split=0.2, verbose=1)
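And once training is done, I assume I could read the document vectors back out with a sub-model that stops at the LSTM layer:

from tensorflow.keras.models import Model

# my assumption: the LSTM layer's output after training is the document vector
encoder = Model(inputs=model.input, outputs=model.layers[0].output)
doc_vectors = encoder.predict(X)  # shape (10, 50), one 50-dim vector per document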