What do input layers represent in a Hierarchical Attention Network

Question

I'm trying to grasp the idea of a Hierarchical Attention Network (HAN). Most of the code I find online is more or less similar to the code in this post: https://medium.com/jatana/report-on-text-classification-using-cnn-rnn-han-f0e887214d5f

embedding_layer = Embedding(len(word_index) + 1, EMBEDDING_DIM, weights=[embedding_matrix],
                            input_length=MAX_SENT_LENGTH, trainable=True)
sentence_input = Input(shape=(MAX_SENT_LENGTH,), dtype='int32', name='input1')
embedded_sequences = embedding_layer(sentence_input)
l_lstm = Bidirectional(LSTM(100))(embedded_sequences)
sentEncoder = Model(sentence_input, l_lstm)

review_input = Input(shape=(MAX_SENTS,MAX_SENT_LENGTH), dtype='int32',  name='input2')
review_encoder = TimeDistributed(sentEncoder)(review_input)
l_lstm_sent = Bidirectional(LSTM(100))(review_encoder)
preds = Dense(len(macronum), activation='softmax')(l_lstm_sent)
model = Model(review_input, preds)

My question is: what do the input layers here represent? I'm guessing that input1 represents the sentences wrapped with the embedding layer, but in that case what is input2? Is it the output of the sentEncoder? If so, it should be a float; or, if it is another layer of embedded words, it should be wrapped with an embedding layer as well.

score 1 · Accepted Answer · answered Apr 06 '19 at 12:12

The HAN model processes the text in a hierarchy: it takes a document that has already been split into sentences (that's why the shape of input2 is (MAX_SENTS, MAX_SENT_LENGTH)); then it processes each sentence independently using the sentEncoder model (that's why the shape of input1 is (MAX_SENT_LENGTH,)); and finally it processes all the encoded sentences together.

So in your code the whole model is stored in model, and its input layer is input2, which you would feed with documents that have been split into sentences and whose words have been integer encoded (to make them compatible with the embedding layer). The other input layer belongs to the sentEncoder model, which is used inside the model (and not directly by you):

review_encoder = TimeDistributed(sentEncoder)(review_input)
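
To make the expected shape of input2 concrete, here is a minimal plain-Python sketch of how one document could be turned into a (MAX_SENTS, MAX_SENT_LENGTH) grid of word indexes. The toy vocabulary, the tiny sizes, the encode_document helper, and the choice to map unknown words to 0 are all assumptions made for illustration; they are not part of the code in the question.

```python
# Hypothetical toy sizes; the real values come from your own preprocessing.
MAX_SENTS = 3
MAX_SENT_LENGTH = 4

# Hypothetical vocabulary. Index 0 is kept free for padding, which matches
# the len(word_index) + 1 rows of the embedding layer in the question.
word_index = {"the": 1, "film": 2, "was": 3, "great": 4,
              "i": 5, "loved": 6, "it": 7}


def encode_document(sentences):
    """Turn a list of tokenised sentences into MAX_SENTS rows of MAX_SENT_LENGTH ints."""
    rows = []
    for sentence in sentences[:MAX_SENTS]:
        # Assumption: unknown words are mapped to the padding index 0.
        ids = [word_index.get(word, 0) for word in sentence[:MAX_SENT_LENGTH]]
        ids += [0] * (MAX_SENT_LENGTH - len(ids))  # pad short sentences
        rows.append(ids)
    while len(rows) < MAX_SENTS:
        rows.append([0] * MAX_SENT_LENGTH)  # pad short documents
    return rows


document = [["the", "film", "was", "great"], ["i", "loved", "it"]]
encoded = encode_document(document)
print(encoded)
# [[1, 2, 3, 4], [5, 6, 7, 0], [0, 0, 0, 0]]
```

A batch of such grids, stacked into an integer array, is the kind of data input2 expects; each row of a grid is what input1 sees once TimeDistributed hands the sentences to sentEncoder.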

today


I understand that, but in what form are the words fed in? If they are integer representations, then the code is completely ignoring the embedding, isn't it? – amrnablus Apr 06 '19 at 15:55
@amrnablus Why do you think that the code is ignoring it if the input is integer encoded? The embedding layer needs integer-encoded representations, which are actually the indexes of the words in a dictionary. – today Apr 06 '19 at 17:07
My understanding is that the words represented as integers need to be wrapped with an embedding layer for the words to be converted to vectors, which is indeed the case for input1, where it gets wrapped in line #4. For input2, the Input layer is passed directly to the model, which I understand would cause the model to treat words as plain integers rather than embeddings – amrnablus Apr 06 '19 at 17:32
@amrnablus I have already mentioned this in my answer, so I expected it to be clear; but to make it clear for yourself in another way, try to answer the following question: where do the values fed to the `input2` go? Follow the code line by line and I think then you'll understand it. Or just take a look at the last line of my answer. – today Apr 06 '19 at 19:32

score 1 · Answer 2 · answered Apr 07 '19 at 17:32

Masoud's answer is correct, but I'll rewrite it here in my own words:

The data (X_train) is fed as indexes to the model and is received by input2
X_train is then forwarded to the encoder model and is received by input1
input1 is wrapped by an embedding layer so the indexes are converted to vectors

So input2 is more of a proxy for the model's input: it only passes the indexes on to input1, and the embedding is applied there.
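
The same flow can be traced in a small plain-Python sketch with no Keras involved. The embedding table, the toy input, and the sent_encoder and time_distributed helpers are hypothetical stand-ins; in particular, the average below only takes the place of the Bidirectional LSTM so that the numbers are easy to follow.

```python
EMBEDDING_DIM = 2

# Hypothetical embedding table: row i is the vector for word index i.
embedding_matrix = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]


def sent_encoder(sentence_ids):
    """Stand-in for sentEncoder: embed the indexes, then average the vectors.

    The model in the question uses a Bidirectional LSTM instead of an average.
    """
    vectors = [embedding_matrix[i] for i in sentence_ids]  # the embedding lookup
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(EMBEDDING_DIM)]


def time_distributed(encoder, document_ids):
    """Stand-in for TimeDistributed: apply the encoder to every sentence."""
    return [encoder(sentence_ids) for sentence_ids in document_ids]


# What input2 receives: one document as rows of integer word indexes.
x = [[1, 2], [3, 3]]

# What the document-level layer receives: one float vector per sentence.
review_encoder = time_distributed(sent_encoder, x)
print(review_encoder)
# [[0.5, 0.5], [1.0, 1.0]]
```

The integers in x never reach the document-level layer directly: every sentence goes through the embedding lookup inside the encoder first, so the second LSTM only ever sees float vectors.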
