I'm building a model with Keras to learn word embeddings using skip-gram with negative sampling. My input is a pair of words, (context_word, target_word), with the label 1 for positive pairs and 0 for negative ones. What I need to do is add a bias to the model, and it should be only the bias of the target word for each input, not of both words.
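
Such pairs and labels can be produced, for example, with Keras' built-in skip-gram helper; the token sequence and parameters below are just placeholders for illustration:

from keras.preprocessing.sequence import skipgrams

# hypothetical sequence of word ids from a tokenized corpus
sequence = [1, 5, 2, 8, 3, 5, 1]
couples, labels = skipgrams(sequence,
                            vocabulary_size=items_size,
                            window_size=2,
                            negative_samples=1.0)
# couples: list of [word_i, word_j] index pairs
# labels:  1 for observed pairs, 0 for sampled negatives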

So far I have this code:

from keras.layers import Input, Embedding, Flatten, Dense, merge
from keras.models import Model

# items_size and embed_dim are defined elsewhere
input_u = Input((1,))
input_v = Input((1,))

item_embedding = Embedding(input_dim=items_size,
                           output_dim=embed_dim,
                           name='item_embedding')

bias_embedding = Embedding(input_dim=items_size,
                           output_dim=1,
                           embeddings_initializer='zeros',
                           name='bias_embedding')

u = item_embedding(input_u)        # (batch, 1, embed_dim)
v = item_embedding(input_v)        # (batch, 1, embed_dim)
b_v = bias_embedding(input_v)      # (batch, 1, 1)

dot_p_layer = merge.dot([u, v], axes=1)
with_bias = merge.add([dot_p_layer, b_v])   # b_v is broadcast over the result
flattened = Flatten()(with_bias)

output_layer = Dense(1,
                     activation='sigmoid',
                     use_bias=False)(flattened)
print(output_layer.shape)

model = Model(inputs=[input_u, input_v], outputs=output_layer)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

However, I can't seem to get it working. The code runs, but I'm getting a higher loss and a lower accuracy than with the same model without the bias, so I'm thinking I'm doing something wrong. Also, when I check the shapes, I still get my embedding dimension and not embedding dimension + 1.

I thought about using another Dense layer (I'm not even sure whether that's logical or right to do) to add the bias after the dot product, but I couldn't get that working either.

I would really appreciate some help with adding this bias to the model.

  • Are you sure you want to "add" bias? Adding is not supposed to add any extra element to the dimension size.... – Daniel Möller Feb 10 '19 at 14:21
  • @DanielMöller well, you are right about the wording; it is more like I want to take the bias into account. I said "add" because that's what I tried in the code, but apparently I'm missing something and it's not the right way to do it anyway. – melowgs Feb 10 '19 at 14:53
  • @SzymonMaszke I'm not doing exactly word2vec (though it is based on it, as you can see) but I'm using it for a recommender system (hence `item_embedding`), so bias might actually make a difference. I understand the bias is still an embedding, as there is no `use_bias` option for the embedding layer. What you say about concatenate might actually work and I want to give it a try, though how do I use it? I mean just like the embeddings I have, but concatenate it instead of `add`? – melowgs Feb 10 '19 at 15:10
  • 1
    If you're thinking of actual bias as they call it in neural network, there isn't such a thing as bias for embeddings. Embeddings are not operations. Biases are "added" (a sum operation) to the result of a previous operation. Adding two embeddings is exactly the same as having a single embedding. Concatenating two embeddings is exactly the same as having a bigger embedding. So there is no reason to want to add or concatenate anything to an embedding. Now, if you're doing something "after the dot" (then you've got an operation), it may make some difference. – Daniel Möller Feb 10 '19 at 15:24
  • @DanielMöller Thanks for the explanation, and your last sentence is **exactly** what I'm trying to do: the dot product of two embeddings (one for each item of a couple), and **after** that, taking the bias into account. And as I tried to explain in the original question, I don't even need the bias for each item **but only** the bias of the target item. My question was exactly about how to do that. I'm thinking a mix of both your answers below might be the solution: first the dot, then concatenate the bias (which adds another dimension), and then flatten. What do you think? – melowgs Feb 10 '19 at 15:37
  • 1
    @SzymonMaszke, since there is a `Flatten` and a `Dense(1)` after the questionable operation, what happens in the middle will not affect the input and output shapes. – Daniel Möller Feb 10 '19 at 15:37
  • Oh yeah, my bad, apologies for the inconvenience. BTW, I cleared my comments to remove noise from the question and answer. – Szymon Maszke Feb 10 '19 at 15:39

1 Answer

If you want dimension + 1, you're looking for concatenate, not add.

I don't know the dimension after dot off the top of my head (dot has weird behavior), but if it's 3D (batch, embedding, embedding), which is what axes=1 should give you here since it contracts the size-1 axis of your (batch, 1, embed_dim) inputs, you will need to flatten it before the concatenation.
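
A minimal sketch of that idea, reusing the layers from the question (the switch to axes=2, so that the dot is a true scalar product over the embedding axis, is my assumption, not something from the original code):

u = item_embedding(input_u)                  # (batch, 1, embed_dim)
v = item_embedding(input_v)                  # (batch, 1, embed_dim)
b_v = bias_embedding(input_v)                # (batch, 1, 1)

dot_p = merge.dot([u, v], axes=2)            # scalar product -> (batch, 1, 1)
dot_p = Flatten()(dot_p)                     # (batch, 1)
b_v = Flatten()(b_v)                         # (batch, 1)

with_bias = merge.concatenate([dot_p, b_v])  # (batch, 2): dimension + 1
output_layer = Dense(1, activation='sigmoid', use_bias=False)(with_bias)

The Dense(1) then learns one weight for the dot product and one for the target-item bias. If you want the classic score-plus-bias formulation instead, replace the concatenation with merge.add([dot_p, b_v]) on the two (batch, 1) tensors and apply a sigmoid Activation directly, with no Dense layer.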

Daniel Möller