I have 3 word embeddings:
- embedding#1 : [w11, w12, w13, w14]
- embedding#2 : [w21, w22, w23, w24]
- embedding#3 : [w31, w32, w33, w34]
Is there a way to get a fourth embedding by adding all three vectors element-wise, keeping the trainable weights from all of them, like the following?
- embedding#4 : [w11 + w21 + w31, w12 + w22 + w32, w13 + w23 + w33, w14 + w24 + w34]
Is there a way to do this in a Keras layer?
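In other words, something like this rough sketch is what I'm after (made-up vocab_size, and assuming for concreteness one shared Embedding table; three separate tables should work the same way):

```python
from tensorflow import keras
from tensorflow.keras import layers

vocab_size = 100    # made-up size, just for illustration
embedding_dim = 4   # matches the 4-component vectors above

# Three integer word ids come in; one shared Embedding table serves
# all three lookups, so every lookup trains the same weight matrix.
word1 = keras.Input(shape=(1,), dtype="int32")
word2 = keras.Input(shape=(1,), dtype="int32")
word3 = keras.Input(shape=(1,), dtype="int32")

embed = layers.Embedding(vocab_size, embedding_dim)
e1, e2, e3 = embed(word1), embed(word2), embed(word3)

# Element-wise sum: embedding#4 = embedding#1 + embedding#2 + embedding#3.
# Gradients flow back through the Add layer into the shared table.
summed = layers.Add()([e1, e2, e3])  # shape: (batch, 1, embedding_dim)

model = keras.Model([word1, word2, word3], summed)
model.summary()
```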
Problem
I want to learn word embeddings for the Indonesian language. I plan to do this by training a sequence prediction model using LSTMs.
However, the grammar of Indonesian is different from that of English. In particular, in Indonesian you can modify a word using prefixes and suffixes. A noun can become a verb when given a prefix, and an adjective when given a suffix. You can stack so many of them onto one word that a single base word can have 5 or more variations.
For example:
- tani means farm (verb)
- pe-tani means farmer
- per-tani-an means farm (noun)
- ber-tani means farm (verb, with slightly different meaning)
The semantic transformation performed by attaching a prefix to a word is consistent across words. For example:
- pe-tani is to tani what pe-layan is to layan, what pe-layar is to layar, what pe-tembak is to tembak, and so on.
- per-main-an is to main what per-guru-an is to guru, what per-kira-an is to kira, what per-surat-an is to surat, and so on.
Therefore, I plan to represent the prefixes and suffixes as embeddings of their own, which would be added to the base word's embedding to produce a new embedding. That way, the meaning of a composite word is derived from the embeddings of the base word and its affixes, rather than stored as a separate embedding. However, I don't know how to do this in a Keras layer; roughly, I'm imagining something like the sketch below. If this has been asked before, I could not find it.
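For what it's worth, here is the rough sketch of what I have in mind, with made-up layer sizes and the assumption that each token has already been pre-split into (base, prefix, suffix) ids; I don't know if this is the idiomatic way:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Made-up sizes, for illustration only.
num_base_words, num_prefixes, num_suffixes = 10000, 20, 20
embedding_dim, seq_len = 128, 30

# Each token is encoded as a (base_id, prefix_id, suffix_id) triple;
# an id of 0 could be reserved for "no affix" (note its row would still
# be trained unless explicitly masked or zeroed).
base_in   = keras.Input(shape=(seq_len,), dtype="int32", name="base")
prefix_in = keras.Input(shape=(seq_len,), dtype="int32", name="prefix")
suffix_in = keras.Input(shape=(seq_len,), dtype="int32", name="suffix")

base_emb   = layers.Embedding(num_base_words, embedding_dim)(base_in)
prefix_emb = layers.Embedding(num_prefixes,   embedding_dim)(prefix_in)
suffix_emb = layers.Embedding(num_suffixes,   embedding_dim)(suffix_in)

# Composite word vector = base embedding + prefix embedding + suffix embedding.
token_emb = layers.Add()([base_emb, prefix_emb, suffix_emb])

# Feed the composed vectors into the LSTM sequence predictor.
lstm_out  = layers.LSTM(256)(token_emb)
next_word = layers.Dense(num_base_words, activation="softmax")(lstm_out)

model = keras.Model([base_in, prefix_in, suffix_in], next_word)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```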