0

Transformers can handle variable length input, but what if the number of words might correlate with the target? Let's say we want to perform a sentiment analysis for some reviews where the longer reviews are more probable to be bad. How can the model harness this knowledge? Of course a simple solution could be to add this count as a feature after the self-attention layer. However, this hand-crafted-like approach wouldn't reveal more complex relations, for example if there is a high number of word X, it correlates with target 1, except if there is also high number of word Y, in which case the target tends to be 0.

How could this information be included using deep learning? Paper recommendations in the topic are also well appreciated.

tusker
  • 57
  • 5

0 Answers0