
I want to know whether the matrix of attn_output_weights can show the relationship between every word pair in the input sequence. In my project, I drew the heat map directly from this output, and it looks like this:

[my heat map]

However, I can hardly see any information in this heat map. Referring to other people's work, their heat maps look like this, with at least the diagonal of the matrix in a deep color:

[example heat map from other work]

So I wonder whether my method of drawing the heat map is correct (i.e., directly using the output attn_output_weights). If this is not the right way, could you please tell me how to draw the heat map? A sketch of roughly what I am doing follows below.
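For reference, this is approximately my setup; the model, shapes, and random input are placeholders rather than my actual project code, assuming PyTorch's torch.nn.MultiheadAttention:

```python
import torch
import matplotlib.pyplot as plt

torch.manual_seed(0)
seq_len, embed_dim, num_heads = 10, 64, 4
x = torch.rand(1, seq_len, embed_dim)  # (batch, seq, embed): placeholder input

mha = torch.nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

# attn_output_weights has shape (batch, seq, seq), averaged over heads by default
attn_output, attn_output_weights = mha(x, x, x)

# This is the step in question: plotting the weights directly, with no vmin/vmax
plt.imshow(attn_output_weights[0].detach().numpy())
plt.colorbar()
plt.show()
```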

Yuki Wang

1 Answer


It seems your range of values is rather limited. In the target example the values lie in [0, 1], since each row represents a softmax distribution. This is visible from the definition of attention:

$$\operatorname{Attention}(Q, K, V) = \operatorname{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V$$
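As a quick sanity check (a sketch with a stand-in tensor, not your actual weights), each row of a genuine softmax output sums to one and stays within [0, 1]:

```python
import torch

w = torch.rand(10, 10).softmax(dim=-1)  # stand-in for one attention map
print(w.sum(dim=-1))                    # every row sum should be ~1.0
print(w.min().item(), w.max().item())   # all values lie in [0, 1]
```

If your attn_output_weights fail this check, they are probably not the post-softmax weights.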

I suggest you normalize each row / column (according to the attention implementation you are using) and finally visualize the attention maps in the range [0, 1]. You can do this with the vmin and vmax arguments of matplotlib's plotting functions, as in the sketch below.
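For example, a minimal sketch of that suggestion; the map here is a random stand-in, and the explicit normalization line is only needed if your implementation does not already apply a softmax:

```python
import numpy as np
import matplotlib.pyplot as plt

attn = np.random.dirichlet(np.ones(10), size=10)  # stand-in row-stochastic map

# Normalize each row to sum to 1 (skip if your weights are already softmax-ed)
attn = attn / attn.sum(axis=1, keepdims=True)

# Pin the color scale to [0, 1] so the map is comparable to other work
plt.imshow(attn, vmin=0, vmax=1, cmap="viridis")
plt.colorbar(label="attention weight")
plt.xlabel("key position")
plt.ylabel("query position")
plt.show()
```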

If this doesn't solve the problem, maybe add a snippet of code containing the model you are using and the visualization script.

Shir