
I want to know whether the matrix of attn_output_weights can show the relationship between every word pair in the input sequence. In my project, I drew the heat map directly from this output, and it looks like this:

[my heat map]

However, I can hardly see any information in this heat map. Referring to other people's work, their heat maps look like this, with at least the diagonal of the matrix in a deep color:

[example heat map from other work]

So I wonder whether my method of drawing the heat map is correct (i.e., directly using the output attn_output_weights). If this is not the right way, could you please tell me how to draw the heat map? A sketch of roughly what I am doing follows below.
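For reference, this is approximately my setup; the model, shapes, and random input are placeholders rather than my actual project code, assuming PyTorch's torch.nn.MultiheadAttention:

```python
import torch
import matplotlib.pyplot as plt

torch.manual_seed(0)
seq_len, embed_dim, num_heads = 10, 64, 4
x = torch.rand(1, seq_len, embed_dim)  # (batch, seq, embed): placeholder input

mha = torch.nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

# attn_output_weights has shape (batch, seq, seq), averaged over heads by default
attn_output, attn_output_weights = mha(x, x, x)

# This is the step in question: plotting the weights directly, with no vmin/vmax
plt.imshow(attn_output_weights[0].detach().numpy())
plt.colorbar()
plt.show()
```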

Yuki Wang

1 Answer


It seems your range of values is rather limited. In the target example the values lie in [0, 1], since each row represents a softmax distribution. This is visible from the definition of attention:

$$\operatorname{Attention}(Q, K, V) = \operatorname{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V$$
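As a quick sanity check (a sketch with a stand-in tensor, not your actual weights), each row of a genuine softmax output sums to one and stays within [0, 1]:

```python
import torch

w = torch.rand(10, 10).softmax(dim=-1)  # stand-in for one attention map
print(w.sum(dim=-1))                    # every row sum should be ~1.0
print(w.min().item(), w.max().item())   # all values lie in [0, 1]
```

If your attn_output_weights fail this check, they are probably not the post-softmax weights.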

I suggest you normalize each row / column (according to the attention implementation you are using) and finally visualize the attention maps in the range [0, 1]. You can do this with the vmin and vmax arguments of matplotlib's plotting functions, as in the sketch below.
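For example, a minimal sketch of that suggestion; the map here is a random stand-in, and the explicit normalization line is only needed if your implementation does not already apply a softmax:

```python
import numpy as np
import matplotlib.pyplot as plt

attn = np.random.dirichlet(np.ones(10), size=10)  # stand-in row-stochastic map

# Normalize each row to sum to 1 (skip if your weights are already softmax-ed)
attn = attn / attn.sum(axis=1, keepdims=True)

# Pin the color scale to [0, 1] so the map is comparable to other work
plt.imshow(attn, vmin=0, vmax=1, cmap="viridis")
plt.colorbar(label="attention weight")
plt.xlabel("key position")
plt.ylabel("query position")
plt.show()
```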

If this doesn't solve the problem, maybe add a snippet of code containing the model you are using and the visualization script.

Shir