I am using a Swin Transformer for a hierarchical multi-class, multi-label classification problem. I would like to visualize the self-attention maps on my input image by extracting them from the model, but so far I have not succeeded. Could you give me a hint on how to do it? Below is the part of the code where I attempt this.

import matplotlib.pyplot as plt

# `model` and `sample` are defined elsewhere in my code
attention_maps = []
for module in model.modules():
    # check whether the module has the attribute
    if hasattr(module, 'attention_patches'):
        print(module.attention_patches.shape)
        if module.attention_patches.numel() == 224 * 224:
            attention_maps.append(module.attention_patches)

for attention_map in attention_maps:
    # move to CPU and reshape to a 2-D array so imshow can apply the colormap
    attention_map = attention_map.detach().cpu().reshape(224, 224)
    plt.imshow(sample['image'].permute(1, 2, 0), interpolation='nearest')
    plt.imshow(attention_map, alpha=0.7, cmap=plt.cm.Greys)
    plt.show()

In addition, if you know of any explainability techniques, such as Grad-CAM, that can be used with a hierarchical Swin Transformer, please share a link. It would be very helpful.

1 Answer

I am researching the same topic. I don't have anything specific to Swin, but here are some resources on explainability for Vision Transformers. I hope they help:

https://jacobgil.github.io/deeplearning/vision-transformer-explainability

https://github.com/jacobgil/vit-explain

https://github.com/hila-chefer/Transformer-Explainability

https://github.com/hila-chefer/Transformer-Explainability/blob/main/Transformer_explainability.ipynb
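
To give an idea of what the first link covers: it describes attention rollout, where the per-layer attention matrices are averaged over heads, mixed with the identity to account for the residual connections, and multiplied through the layers. Below is a minimal NumPy sketch of that idea, not the code from those repositories. It assumes a plain ViT with a class token at index 0 and global attention, and it assumes you have already collected one attention array per layer (for example with forward hooks). The function names `attention_rollout` and `cls_attention_map` are made up for illustration. Swin uses windowed attention and has no class token, so this would not apply to it directly without adaptation.

```python
import numpy as np


def attention_rollout(attentions):
    """Combine per-layer attention into a single token-to-token matrix.

    attentions: list of arrays shaped (heads, tokens, tokens), ordered from
    the first layer to the last. Each row is assumed to be a softmax output.
    """
    num_tokens = attentions[0].shape[-1]
    rollout = np.eye(num_tokens)
    for layer_attention in attentions:
        fused = layer_attention.mean(axis=0)        # average over heads
        fused = fused + np.eye(num_tokens)          # account for the residual path
        fused = fused / fused.sum(axis=-1, keepdims=True)  # rows sum to 1 again
        rollout = fused @ rollout                   # propagate through this layer
    return rollout


def cls_attention_map(rollout, grid_size):
    """Take the class-token row (index 0) and reshape it to the patch grid."""
    cls_to_patches = rollout[0, 1:]                 # skip the class token itself
    heatmap = cls_to_patches.reshape(grid_size, grid_size)
    return heatmap / heatmap.max()                  # scale so the peak is 1


# Toy data standing in for real attention: 3 layers, 2 heads,
# 1 class token + a 4x4 grid of patches = 17 tokens.
rng = np.random.default_rng(0)
layers = []
for _ in range(3):
    logits = rng.normal(size=(2, 17, 17))
    weights = np.exp(logits)
    layers.append(weights / weights.sum(axis=-1, keepdims=True))

rollout = attention_rollout(layers)
heatmap = cls_attention_map(rollout, grid_size=4)
print(heatmap.shape)
```

The resulting `heatmap` is at patch resolution, so it would still need to be upsampled to the image size before overlaying it with `plt.imshow(..., alpha=...)`.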

Aakash Gupta