To fine-tune a pre-trained transformer model, should I remove the coefficients of the fc layers that come after each attention layer?

Question

To fine-tune a pre-trained transformer model, in addition to removing the coefficients of the head layer (head.weight, head.bias), should I also remove the coefficients of the fc layers (mlp.fc1, mlp.fc2) that come after the attention layer in each block?

These are some of the parameter names in my pre-trained model (a Vision Transformer, ViT):

_orig_mod.cls_token
_orig_mod.pos_embed
_orig_mod.patch_embed.proj.weight
_orig_mod.patch_embed.proj.bias
_orig_mod.blocks.0.norm1.weight
_orig_mod.blocks.0.norm1.bias
_orig_mod.blocks.0.attn.qkv.weight
_orig_mod.blocks.0.attn.qkv.bias
_orig_mod.blocks.0.attn.proj.weight
_orig_mod.blocks.0.attn.proj.bias
_orig_mod.blocks.0.norm2.weight
_orig_mod.blocks.0.norm2.bias
_orig_mod.blocks.0.mlp.fc1.weight
_orig_mod.blocks.0.mlp.fc1.bias
_orig_mod.blocks.0.mlp.fc2.weight
_orig_mod.blocks.0.mlp.fc2.bias
...
...
...
_orig_mod.blocks.9.norm1.weight
_orig_mod.blocks.9.norm1.bias
_orig_mod.blocks.9.attn.qkv.weight
_orig_mod.blocks.9.attn.qkv.bias
_orig_mod.blocks.9.attn.proj.weight
_orig_mod.blocks.9.attn.proj.bias
_orig_mod.blocks.9.norm2.weight
_orig_mod.blocks.9.norm2.bias
_orig_mod.blocks.9.mlp.fc1.weight
_orig_mod.blocks.9.mlp.fc1.bias
_orig_mod.blocks.9.mlp.fc2.weight
_orig_mod.blocks.9.mlp.fc2.bias
_orig_mod.blocks.10.norm1.weight
_orig_mod.blocks.10.norm1.bias
_orig_mod.blocks.10.attn.qkv.weight
_orig_mod.blocks.10.attn.qkv.bias
_orig_mod.blocks.10.attn.proj.weight
_orig_mod.blocks.10.attn.proj.bias
_orig_mod.blocks.10.norm2.weight
_orig_mod.blocks.10.norm2.bias
_orig_mod.blocks.10.mlp.fc1.weight
_orig_mod.blocks.10.mlp.fc1.bias
_orig_mod.blocks.10.mlp.fc2.weight
_orig_mod.blocks.10.mlp.fc2.bias
_orig_mod.blocks.11.norm1.weight
_orig_mod.blocks.11.norm1.bias
_orig_mod.blocks.11.attn.qkv.weight
_orig_mod.blocks.11.attn.qkv.bias
_orig_mod.blocks.11.attn.proj.weight
_orig_mod.blocks.11.attn.proj.bias
_orig_mod.blocks.11.norm2.weight
_orig_mod.blocks.11.norm2.bias
_orig_mod.blocks.11.mlp.fc1.weight
_orig_mod.blocks.11.mlp.fc1.bias
_orig_mod.blocks.11.mlp.fc2.weight
_orig_mod.blocks.11.mlp.fc2.bias
_orig_mod.norm.weight
_orig_mod.norm.bias
_orig_mod.head.weight
_orig_mod.head.bias
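
To make concrete what I mean by "removing the coefficients of the head layer", here is a minimal sketch that works only on the key names. The helper `strip_head` and the constants `COMPILE_PREFIX` and `HEAD_KEYS` are hypothetical names I made up for illustration, and the `None` values are placeholders for the real tensors. The sketch also strips the `_orig_mod.` prefix (which `torch.compile` typically adds to parameter names) and leaves every `mlp.fc1` / `mlp.fc2` entry in place, since whether those should also be removed is exactly what I am asking.

```python
from collections import OrderedDict

COMPILE_PREFIX = "_orig_mod."             # prefix seen in the key list above
HEAD_KEYS = ("head.weight", "head.bias")  # classifier entries to drop


def strip_head(state_dict, prefix=COMPILE_PREFIX, drop=HEAD_KEYS):
    """Return a copy of state_dict without the prefix and without the head entries."""
    filtered = OrderedDict()
    for key, value in state_dict.items():
        # Remove the prefix so the names match an uncompiled model.
        name = key[len(prefix):] if key.startswith(prefix) else key
        if name in drop:
            continue  # skip the classification head
        filtered[name] = value  # everything else (attn, mlp.fc1, mlp.fc2, norm) is kept
    return filtered


# Placeholder values stand in for tensors; only the key names matter here.
example = OrderedDict(
    (key, None)
    for key in [
        "_orig_mod.blocks.0.attn.proj.weight",
        "_orig_mod.blocks.0.mlp.fc1.weight",
        "_orig_mod.blocks.0.mlp.fc2.weight",
        "_orig_mod.norm.weight",
        "_orig_mod.head.weight",
        "_orig_mod.head.bias",
    ]
)
filtered = strip_head(example)
print(list(filtered))
```

With a real checkpoint, my plan would be to pass the filtered dictionary to `model.load_state_dict(filtered, strict=False)`, so that the new head keeps its fresh initialization while the remaining layers are loaded from the pre-trained weights.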

