Questions tagged [vision-transformer]
A transformer model applied to computer vision tasks, built on the attention mechanism.
17 questions
2
votes
0 answers
Positional embedding for larger images fed to ViT
Pre-trained ViT (Vision Transformer) models are usually trained on 224x224 or 384x384 images, but I have to fine-tune a custom ViT model (all the layers of ViT plus some additional layers) on 640x640 images. How do I handle the positional…

Preetom Saha Arko
- 2,588
- 4
- 21
- 37
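One common workaround for this question is to 2-D-interpolate the pretrained positional embeddings to the new patch grid rather than retrain them. A minimal PyTorch sketch, assuming a standard ViT layout with a leading [CLS] token and 16x16 patches (224x224 gives a 14x14 grid, 640x640 a 40x40 grid); the helper name and defaults are illustrative:

    import torch
    import torch.nn.functional as F

    def resize_pos_embed(pos_embed, old_grid=14, new_grid=40):
        # Split off the [CLS] position, which has no spatial location.
        cls_tok, patch_pos = pos_embed[:, :1], pos_embed[:, 1:]
        dim = patch_pos.shape[-1]
        # (1, N, dim) -> (1, dim, H, W) so F.interpolate can resize in 2-D.
        patch_pos = patch_pos.reshape(1, old_grid, old_grid, dim).permute(0, 3, 1, 2)
        patch_pos = F.interpolate(patch_pos, size=(new_grid, new_grid),
                                  mode="bicubic", align_corners=False)
        patch_pos = patch_pos.permute(0, 2, 3, 1).reshape(1, new_grid * new_grid, dim)
        return torch.cat([cls_tok, patch_pos], dim=1)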
1
vote
1 answer
ViT model reconstruction confusion while trying to insert layers into the old model
I ran into a problem while reconstructing a model from an old one by replicating it layer by layer: the output tensor of the reconstructed model (new) does not have the same dimensions as the original (old): new: [4,196,10]…

PikovO
- 11
- 1
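A sequence length of 196 is exactly a 14x14 patch grid without the [CLS] token (197 with it), so one plausible cause is that the rebuilt model drops the step that prepends it. A hedged sketch of that step, with illustrative dimensions:

    import torch

    # Illustrative dimensions: ViT-B/16 at 224x224 yields 196 patch tokens;
    # prepending the [CLS] token brings the sequence length to 197.
    batch, num_patches, dim = 4, 196, 768
    patch_tokens = torch.randn(batch, num_patches, dim)
    cls_token = torch.nn.Parameter(torch.zeros(1, 1, dim))
    tokens = torch.cat([cls_token.expand(batch, -1, -1), patch_tokens], dim=1)
    print(tokens.shape)  # torch.Size([4, 197, 768])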
1
vote
0 answers
Training a Vision Transformer on a custom dataset
I am trying to use a pre-trained ViT PyTorch model. It is pre-trained on ImageNet with image size 384x384. Now I want to fine-tune this model on my own dataset, but each time I load the pre-trained ViT model and try to fine-tune it, I get an…

Waqar Ahmad
- 11
- 1
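Shape-mismatch errors when loading a pretrained ViT usually come from the classification head, whose size depends on the number of classes. A minimal sketch of the usual fix, assuming the timm library (the checkpoint name and class count are illustrative):

    import timm

    # Passing num_classes re-initializes the head, so the remaining
    # pretrained weights load without a size mismatch.
    model = timm.create_model("vit_base_patch16_384", pretrained=True,
                              num_classes=5)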
1
vote
0 answers
How do vision transformers deal with input images of different sizes?
I want to train a vision transformer with progressive learning, as used in EfficientNetV2. Is there any way to do this with a transformer model?

LSC
- 11
- 1
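One possible route is to interpolate the positional embeddings for each training resolution. A sketch assuming a recent timm release, where (to my understanding) dynamic_img_size=True does this on the fly; the resolution schedule is illustrative:

    import timm
    import torch

    model = timm.create_model("vit_base_patch16_224", pretrained=True,
                              dynamic_img_size=True)
    # Progressive resizing in the spirit of EfficientNetV2: input sizes
    # must stay divisible by the 16-pixel patch size.
    for size in (128, 160, 192, 224):
        x = torch.randn(2, 3, size, size)
        _ = model(x)  # a real training step would go here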
0
votes
0 answers
TypeError: 'KerasTensor' object is not callable while using VisionTransformerModel0
While trying to use VisionTransformerModel0 after splitting the datasets, I'm getting the following error:
TypeError                                 Traceback (most recent call last)
Cell In[15], line 3
      1 from VisionTransformer import ViT
---->…
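This error usually means a KerasTensor (the symbolic output of a layer) is being called as if it were a layer or model. A minimal reproduction, unrelated to the asker's VisionTransformer module:

    from tensorflow import keras
    from tensorflow.keras import layers

    inputs = keras.Input(shape=(224, 224, 3))
    x = layers.Flatten()(inputs)   # fine: a Layer is called on a tensor
    out = layers.Dense(10)(x)      # fine for the same reason
    # out(inputs)  # TypeError: 'KerasTensor' object is not callable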
0
votes
0 answers
First diverging operator message for vit_b_16 and torch.utils.tensorboard
I can't get torch.utils.tensorboard to write_graph for the vit_b_16 model.
Here is example code:
import torch
from torchvision.models import get_model
from torch.utils import tensorboard
# create example inputs
fake_images = torch.ones((32, 3, 224,…

user3731622
- 4,844
- 8
- 45
- 84
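The "first diverging operator" message comes from the JIT trace-consistency check that add_graph runs; putting the model in eval mode (disabling dropout) and tracing under no_grad often quiets it. A hedged sketch; the log directory is illustrative:

    import torch
    from torchvision.models import get_model
    from torch.utils.tensorboard import SummaryWriter

    model = get_model("vit_b_16", weights=None).eval()
    fake_images = torch.ones((1, 3, 224, 224))
    writer = SummaryWriter("runs/vit_b_16")
    with torch.no_grad():
        writer.add_graph(model, fake_images)
    writer.close()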
0
votes
0 answers
model.parameters() vs model.state_dict() - which one gives the correct number of parameters in PyTorch?
I have created a modified version of ViT-base by coding from scratch. This version contains all the layers of the vision transformer, plus some additional layers. The number of parameters of a model can be found using this function:
def…

Preetom Saha Arko
- 2,588
- 4
- 21
- 37
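For context, the two calls count different things, so a gap between them is not necessarily a bug. A short sketch of both totals:

    import torch

    def count_parameters(model: torch.nn.Module):
        # parameters() yields only registered nn.Parameter objects (the
        # learnable weights); state_dict() additionally contains buffers
        # such as BatchNorm running statistics, so the totals can differ.
        n_params = sum(p.numel() for p in model.parameters())
        n_state = sum(t.numel() for t in model.state_dict().values())
        return n_params, n_state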
0
votes
1 answer
Error loading a model with a custom layer in TensorFlow 2.6.2
I have the following custom layer in my Vision Transformer:
class DataAugmentation(Layer):
    def __init__(self, norm, SIZE):
        super(DataAugmentation, self).__init__()
        self.norm = norm
        self.SIZE = SIZE
        self.resize = Resizing(SIZE, SIZE)
        …

mad
- 2,677
- 8
- 35
- 78
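The usual fix in TF 2.x is to give the custom layer a get_config() and pass the class via custom_objects at load time. A hedged sketch based on the layer above; the file name is illustrative, and it assumes norm is serializable:

    from tensorflow.keras.layers import Layer
    from tensorflow.keras.layers.experimental.preprocessing import Resizing
    from tensorflow.keras.models import load_model

    class DataAugmentation(Layer):
        def __init__(self, norm, SIZE, **kwargs):
            super().__init__(**kwargs)  # forward name/dtype from deserialization
            self.norm = norm
            self.SIZE = SIZE
            self.resize = Resizing(SIZE, SIZE)

        def get_config(self):
            # Record constructor arguments so load_model can rebuild the layer.
            config = super().get_config()
            config.update({"norm": self.norm, "SIZE": self.SIZE})
            return config

    model = load_model("vit_model.h5",
                       custom_objects={"DataAugmentation": DataAugmentation})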
0
votes
0 answers
How to implement src_key_padding_mask in a vision transformer
I am implementing a modified vision transformer based on the GitHub implementation. The author has also published a YouTube video explaining the implementation, but this implementation doesn't have any provision to incorporate src_key_padding_mask.…

Preetom Saha Arko
- 2,588
- 4
- 21
- 37
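For reference, PyTorch's own encoder already supports this argument. A sketch assuming the model can be built on nn.TransformerEncoder: src_key_padding_mask is a (batch, seq_len) boolean tensor where True marks tokens attention should ignore; dimensions are illustrative:

    import torch
    import torch.nn as nn

    encoder_layer = nn.TransformerEncoderLayer(d_model=768, nhead=12,
                                               batch_first=True)
    encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
    x = torch.randn(4, 197, 768)
    pad_mask = torch.zeros(4, 197, dtype=torch.bool)
    pad_mask[:, 150:] = True  # illustrative: treat trailing tokens as padding
    out = encoder(x, src_key_padding_mask=pad_mask)  # (4, 197, 768)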
0
votes
1 answer
Getting an error in the model summary and feature extraction for a Vision Transformer model
I am writing code for vision transformers for image feature extraction. I defined a ViT model from this GitHub repository.
image_model = ViT(
    image_size=224,
    patch_size=32,
    num_classes=1000,
    dim=1024,
    depth=6,
    heads=16,
    …

Ritul Chavda
- 173
- 1
- 9
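The constructor shown matches the lucidrains vit-pytorch package, whose ViT also requires an mlp_dim argument; a summary can then come from torchinfo. A sketch under those assumptions (the mlp_dim value is illustrative):

    import torch
    from vit_pytorch import ViT    # assumption: the lucidrains package
    from torchinfo import summary  # assumption: torchinfo is installed

    image_model = ViT(image_size=224, patch_size=32, num_classes=1000,
                      dim=1024, depth=6, heads=16, mlp_dim=2048)
    summary(image_model, input_size=(1, 3, 224, 224))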
0
votes
0 answers
How to visualize each patch and combine them back into an image in Vision Transformer feature maps
I am running the vision transformer Keras code and trying to visualize the variable "features" where the encoded patch images are stored. How can I visualize each patch?
def create_vit_classifier():
    inputs = layers.Input(shape=input_shape)
    # Augment data.
    …

Abi
- 1
- 1
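If the patches come from a layer shaped like the keras.io ViT example, each row of the (batch, num_patches, patch_dim) tensor can be reshaped back to a small image. A sketch under that assumption; patch_size is illustrative:

    import matplotlib.pyplot as plt
    import tensorflow as tf

    def show_patches(patches, patch_size=6):
        # patches: (batch, num_patches, patch_size * patch_size * 3),
        # with pixel values still in the 0-255 range.
        n = int(patches.shape[1] ** 0.5)  # patches per row/column
        fig = plt.figure(figsize=(4, 4))
        for i, patch in enumerate(patches[0]):
            ax = fig.add_subplot(n, n, i + 1)
            img = tf.reshape(patch, (patch_size, patch_size, 3))
            ax.imshow(tf.cast(img, tf.uint8).numpy())
            ax.axis("off")
        plt.show()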
0
votes
0 answers
Unpatchify data of a vision transformer
I have a patch_tensor with the shape torch.Size([2, 77, 256]) and I want to unpatchify it to (N, H, W, C) or (N, C, H, W). The original shape of the image is (2, 4, 64, 64).
For patch embedding, I am using PatchEmbed from the timm library:
hidden_size = 36 /…

Jessica
- 1
- 1
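A common inverse is the unpatchify routine from the MAE reference code. A sketch modeled on it; note it assumes the sequence forms a square patch grid, so a 77-token sequence would first need trimming to its image patches (for a (2, 4, 64, 64) image with 8x8 patches that is 64 tokens, with 256 = 8*8*4 values per patch):

    import torch

    def unpatchify(x, patch_size, channels):
        # x: (N, L, patch_size**2 * channels) -> (N, channels, H, W)
        n, L, _ = x.shape
        h = w = int(L ** 0.5)
        assert h * w == L, "sequence length must form a square patch grid"
        x = x.reshape(n, h, w, patch_size, patch_size, channels)
        x = torch.einsum("nhwpqc->nchpwq", x)
        return x.reshape(n, channels, h * patch_size, w * patch_size)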
0
votes
1 answer
Understanding Vision Transformer Implementation in Keras: Issues with Patch Shape and Embedding Layer
I'm trying to understand this implementation of vision transformers in Keras.
Here is the full code.
I can't understand why patches = tf.reshape(patches, [batch_size, -1, patch_dims]) returns a tensor of shape (batch_size, num_patches, patch_dim)…

Matteo Silla
- 23
- 3
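For what it's worth, the shape follows from tf.image.extract_patches: it returns (batch, rows, cols, patch_dim), and the -1 in tf.reshape merges rows*cols into one num_patches axis. A sketch with the sizes used in the keras.io ViT example:

    import tensorflow as tf

    images = tf.random.uniform((8, 72, 72, 3))
    patch_size = 6
    patches = tf.image.extract_patches(
        images=images,
        sizes=[1, patch_size, patch_size, 1],
        strides=[1, patch_size, patch_size, 1],
        rates=[1, 1, 1, 1],
        padding="VALID",
    )                                   # shape: (8, 12, 12, 108)
    patch_dims = patches.shape[-1]      # 6 * 6 * 3 = 108
    patches = tf.reshape(patches, [8, -1, patch_dims])  # (8, 144, 108)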
0
votes
1 answer
Vision transformer: Visualize feature maps
I am working on visualizing the feature maps of my vision transformer, but I am unable to do so. When I print model.children() it shows convolution layers, but I still cannot verify the if…

Khawar Islam
- 2,556
- 2
- 34
- 56
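One way to inspect intermediate activations is a forward hook on each encoder block; note a plain ViT produces token sequences rather than convolutional feature maps (only the patch-embedding layer is a convolution). A sketch using torchvision's vit_b_16; module paths may differ for custom models:

    import torch
    from torchvision.models import vit_b_16

    model = vit_b_16(weights=None).eval()
    features = {}

    def save_to(name):
        def hook(module, inputs, output):
            features[name] = output.detach()
        return hook

    for i, block in enumerate(model.encoder.layers):
        block.register_forward_hook(save_to(f"encoder_layer_{i}"))

    with torch.no_grad():
        model(torch.randn(1, 3, 224, 224))
    # Each captured tensor is (1, 197, 768): 196 patch tokens plus [CLS].
    print({k: tuple(v.shape) for k, v in features.items()})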
0
votes
1 answer
Vision Transformer models in vit-keras
I have used the vit_b32 and vit_b16 models in vit-keras. What other models are available in vit-keras, and what are their input image sizes? Are there any combined models (ResNet + ViT) available?
Thanks
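To the best of my knowledge of the vit-keras README, the package ships four constructors: vit.vit_b16, vit.vit_b32, vit.vit_l16 and vit.vit_l32, with pretrained weights expecting 384x384 inputs by default; verify against the package, as this may be out of date. A sketch:

    from vit_keras import vit

    model = vit.vit_l32(image_size=384, pretrained=True,
                        include_top=True, pretrained_top=True)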