Questions tagged [vision-transformer]
A transformer model applied to computer vision tasks, built on the attention mechanism.
17 questions
2
votes
0 answers
Positional embedding for larger images fed to ViT
Pre-trained ViT (Vision Transformer) models are usually trained on 224x224 or 384x384 images, but I have to fine-tune a custom ViT model (all the layers of ViT plus some additional layers) on 640x640 images. How do I handle the positional…

Preetom Saha Arko
- 2,588
- 4
- 21
- 37
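One common workaround for this question is to 2-D-interpolate the pretrained positional embeddings to the new patch grid rather than retrain them. A minimal PyTorch sketch, assuming a standard ViT layout with a leading [CLS] token and 16x16 patches (224x224 gives a 14x14 grid, 640x640 a 40x40 grid); the helper name and defaults are illustrative:

    import torch
    import torch.nn.functional as F

    def resize_pos_embed(pos_embed, old_grid=14, new_grid=40):
        # Split off the [CLS] position, which has no spatial location.
        cls_tok, patch_pos = pos_embed[:, :1], pos_embed[:, 1:]
        dim = patch_pos.shape[-1]
        # (1, N, dim) -> (1, dim, H, W) so F.interpolate can resize in 2-D.
        patch_pos = patch_pos.reshape(1, old_grid, old_grid, dim).permute(0, 3, 1, 2)
        patch_pos = F.interpolate(patch_pos, size=(new_grid, new_grid),
                                  mode="bicubic", align_corners=False)
        patch_pos = patch_pos.permute(0, 2, 3, 1).reshape(1, new_grid * new_grid, dim)
        return torch.cat([cls_tok, patch_pos], dim=1)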
1
vote
1 answer
ViT model reconstruction confusion while trying to insert layers into the old model
I ran into a problem while reconstructing a model from an old one by replicating it layer by layer: the output tensor of the reconstructed model (new) does not have the same dimensions as the original (old): new: [4,196,10]…

PikovO
- 11
- 1
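A sequence length of 196 is exactly a 14x14 patch grid without the [CLS] token (197 with it), so one plausible cause is that the rebuilt model drops the step that prepends it. A hedged sketch of that step, with illustrative dimensions:

    import torch

    # Illustrative dimensions: ViT-B/16 at 224x224 yields 196 patch tokens;
    # prepending the [CLS] token brings the sequence length to 197.
    batch, num_patches, dim = 4, 196, 768
    patch_tokens = torch.randn(batch, num_patches, dim)
    cls_token = torch.nn.Parameter(torch.zeros(1, 1, dim))
    tokens = torch.cat([cls_token.expand(batch, -1, -1), patch_tokens], dim=1)
    print(tokens.shape)  # torch.Size([4, 197, 768])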
1
vote
0 answers
Training a Vision Transformer on a custom dataset
I am trying to use a pre-trained ViT PyTorch model. It is pre-trained on ImageNet with image size 384x384. Now I want to fine-tune this model on my own dataset, but each time I load the pre-trained ViT model and try to fine-tune it, I get an…

Waqar Ahmad
- 11
- 1
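Shape-mismatch errors when loading a pretrained ViT usually come from the classification head, whose size depends on the number of classes. A minimal sketch of the usual fix, assuming the timm library (the checkpoint name and class count are illustrative):

    import timm

    # Passing num_classes re-initializes the head, so the remaining
    # pretrained weights load without a size mismatch.
    model = timm.create_model("vit_base_patch16_384", pretrained=True,
                              num_classes=5)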
1
vote
0 answers
How do vision transformers deal with input images of different sizes?
I want to train a vision transformer with progressive learning, as used in EfficientNetV2. Is there any way to do this with a transformer model?

LSC
- 11
- 1
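One possible route is to interpolate the positional embeddings for each training resolution. A sketch assuming a recent timm release, where (to my understanding) dynamic_img_size=True does this on the fly; the resolution schedule is illustrative:

    import timm
    import torch

    model = timm.create_model("vit_base_patch16_224", pretrained=True,
                              dynamic_img_size=True)
    # Progressive resizing in the spirit of EfficientNetV2: input sizes
    # must stay divisible by the 16-pixel patch size.
    for size in (128, 160, 192, 224):
        x = torch.randn(2, 3, size, size)
        _ = model(x)  # a real training step would go here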
0
votes
0 answers
TypeError: 'KerasTensor' object is not callable while using VisionTransformerModel0
While trying to use VisionTransformerModel0 after splitting the datasets, I'm getting the following error:
TypeError                                 Traceback (most recent call last)
Cell In[15], line 3
      1 from VisionTransformer import ViT
---->…
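This error usually means a KerasTensor (the symbolic output of a layer) is being called as if it were a layer or model. A minimal reproduction, unrelated to the asker's VisionTransformer module:

    from tensorflow import keras
    from tensorflow.keras import layers

    inputs = keras.Input(shape=(224, 224, 3))
    x = layers.Flatten()(inputs)   # fine: a Layer is called on a tensor
    out = layers.Dense(10)(x)      # fine for the same reason
    # out(inputs)  # TypeError: 'KerasTensor' object is not callable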
0
votes
0 answers
First diverging operator message for vit_b_16 and torch.utils.tensorboard
I can't get torch.utils.tensorboard to write_graph for the vit_b_16 model.
Here is example code:
import torch
from torchvision.models import get_model
from torch.utils import tensorboard
# create example inputs
fake_images = torch.ones((32, 3, 224,…

user3731622
- 4,844
- 8
- 45
- 84
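The "first diverging operator" message comes from the JIT trace-consistency check that add_graph runs; putting the model in eval mode (disabling dropout) and tracing under no_grad often quiets it. A hedged sketch; the log directory is illustrative:

    import torch
    from torchvision.models import get_model
    from torch.utils.tensorboard import SummaryWriter

    model = get_model("vit_b_16", weights=None).eval()
    fake_images = torch.ones((1, 3, 224, 224))
    writer = SummaryWriter("runs/vit_b_16")
    with torch.no_grad():
        writer.add_graph(model, fake_images)
    writer.close()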
0
votes
0 answers
model.parameters() vs model.state_dict() - which one gives the correct number of parameters in PyTorch?
I have created a modified version of ViT-base by coding from scratch. This version contains all the layers of the vision transformer, plus some additional layers. The number of parameters of a model can be found using this function:
def…

Preetom Saha Arko
- 2,588
- 4
- 21
- 37
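For context, the two calls count different things, so a gap between them is not necessarily a bug. A short sketch of both totals:

    import torch

    def count_parameters(model: torch.nn.Module):
        # parameters() yields only registered nn.Parameter objects (the
        # learnable weights); state_dict() additionally contains buffers
        # such as BatchNorm running statistics, so the totals can differ.
        n_params = sum(p.numel() for p in model.parameters())
        n_state = sum(t.numel() for t in model.state_dict().values())
        return n_params, n_state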
0
votes
1 answer
Error loading a model with a custom layer in TensorFlow 2.6.2
I have the following custom layer in my Vision Transformer:
class DataAugmentation(Layer):
    def __init__(self, norm, SIZE):
        super(DataAugmentation, self).__init__()
        self.norm = norm
        self.SIZE = SIZE
        self.resize = Resizing(SIZE, SIZE)
        …

mad
- 2,677
- 8
- 35
- 78
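The usual fix in TF 2.x is to give the custom layer a get_config() and pass the class via custom_objects at load time. A hedged sketch based on the layer above; the file name is illustrative, and it assumes norm is serializable:

    from tensorflow.keras.layers import Layer
    from tensorflow.keras.layers.experimental.preprocessing import Resizing
    from tensorflow.keras.models import load_model

    class DataAugmentation(Layer):
        def __init__(self, norm, SIZE, **kwargs):
            super().__init__(**kwargs)  # forward name/dtype from deserialization
            self.norm = norm
            self.SIZE = SIZE
            self.resize = Resizing(SIZE, SIZE)

        def get_config(self):
            # Record constructor arguments so load_model can rebuild the layer.
            config = super().get_config()
            config.update({"norm": self.norm, "SIZE": self.SIZE})
            return config

    model = load_model("vit_model.h5",
                       custom_objects={"DataAugmentation": DataAugmentation})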
0
votes
0 answers
How to implement src_key_padding_mask in a vision transformer
I am implementing a modified vision transformer based on the GitHub implementation. The author has also published a YouTube video explaining the implementation, but this implementation doesn't have any provision to incorporate src_key_padding_mask.…

Preetom Saha Arko
- 2,588
- 4
- 21
- 37
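For reference, PyTorch's own encoder already supports this argument. A sketch assuming the model can be built on nn.TransformerEncoder: src_key_padding_mask is a (batch, seq_len) boolean tensor where True marks tokens attention should ignore; dimensions are illustrative:

    import torch
    import torch.nn as nn

    encoder_layer = nn.TransformerEncoderLayer(d_model=768, nhead=12,
                                               batch_first=True)
    encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
    x = torch.randn(4, 197, 768)
    pad_mask = torch.zeros(4, 197, dtype=torch.bool)
    pad_mask[:, 150:] = True  # illustrative: treat trailing tokens as padding
    out = encoder(x, src_key_padding_mask=pad_mask)  # (4, 197, 768)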
0
votes
1 answer
Getting an error in the model summary and feature extraction for a Vision Transformer model
I am writing code for vision transformers for image feature extraction. I defined a ViT model from this GitHub repository.
image_model = ViT(
    image_size=224,
    patch_size=32,
    num_classes=1000,
    dim=1024,
    depth=6,
    heads=16,
    …

Ritul Chavda
- 173
- 1
- 9
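The constructor shown matches the lucidrains vit-pytorch package, whose ViT also requires an mlp_dim argument; a summary can then come from torchinfo. A sketch under those assumptions (the mlp_dim value is illustrative):

    import torch
    from vit_pytorch import ViT    # assumption: the lucidrains package
    from torchinfo import summary  # assumption: torchinfo is installed

    image_model = ViT(image_size=224, patch_size=32, num_classes=1000,
                      dim=1024, depth=6, heads=16, mlp_dim=2048)
    summary(image_model, input_size=(1, 3, 224, 224))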
0
votes
0 answers
How to visualize each patch and combine them back into an image in Vision Transformer feature maps
I am running the vision transformer Keras code and trying to visualize the variable "features" where the encoded patch images are stored. How can I visualize each patch?
def create_vit_classifier():
    inputs = layers.Input(shape=input_shape)
    # Augment data.
    …

Abi
- 1
- 1
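If the patches come from a layer shaped like the keras.io ViT example, each row of the (batch, num_patches, patch_dim) tensor can be reshaped back to a small image. A sketch under that assumption; patch_size is illustrative:

    import matplotlib.pyplot as plt
    import tensorflow as tf

    def show_patches(patches, patch_size=6):
        # patches: (batch, num_patches, patch_size * patch_size * 3),
        # with pixel values still in the 0-255 range.
        n = int(patches.shape[1] ** 0.5)  # patches per row/column
        fig = plt.figure(figsize=(4, 4))
        for i, patch in enumerate(patches[0]):
            ax = fig.add_subplot(n, n, i + 1)
            img = tf.reshape(patch, (patch_size, patch_size, 3))
            ax.imshow(tf.cast(img, tf.uint8).numpy())
            ax.axis("off")
        plt.show()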
0
votes
0 answers
Unpatchify data of a vision transformer
I have a patch_tensor with the shape torch.Size([2, 77, 256]) and I want to unpatchify it to (N, H, W, C) or (N, C, H, W). The original shape of the image is (2, 4, 64, 64).
For patch embedding, I am using PatchEmbed from the timm library:
hidden_size = 36 /…

Jessica
- 1
- 1
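A common inverse is the unpatchify routine from the MAE reference code. A sketch modeled on it; note it assumes the sequence forms a square patch grid, so a 77-token sequence would first need trimming to its image patches (for a (2, 4, 64, 64) image with 8x8 patches that is 64 tokens, with 256 = 8*8*4 values per patch):

    import torch

    def unpatchify(x, patch_size, channels):
        # x: (N, L, patch_size**2 * channels) -> (N, channels, H, W)
        n, L, _ = x.shape
        h = w = int(L ** 0.5)
        assert h * w == L, "sequence length must form a square patch grid"
        x = x.reshape(n, h, w, patch_size, patch_size, channels)
        x = torch.einsum("nhwpqc->nchpwq", x)
        return x.reshape(n, channels, h * patch_size, w * patch_size)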
0
votes
1 answer
Understanding Vision Transformer Implementation in Keras: Issues with Patch Shape and Embedding Layer
I'm trying to understand this implementation of vision transformers in Keras.
Here is the full code.
I can't understand why patches = tf.reshape(patches, [batch_size, -1, patch_dims]) returns a tensor of shape (batch_size, num_patches, patch_dim)…

Matteo Silla
- 23
- 3
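For what it's worth, the shape follows from tf.image.extract_patches: it returns (batch, rows, cols, patch_dim), and the -1 in tf.reshape merges rows*cols into one num_patches axis. A sketch with the sizes used in the keras.io ViT example:

    import tensorflow as tf

    images = tf.random.uniform((8, 72, 72, 3))
    patch_size = 6
    patches = tf.image.extract_patches(
        images=images,
        sizes=[1, patch_size, patch_size, 1],
        strides=[1, patch_size, patch_size, 1],
        rates=[1, 1, 1, 1],
        padding="VALID",
    )                                   # shape: (8, 12, 12, 108)
    patch_dims = patches.shape[-1]      # 6 * 6 * 3 = 108
    patches = tf.reshape(patches, [8, -1, patch_dims])  # (8, 144, 108)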
0
votes
1 answer
Vision transformer: Visualize feature maps
I am working on visualizing the feature maps of my vision transformer, but I am unable to do so. When I print model.children() it shows convolution layers, but I still cannot verify the if…

Khawar Islam
- 2,556
- 2
- 34
- 56
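One way to inspect intermediate activations is a forward hook on each encoder block; note a plain ViT produces token sequences rather than convolutional feature maps (only the patch-embedding layer is a convolution). A sketch using torchvision's vit_b_16; module paths may differ for custom models:

    import torch
    from torchvision.models import vit_b_16

    model = vit_b_16(weights=None).eval()
    features = {}

    def save_to(name):
        def hook(module, inputs, output):
            features[name] = output.detach()
        return hook

    for i, block in enumerate(model.encoder.layers):
        block.register_forward_hook(save_to(f"encoder_layer_{i}"))

    with torch.no_grad():
        model(torch.randn(1, 3, 224, 224))
    # Each captured tensor is (1, 197, 768): 196 patch tokens plus [CLS].
    print({k: tuple(v.shape) for k, v in features.items()})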
0
votes
1 answer
Vision Transformer models in vit-keras
I have used the vit_b32 and vit_b16 models in vit-keras. What other models are available in vit-keras, and what are their input image sizes? Are there any combined models (ResNet + ViT) available?
Thanks
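To the best of my knowledge of the vit-keras README, the package ships four constructors: vit.vit_b16, vit.vit_b32, vit.vit_l16 and vit.vit_l32, with pretrained weights expecting 384x384 inputs by default; verify against the package, as this may be out of date. A sketch:

    from vit_keras import vit

    model = vit.vit_l32(image_size=384, pretrained=True,
                        include_top=True, pretrained_top=True)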