batch_size doesn't match output.size(0)

Question

I am trying to use SelectKBest with mutual_info_classif on the outputs of a CNN model on the CIFAR-10 data.

In the for loop after model.eval() in the code below, the model output has size (3136, 10), but the batch size is 64.

This is the code I run on Colab:

```python
import torch
from torchvision import datasets, transforms
import torch.nn as nn
from torch.utils.data import DataLoader
from sklearn.feature_selection import SelectKBest, mutual_info_classif
import numpy as np

# Define the data transform
data_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

# Load the CIFAR-10 dataset
train_dataset = datasets.CIFAR10(root='./data', train=True, download=True, transform=data_transform)
val_dataset = datasets.CIFAR10(root='./data', train=False, download=True, transform=data_transform)

# Create the data loaders for the training and validation sets
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=64, shuffle=False)

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(64*8*8, 128)
        self.fc2 = nn.Linear(128, 10)
    def forward(self, x):
        x = self.pool(torch.relu(self.conv1(x)))
        x = self.pool(torch.relu(self.conv2(x)))
        x = x.view(-1, 64*8*8)
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x

# Initialize the CNN model
model = CNN()

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Initialize an empty list to store the features
features = []

model = model.to(device)
# Extract features from the dataset
model.eval()
with torch.no_grad():
    for x, y in train_loader:
        x = x.to(device)
        output = model(x)
        print(output.size())
        output = output.view(output.size(0), -1)
        features.append(output.cpu().numpy())

# Concatenate the features
features = np.concatenate(features)

# Select the k best features using mutual information
selector = SelectKBest(mutual_info_classif, k=1000)
selected_features = selector.fit_transform(features, train_dataset.targets)
```

Why is this happening?

Your model is confusing to me. Why are you applying `x.view(-1, 64*8*8)`? This operation mixes the features across samples within the batch, and I don't think that is right. Shouldn't it be `x.view(64, -1)`, i.e. flatten across channels, height, and width *per sample*? — adrianop01, Jan 20 '23 at 20:45
I don't know how `view` works in the background, but I think it keeps the elements in order and, when reshaping, reads the values in that order. I am not sure, so any help is appreciated. The same usage also appears here: https://wandb.ai/ayush-thakur/dl-question-bank/reports/An-Introduction-To-The-PyTorch-View-Function--VmlldzoyMDM0Nzg — Alican Kartal, Jan 20 '23 at 23:28
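To illustrate how `view` orders elements, here is a small sketch on a toy tensor (the shapes are made up for illustration and are much smaller than the real batch). `view` reads the same contiguous buffer in row-major order, so whether a row corresponds to one sample depends entirely on the size you ask for:

```python
import torch

# Toy batch: 2 samples, each 2 channels x 2 x 2 = 8 values, numbered 0..15.
# Sample 0 holds 0..7 and sample 1 holds 8..15 in memory.
x = torch.arange(16).reshape(2, 2, 2, 2)

# Rows of 4 values: each 8-value sample is split across 2 rows,
# so the first dimension is no longer the batch size.
rows = x.view(-1, 4)
print(rows.shape)  # torch.Size([4, 4])
print(rows[1])     # tensor([4, 5, 6, 7]), the second half of sample 0

# Rows of 16 values: both samples end up in a single row.
mixed = x.view(-1, 16)
print(mixed.shape)  # torch.Size([1, 16])

# Keeping dim 0 fixed flattens each sample into exactly one row.
per_sample = x.view(x.size(0), -1)  # same result as torch.flatten(x, 1)
print(per_sample.shape)  # torch.Size([2, 8])
```

So the order of values is preserved, but the first dimension only equals the batch size if the row length equals the number of values per sample.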
Please refer to the official tutorials and see how they use `x.view`: https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html — adrianop01, Jan 21 '23 at 00:02
I am afraid that my answer can't be understood if you don't understand the basic architecture (I don't really want to start my explanation from the basics). I would strongly suggest you read the following lecture notes from Cornell (start from p. 54; they have excellent graphics) to understand how those operations (2D convolution, pooling, fully connected) work and thus the relations between the input/output tensor dimensions: http://www.cs.cornell.edu/courses/cs5670/2021sp/lectures/lec21_cnns_for_web.pdf — adrianop01, Jan 21 '23 at 00:24
Thank you for the document; I hadn't come across it. However, I am already familiar with these topics. I thought about the problem and I see your point. The output of the second `x = self.pool(torch.relu(self.conv2(x)))` is bigger than `64*8*8` when flattened: in my case the images are `224*224*3`, so its output is `64*56*56` per image. In that case `x.view(-1, 64*8*8)` reshapes the tensor for a batch of `64` into `(64*7*7, 64*8*8)`, i.e. `(3136, 4096)`. So you are right. — Alican Kartal, Jan 21 '23 at 15:13
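The arithmetic behind 3136, worked out for one batch from the question's loader: two 2×2 max-pools take the 224×224 crop down to 56×56, and `conv2` has 64 output channels, so the tensor reaching `x.view(-1, 64*8*8)` has shape (64, 64, 56, 56):

```latex
% (batch, channels, height, width) after the second pool: (64, 64, 56, 56)
\text{rows} = \frac{64 \cdot 64 \cdot 56 \cdot 56}{64 \cdot 8 \cdot 8}
            = \frac{12\,845\,056}{4096}
            = 3136
            = 64 \cdot 49
```

Each image contributes 64 · 56 · 56 / 4096 = 49 rows, which is why `fc2` returns 3136 rows of 10 values instead of 64.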

score 0 · Answer 1 · answered Jan 20 '23 at 19:04

In the output tensor of size (3136, 10), the 10 is the number of classes in the dataset, but 3136 is not the number of samples. It is the result of reshaping and flattening the output of the last convolutional layer in the forward pass of the CNN model (`x.view(-1, 64*8*8)`), just before it is passed to the fully connected layers.

In the for loop, `output.view(output.size(0), -1)` reshapes the output to (output.size(0), -1) before appending it to the features list. This keeps the output 2-dimensional, with the first dimension meant to be the number of examples in the batch and the second the number of features. Since the model output is already 2-dimensional, this reshape leaves (3136, 10) unchanged; the first dimension differs from the batch size of 64 because of the view inside the model.
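The original `fc1 = nn.Linear(64*8*8, 128)` matches CIFAR-10's native 32×32 images (32 → 16 → 8 after two 2×2 pools), so one fix is simply to drop `Resize(256)` and `CenterCrop(224)` from the transform. Another option, sketched below as an illustrative change rather than the only correct one, keeps the 224×224 input and adds `nn.AdaptiveAvgPool2d((8, 8))` before flattening with `torch.flatten(x, 1)`. Flattening from dim 1 keeps dim 0 as the batch dimension, which also avoids hardcoding 64 (the last training batch has only 16 images, since 50000 = 781 × 64 + 16):

```python
import torch
import torch.nn as nn

class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        # Reduces any spatial size to 8x8, so fc1 always sees 64*8*8 inputs.
        self.adapt = nn.AdaptiveAvgPool2d((8, 8))
        self.fc1 = nn.Linear(64 * 8 * 8, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.pool(torch.relu(self.conv1(x)))
        x = self.pool(torch.relu(self.conv2(x)))
        x = self.adapt(x)
        # Flatten everything except the batch dimension (dim 0).
        x = torch.flatten(x, 1)
        x = torch.relu(self.fc1(x))
        return self.fc2(x)

model = CNN().eval()
with torch.no_grad():
    print(model(torch.randn(2, 3, 224, 224)).shape)  # torch.Size([2, 10])
    print(model(torch.randn(16, 3, 32, 32)).shape)   # torch.Size([16, 10])
```

For 32×32 inputs the adaptive pool receives an 8×8 map and leaves it unchanged, so the same model works with or without the resize.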

SelectKBest then uses mutual_info_classif to select the k best features (k = 1000 here) from the concatenated features.
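Two further details in the question's selection step are worth checking. The loader uses `shuffle=True`, but the labels passed to `fit_transform` come from `train_dataset.targets`, which is in the dataset's original order, so features and labels do not line up. Also, once the model returns one row per image, it has only 10 output columns, which is far fewer than `k=1000`. Below is a minimal sketch that collects the labels in the same loop. `extract_features` is a hypothetical helper, and the random `TensorDataset` and one-layer model are toy stand-ins for CIFAR-10 and the CNN, so the example runs without downloading anything:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset
from sklearn.feature_selection import SelectKBest, mutual_info_classif

def extract_features(model, loader, device):
    """Hypothetical helper: run the model over a loader, return (features, labels) in matching order."""
    model.eval()
    feats, labels = [], []
    with torch.no_grad():
        for x, y in loader:
            out = model(x.to(device))
            feats.append(out.view(out.size(0), -1).cpu().numpy())
            labels.append(y.numpy())  # same (shuffled) order as the features
    return np.concatenate(feats), np.concatenate(labels)

# Toy stand-ins for CIFAR-10 and the CNN (hypothetical, for illustration only).
torch.manual_seed(0)
data = TensorDataset(torch.randn(200, 3, 32, 32), torch.randint(0, 10, (200,)))
loader = DataLoader(data, batch_size=64, shuffle=True)
toy_model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))

X, y = extract_features(toy_model, loader, torch.device("cpu"))
print(X.shape, y.shape)  # (200, 10) (200,)

# Choose k no larger than the number of feature columns (10 here).
selector = SelectKBest(mutual_info_classif, k=5)
X_sel = selector.fit_transform(X, y)
print(X_sel.shape)  # (200, 5)
```

Setting `shuffle=False` on the feature-extraction loader would also keep `train_dataset.targets` aligned; collecting `y` in the loop works either way.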

Could you explain which function call makes the model return `3136` instead of `64` (the batch size)? And is this number calculated by summing the parameters or something? — Alican Kartal, Jan 20 '23 at 19:15
`x.view(-1, 64*8*8)` in the model should return `(batch_size, 4096)`. I also moved the `print` call to right after `output = model(x)`, and it still gives `(3136, 10)`. — Alican Kartal, Jan 20 '23 at 19:20
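One way to find which step changes the first dimension is to record each submodule's output shape with forward hooks (`register_forward_hook`). This is a minimal sketch that reuses the question's CNN unchanged; `record` is a hypothetical helper name, and a dummy batch of 2 random images stands in for real data to keep memory use low:

```python
import torch
import torch.nn as nn

class CNN(nn.Module):
    # The model from the question, unchanged.
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(64 * 8 * 8, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.pool(torch.relu(self.conv1(x)))
        x = self.pool(torch.relu(self.conv2(x)))
        x = x.view(-1, 64 * 8 * 8)
        x = torch.relu(self.fc1(x))
        return self.fc2(x)

shapes = []  # (submodule name, output shape), in call order

def record(name):
    # Hypothetical helper: builds a forward hook that logs the output shape.
    def hook(module, inputs, output):
        shapes.append((name, tuple(output.shape)))
    return hook

model = CNN().eval()
for name, module in model.named_children():
    module.register_forward_hook(record(name))

with torch.no_grad():
    model(torch.randn(2, 3, 224, 224))  # dummy batch of 2 images, 224x224

for name, shape in shapes:
    print(name, shape)
```

The `pool` entry appears twice because the same module is called twice. The batch dimension stays at 2 through the second pool, `(2, 64, 56, 56)`, and then `fc1` reports `(98, 128)`: 49 rows per image, which is 3136 rows for a batch of 64. So the change happens at the `view` between the last pool and `fc1`.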

