I want to train an Inceptionv3 model on three different views of the same item. In other words, I want to feed all three images together as a single input.

Use case:

I want to predict the type of footwear. In this problem a lot of the information is usually spread across different views, so I want to try this approach.

Gaurav Gola
  • Why not input each image separately and average the output? – SaiBot Jan 24 '19 at 12:09
  • Or label each input with the same class? It will figure out that they belong to the same class. But note that CNNs are **not** rotation invariant for large angles, so there will be some effect on performance. – Ankish Bansal Jan 24 '19 at 12:10
  • @SaiBot Ankish That would be the same as training on all the different images one image at a time. What I want is to feed the different views of a single item together in a single input. – Gaurav Gola Jan 24 '19 at 12:18

2 Answers

The easy way would be to feed all 3 images separately into the Inceptionv3 model, and then make a weighted decision over the 3 outputs.
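As a minimal sketch of that weighted decision, assuming each view has already been passed through the trained model and produced a softmax vector: the function name `fuse_views`, the class names, the probabilities, and the weights below are all made up for illustration.

```python
def fuse_views(view_probs, weights=None):
    """Weighted average of per-view class probabilities for one item."""
    if weights is None:
        weights = [1.0] * len(view_probs)  # no preference: plain mean
    total = sum(weights)
    n_classes = len(view_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, view_probs)) / total
        for c in range(n_classes)
    ]


# Hypothetical softmax outputs for three views of the same shoe
# (illustrative classes: sneaker, boot, sandal).
views = [
    [0.6, 0.3, 0.1],  # side view
    [0.2, 0.7, 0.1],  # top view
    [0.5, 0.4, 0.1],  # front view
]

# Illustrative weights: trust the side view twice as much as the others.
fused = fuse_views(views, weights=[2.0, 1.0, 1.0])
predicted = max(range(len(fused)), key=fused.__getitem__)
```

With these made-up numbers the weighted vote and the plain mean pick different classes, which is why the choice of weights matters if one view is known to be more informative.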

A better approach would be to use an Inceptionv3 model for each of the 3 input branches, then take the embedding layer of each branch (the layer before the last) and combine them with one fully connected classification layer (with softmax activation). The 3 branches can either be trained as separate view-specific networks or share one set of weights (with such a big model, shared weights will work fine).
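A rough PyTorch sketch of the shared-weights variant follows. To keep it small and runnable, a tiny stand-in CNN plays the role of the backbone; in practice one would plug in Inceptionv3 with its final classification layer removed so that it returns the embedding. The names `MultiViewClassifier` and `make_backbone`, the embedding size, and the class count are assumptions for illustration.

```python
import torch
import torch.nn as nn


class MultiViewClassifier(nn.Module):
    """Embeds every view with one shared backbone, then classifies the concatenation."""

    def __init__(self, backbone, embed_dim, n_views, n_classes):
        super().__init__()
        self.backbone = backbone  # same module for all views = shared weights
        self.classifier = nn.Linear(embed_dim * n_views, n_classes)

    def forward(self, views):
        # views: list of n_views tensors, each of shape (N, 3, H, W)
        embeddings = [self.backbone(v) for v in views]
        fused = torch.cat(embeddings, dim=1)  # (N, embed_dim * n_views)
        return self.classifier(fused)  # logits; softmax is usually left to the loss


def make_backbone(embed_dim=32):
    # Stand-in for a real backbone; NOT Inceptionv3.
    return nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(16, embed_dim),
    )


model = MultiViewClassifier(make_backbone(32), embed_dim=32, n_views=3, n_classes=5)
views = [torch.randn(4, 3, 64, 64) for _ in range(3)]  # batch of 4 items, 3 views each
logits = model(views)  # shape (4, 5)
```

For view-specific branches, the same idea applies with an `nn.ModuleList` holding one backbone per view instead of a single shared module.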

By the way, for a shoe type classification task I would suggest using a simpler model (Inceptionv3 is overkill).

Mark.F
  • Can you explain this: "The easy way would be to input all 3 images separately into the Inceptionv3 model, and then perform some weighted decision on all 3 outputs together." It looks similar to regular training, feeding a single image at a time. – Gaurav Gola Jan 24 '19 at 12:24
  • The training is similar, but at test time you classify a single shoe based on the outputs from 3 different views. – Mark.F Jan 24 '19 at 12:39
  • So what I understood is: I will label all the different views with their respective classes and train the model. How can I then make a weighted decision over all 3 outputs together? – Gaurav Gola Jan 24 '19 at 12:47
  • Get the output for the 3 images. Then either calculate the mean of the 3 or, if you know which view to prefer (and by how much), calculate a weighted average. Or use the outputs as input to a simpler classifier model that can learn the weights for each view. – Mark.F Jan 24 '19 at 14:55

I think there are a couple of ways to approach this:

  • Remove the first layer of Inception and create your own that accepts the 3 images stacked together (3 images x 3 channels).
  • Use the first Inception blocks on each input separately, then concatenate the results in some FC layer (or earlier). If the features to look for are similar across views, you can use shared parameters.

The first option merges all dimensions and diffuses the information provided by each individual image. The second one extracts specific features from each image.
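The first option can be sketched in PyTorch as follows, assuming "stacked together" means concatenating the three RGB views along the channel axis to get 9 input channels. The layer below is only a stand-in for the replaced first layer; its output width, kernel size, and stride are illustrative choices, and the rest of the network is omitted.

```python
import torch
import torch.nn as nn

# Three views of the same items, each a batch of 4 RGB images.
views = [torch.randn(4, 3, 64, 64) for _ in range(3)]

# Stack the views along the channel axis: 3 views x 3 channels = 9 channels.
stacked = torch.cat(views, dim=1)  # (4, 9, 64, 64)

# Replacement first layer: same role as the original stem convolution,
# but with in_channels=9 instead of 3.
first_conv = nn.Conv2d(in_channels=9, out_channels=32, kernel_size=3, stride=2)
features = first_conv(stacked)  # (4, 32, 31, 31)
```

Because every filter in this layer sees all 9 channels at once, the views are mixed from the very first layer, which is the merging effect described above. A pretrained first layer cannot be reused as-is here, since its weights expect 3 input channels.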

Adria Ciurana
  • I know it's been years, but I am struggling to understand how to use one block for three inputs separately. Is there any reference for this? I have a similar problem with 10 images altogether, and I want to process them separately instead of merging them, as mentioned in point 1. – iamkk Feb 25 '21 at 15:13
  • Yes, you can do it easily using reshape; the following pseudocode is in PyTorch format. Initially you should have the data in Nx10xCxHxW format. To pass everything through the model, fold the view axis into the batch axis: `x = x.reshape(-1, C, H, W)`. Then run the model and restore the view axis on its output: `output = model(x); output = output.reshape(N, 10, *output.shape[1:])`. Finally, to combine the views you can simply average them: `output_mean = output.mean(dim=1)` – Adria Ciurana Mar 08 '21 at 15:39
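The reshape trick from the last comment, written out as a small runnable sketch. The tiny `model` here is a stand-in feature extractor, not Inceptionv3, and the sizes are arbitrary; the only assumption is that the model treats each image in the batch independently.

```python
import torch
import torch.nn as nn

N, V, C, H, W = 2, 10, 3, 32, 32  # 2 items, 10 views each
x = torch.randn(N, V, C, H, W)

# Stand-in feature extractor that returns one 8-dim vector per image.
model = nn.Sequential(
    nn.Conv2d(C, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)

# Fold the view axis into the batch axis so every view is processed separately.
flat = x.reshape(N * V, C, H, W)
out = model(flat)  # (N * V, 8)

# Restore the view axis, whatever the per-image output shape is.
feats = out.reshape(N, V, *out.shape[1:])  # (N, V, 8)

# Combine the views of each item by averaging over the view axis.
pooled = feats.mean(dim=1)  # (N, 8)
```

Since `reshape` keeps the views of one item contiguous, `feats[i, j]` is the feature vector of view `j` of item `i`. The mean can be swapped for a concatenation or a learned layer if the views should not be weighted equally.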