6

I'm working on a face recognition project using PyTorch and MTCNN. After training on my training dataset, I now want to make predictions on the test dataset.

This is my training code:

optimizer = optim.Adam(resnet.parameters(), lr=0.001)
scheduler = MultiStepLR(optimizer, [5, 10])

trans = transforms.Compose([
   np.float32,
   transforms.ToTensor(),
   fixed_image_standardization
])
dataset = datasets.ImageFolder(data_dir, transform=trans)
img_inds = np.arange(len(dataset))
np.random.shuffle(img_inds)
train_inds = img_inds[:int(0.8 * len(img_inds))]
val_inds = img_inds[int(0.8 * len(img_inds)):]

train_loader = DataLoader(
   dataset,
   num_workers=workers,
   batch_size=batch_size,
   sampler=SubsetRandomSampler(train_inds)
)
val_loader = DataLoader(
   dataset,
   shuffle=True,
   num_workers=workers,
   batch_size=batch_size,
   sampler=SubsetRandomSampler(val_inds)
)

If I remove sampler=SubsetRandomSampler(val_inds) and put val_inds in its place, it raises this error:

val_inds ^ SyntaxError: positional argument follows keyword argument

How can I make predictions on samples selected randomly from the test dataset in PyTorch? That is why I thought I should use shuffle=True. I followed the facenet-pytorch repo.

art_cs

3 Answers

10

TL;DR: Remove shuffle=True in this case, as SubsetRandomSampler already shuffles the data.

What torch.utils.data.SubsetRandomSampler does (please consult the documentation when in doubt) is take a list of indices and yield them in random order (a permutation).

In your case you have one list of indices for training and another for validation; both index into the same underlying Dataset.

Let's assume they look like this:

train_indices = [0, 2, 3, 4, 5, 6, 9, 10, 12, 13, 15]
val_indices = [1, 7, 8, 11, 14]

During each pass, SubsetRandomSampler yields the numbers from its list one at a time in random order, without repeats. Once all of them have been returned, they are shuffled again for the next pass (when __iter__ is called again).

So SubsetRandomSampler might return something like this for val_indices (analogously for train_indices):

val_indices = [1, 8, 11, 7, 14]  # Epoch 1
val_indices = [11, 7, 8, 14, 1]  # Epoch 2
val_indices = [7, 1, 14, 8, 11]  # Epoch 3

Each of those numbers is an index into your original dataset. Please note that validation is shuffled this way, and so is train, without using shuffle=True. The two index lists do not overlap, so the data is split correctly.
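The reshuffling on every pass can be checked directly. The following is a minimal sketch that reuses the example val_indices from above; the printed order is random and will differ between runs.

```python
import torch
from torch.utils.data import SubsetRandomSampler

val_indices = [1, 7, 8, 11, 14]
sampler = SubsetRandomSampler(val_indices)

# Iterating over the sampler once corresponds to one epoch;
# every new iteration draws a fresh permutation of the same indices.
epochs = [list(sampler) for _ in range(3)]
for epoch, order in enumerate(epochs, start=1):
    print(f"Epoch {epoch}: {order}")
```

Every epoch contains exactly the same five indices, only in a different order.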

Additional info

  • shuffle uses torch.utils.data.RandomSampler under the hood if shuffle=True is specified, see source code. This in turn is equivalent to using torch.utils.data.SubsetRandomSampler with all indices (np.arange(len(dataset))) specified.
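  • pre-shuffling with np.random.shuffle(img_inds) has no effect on the order in which samples are drawn, as indices are shuffled during each pass anyway; it only decides which samples end up in the training and validation splits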
  • don't use numpy if torch provides the same functionality. There is torch.arange, mixing both libraries is almost never necessary.
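As an illustration of the points above, here is a hedged sketch of the same 80/20 split done with torch only. The TensorDataset is a toy stand-in for the ImageFolder dataset, and the batch size is arbitrary. A one-time torch.randperm is kept here only to decide which samples fall into each split, which matters because ImageFolder lists samples class by class.

```python
import torch
from torch.utils.data import DataLoader, SubsetRandomSampler, TensorDataset

# Toy stand-in for the ImageFolder dataset (assumption): 20 samples, 4 features
dataset = TensorDataset(torch.randn(20, 4), torch.randint(0, 2, (20,)))

# One random permutation decides split membership; no numpy needed
indices = torch.randperm(len(dataset)).tolist()
split = int(0.8 * len(dataset))
train_inds, val_inds = indices[:split], indices[split:]

# No shuffle=True: each sampler already randomizes the order on every pass
train_loader = DataLoader(dataset, batch_size=4, sampler=SubsetRandomSampler(train_inds))
val_loader = DataLoader(dataset, batch_size=4, sampler=SubsetRandomSampler(val_inds))

print(len(train_inds), len(val_inds))
```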

Inference

Single image

Just pass it through your network and get the output, e.g.:

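module.eval()
with torch.no_grad():
    # dataset[i] returns an (image, label) pair, not just the image
    image, label = dataset[5380]
    # The network expects a batch, so add a leading batch dimension
    output = module(image.unsqueeze(0))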

The first line puts the model in evaluation mode (this changes the behaviour of some layers, such as dropout and batch normalization); the context manager turns off gradient tracking (as it's not needed for predictions). Both are almost always used when "checking neural network output".
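To turn the raw output into a predicted identity, something like the sketch below may help. The tiny linear model, the 3x8x8 input size and the class names are all hypothetical stand-ins; with a real ImageFolder dataset the names would come from dataset.classes, and the model would be the trained network.

```python
import torch
from torch import nn

# Hypothetical stand-in for the trained network: 3 identities, 3x8x8 inputs
module = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 3))
class_names = ["alice", "bob", "carol"]  # assumption; normally dataset.classes

image = torch.randn(3, 8, 8)  # one preprocessed image, shape (C, H, W)

module.eval()
with torch.no_grad():
    logits = module(image.unsqueeze(0))           # shape (1, num_classes)
    probabilities = torch.softmax(logits, dim=1)  # logits -> class probabilities
    predicted = probabilities.argmax(dim=1).item()

print(class_names[predicted], probabilities[0, predicted].item())
```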

Checking validation dataset

Something along these lines; notice that the same ideas apply as for a single image:

module.eval()

total_batches = 0
batch_accuracy = 0.0
with torch.no_grad():
    for images, labels in val_loader:
        total_batches += 1
        output = module(images)
        # Multi-class logits: the predicted class is the index of the largest value.
        # For a single binary logit, use (output.squeeze(1) > 0.0).long() instead.
        predictions = output.argmax(dim=1)
        # Cast the boolean matches to float before averaging; .item() yields a Python float
        batch_accuracy += (predictions == labels).float().mean().item()

print("Overall accuracy: {}".format(batch_accuracy / total_batches))
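Averaging per-batch accuracies gives a smaller final batch the same weight as a full one. A possible variant, sketched here with a toy dataset and an untrained linear model as stand-ins (both are assumptions), counts correct predictions over all samples instead.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy data and model (assumptions): 10 samples, 4 features, 3 classes
dataset = TensorDataset(torch.randn(10, 4), torch.randint(0, 3, (10,)))
val_loader = DataLoader(dataset, batch_size=4)  # batches of 4, 4 and 2 samples
module = nn.Linear(4, 3)

module.eval()
correct = 0
total = 0
with torch.no_grad():
    for inputs, labels in val_loader:
        predictions = module(inputs).argmax(dim=1)
        correct += (predictions == labels).sum().item()  # matches in this batch
        total += labels.size(0)                          # samples in this batch

accuracy = correct / total
print(f"Overall accuracy: {accuracy:.3f}")
```

When every batch has the same size, both ways of computing the accuracy agree.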

Other cases

Please see some beginner guides or tutorials to understand these concepts; StackOverflow is not the place to redo that work (it is meant for small, concrete questions). Thanks.

Szymon Maszke
  • Great explanation, but how do I display test images when making predictions? I have to complete my final-year project this month. – art_cs Apr 05 '20 at 14:04
  • @art_cs do you want prediction on single image, make predictions on validation part of data and get some metric (e.g. `accuracy`), display it somehow using `tensorboard` or what exactly? – Szymon Maszke Apr 05 '20 at 14:09
  • yes , thats what i looked for , but to achieve it! im very new in pytorch and deep learning – art_cs Apr 05 '20 at 14:15
  • 1
    @art_cs See my edit, but most definitely start with some beginner's guide to get up to speed and understand PyTorch-related concepts. Don't jump into face recognition out of the blue, as it will take even more time than understanding the basics (which are presented quite straightforwardly in the links I gave you). – Szymon Maszke Apr 05 '20 at 14:31
  • Thank you so much. Is there a tutorial on face recognition with PyTorch? I searched a lot but didn't find one. – art_cs Apr 05 '20 at 14:55
  • 1
    @art_cs most neural networks tasks follow the same principle (training and validation with metrics calculations, model saving, loading etc.) so there is no such thing as "face recognition" specifically, rather it's a bit different application of the concepts I listed there. There is something [here](https://towardsdatascience.com/face-detection-on-custom-dataset-with-detectron2-and-pytorch-using-python-23c17e99e162) but it might be better for you to understand what you are doing beforehand. – Szymon Maszke Apr 05 '20 at 14:58
  • 1
    thank you so much i will try , the problem is i've not enough time to take a course in this time its too late for my last presentation , thanks so much again – art_cs Apr 05 '20 at 15:10
  • Sorry, but how can we do real-time face recognition based on the trained model (resnet), which has been trained on the faces? Thanks for answering. – art_cs Apr 07 '20 at 10:19
  • How to use `SubsetRandomSampler` but without "shuffle = True". I need to use `SubsetRandomSampler` for the valid set, so I don't want to shuffle the data, i.e., I want "shuffle=False" . Any suggestions? – Chau Pham Dec 16 '22 at 22:58
1

You can use DataLoader with shuffle=True, but only when no sampler is passed (sampler=None). With this flag, samples from the dataset will be selected randomly (doc).

Edit1

I agree with @SzymonMaszke: with SubsetRandomSampler there is no need to use shuffle, because your data is already picked randomly.
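The conflict between the two options can be reproduced with a small sketch. The toy TensorDataset and the index list are assumptions for illustration only.

```python
import torch
from torch.utils.data import DataLoader, SubsetRandomSampler, TensorDataset

dataset = TensorDataset(torch.arange(10))  # toy dataset holding the values 0..9
sampler = SubsetRandomSampler([1, 7, 8])

# Passing both shuffle=True and a sampler is rejected by DataLoader
try:
    DataLoader(dataset, shuffle=True, sampler=sampler)
    raised = False
except ValueError as error:
    raised = True
    print(error)

# Passing only the sampler works and still visits the subset in random order
loader = DataLoader(dataset, sampler=sampler)
seen = sorted(batch[0].item() for batch in loader)
print(seen)
```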

Anton Ganichev
  • the error is `ValueError: sampler option is mutually exclusive with shuffle` , and i will update the question – art_cs Apr 05 '20 at 12:53
  • my code the same as the repo (facenet-pytorch/examples/finetune.ipynb) – art_cs Apr 05 '20 at 13:29
  • 1
    @AntonGanichev of course you don't have to use `shuffle=True`; `SubsetRandomSampler` already shuffles (permutes, to be exact) the data, hence `shuffle=True` cannot be specified simultaneously. – Szymon Maszke Apr 05 '20 at 13:45
0

I'm not sure what format your test data is in, but to select a sample randomly from your dataset, you can use random.choice from the random module.
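A minimal sketch of that idea, assuming the test samples are addressed by a list of dataset indices (the values below are hypothetical; in the question they would be val_inds):

```python
import random

# Hypothetical indices of the held-out samples
val_inds = [1, 7, 8, 11, 14]

# Pick one index uniformly at random
index = random.choice(val_inds)
print(f"Picked dataset index {index}")

# The sample itself would then be looked up in the dataset, e.g.:
# image, label = dataset[index]
```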

Athreya Daniel