
How do I get a single random example from a PyTorch DataLoader?

If my DataLoader gives minibatches of multiple images and labels, how do I get a single random image and label?

Note that I don't want a single image and label per minibatch, I want a total of one example.

iacob
Tom Hale

7 Answers


If your DataLoader is something like this:

test_loader = DataLoader(image_datasets['val'], batch_size=batch_size, shuffle=True)

it gives you batches of size batch_size, and you can pick out a single random example by directly indexing the first batch:

for test_images, test_labels in test_loader:
    sample_image = test_images[0]    # Reshape them according to your needs.
    sample_label = test_labels[0]
    break  # stop after the first (shuffled) batch instead of looping over all of them

Alternative solutions

  1. You can use a RandomSampler to obtain random indices, then index your dataset with one of them to get a single sample.

  2. Use a batch_size of 1 in your DataLoader.

  3. Directly take samples from your Dataset like so:

     from torchvision import datasets
     mnist_test = datasets.MNIST('../MNIST/', train=False, transform=transform)

     Now use this dataset to take a sample, breaking out of the loop after the first one:

     for image, label in mnist_test:
         # do something with image and label
         break

  4. (Probably the best) Call next on an iterator over the DataLoader to get a single batch without writing a loop:

     inputs, classes = next(iter(dataloader))   
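A minimal sketch of Alternatives 1 and 4, assuming a small synthetic TensorDataset stands in for a real image dataset (the tensors are made up for illustration). Note that RandomSampler yields indices rather than samples, so the index is used to look the example up in the dataset.

```python
import torch
from torch.utils.data import DataLoader, RandomSampler, TensorDataset

# Hypothetical stand-in dataset: 20 fake 3x8x8 "images", label i for image i
images = torch.randn(20, 3, 8, 8)
labels = torch.arange(20)
dataset = TensorDataset(images, labels)

# Alternative 1: RandomSampler yields random indices; take just one of them
sampler = RandomSampler(dataset)
idx = next(iter(sampler))
image, label = dataset[idx]  # one 3x8x8 image and its label

# Alternative 4: pull one shuffled batch without writing a loop,
# then keep only the first element of that batch
loader = DataLoader(dataset, batch_size=4, shuffle=True)
inputs, classes = next(iter(loader))
first_image, first_label = inputs[0], classes[0]
```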
    
iacob
parthagar
  • Cheers! For Alternative 1, there needs to be a loop... Is it possible to do something like: `testloader.next()[0]`? – Tom Hale Dec 02 '18 at 06:41
  • Take a look at (https://pytorch.org/docs/stable/_modules/torch/utils/data/dataloader.html#DataLoader). I think `DataLoader.__iter__` should work or something from `_DataLoaderIter` must. Also, if this answers your question, consider accepting the answer – parthagar Dec 02 '18 at 08:44
  • You've not really answered my question: 1) will give one image per minibatch, not just one image. 2) violates the plural in the question (I'll edit to make this more obvious) 3) This looks like it will process every example, not just give a single one. – Tom Hale Dec 02 '18 at 14:42
  • Well, first thing is that you can use DataLoader multiple times in different loops. Now, my main example would take one image out of a minibatch and then you can break the loop or you can use the `__iter__` function of DataLoader to get `next` out of it without a loop. For Alternative 1, a `RandomSampler` returns an `iter` object and you can use `next` to pick out a single image, label (https://pytorch.org/docs/stable/_modules/torch/utils/data/sampler.html#RandomSampler). For Alternative 2, yes that violates your specific condition. For Alternative 3, break out of the loop the first time. – parthagar Dec 02 '18 at 16:24
  • It seems smelly/ugly to make a loop just to break out of it after one iteration. Any ideas? – Tom Hale Dec 04 '18 at 03:32
  • I also mentioned the `next` option. – parthagar Dec 04 '18 at 09:23
  • You did, but on a RandomSampler, which is not answering the question. It seems I mentioned it first however the syntax is gnarly and I've not yet worked out the implementation (I'm in transit). – Tom Hale Dec 04 '18 at 09:32
  • Could you define your `DataLoader` please? I will try to find something. – parthagar Dec 04 '18 at 09:36

If you want to choose specific images from your trainloader/testloader, you should check out the Subset class (torch.utils.data.Subset):

Here's an example of how to use it:

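import torch

# ImageFolderWithPaths is a custom dataset class, not part of torchvision;
# torchvision.datasets.ImageFolder can be used here in the same way
testset = ImageFolderWithPaths(root="path/to/your/Image_Data/Test/", transform=transform)
subset_indices = [0] # select your indices here as a list
subset = torch.utils.data.Subset(testset, subset_indices)
testloader_subset = torch.utils.data.DataLoader(subset, batch_size=1, num_workers=0, shuffle=False)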

This way you can use exactly one image and label. However, you can of course use more than just one index in your subset_indices.
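To get a single random example rather than a fixed one, the index passed to Subset can itself be drawn at random. A sketch, assuming a synthetic TensorDataset in place of ImageFolderWithPaths (the data is made up for illustration):

```python
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset

# Hypothetical stand-in for testset: 10 fake 3x8x8 "images", label i for image i
images = torch.randn(10, 3, 8, 8)
labels = torch.arange(10)
testset = TensorDataset(images, labels)

# Draw one random index and wrap it in a one-element Subset
subset_indices = torch.randint(len(testset), (1,)).tolist()
subset = Subset(testset, subset_indices)
loader = DataLoader(subset, batch_size=1, shuffle=False)

# The loader now yields exactly one batch containing one image and one label
image, label = next(iter(loader))  # image has a leading batch dimension of 1
```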

If you want to use a specific image from an ImageFolder dataset, you can use dataset.samples (a list of (path, class_index) tuples) and build a dictionary mapping each path to its index to find the image you want to use.

iacob
lschmidt90

(This answer supplements Alternative 3 of @parthagar's answer.)

Iterating through a dataset does not return random examples (it visits them in index order), so you should instead use:

import numpy

# Recovers the original `dataset` from the `dataloader`
dataset = dataloader.dataset

# Get a random sample
random_index = int(numpy.random.random()*len(dataset))
single_example = dataset[random_index]
johnnyasd12
  • Useful answer. If you want to feed this single_example to a network, you may also need to add a leading dimension of size 1, i.e. the batch dimension that the dataloader adds when you iterate over it. From my code: `item = dataset[frame_number]` (get a particular frame), then `voxel = torch.unsqueeze(item['events'], dim=0).to(device)` (we need to add a singleton dimension at the front to make it look like a batch-of-1 sample). – tobi delbruck Apr 20 '23 at 19:56
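A sketch combining the random index from the answer with the unsqueeze step from the comment, assuming a synthetic TensorDataset (made-up data) so that it runs without downloading anything:

```python
import numpy
import torch
from torch.utils.data import TensorDataset

# Hypothetical stand-in dataset: 5 fake 3x8x8 "images", label i for image i
dataset = TensorDataset(torch.randn(5, 3, 8, 8), torch.arange(5))

# Pick a random index the same way as the answer above
random_index = int(numpy.random.random() * len(dataset))
image, label = dataset[random_index]  # image has shape (3, 8, 8), no batch dimension

# Add a leading batch dimension of size 1 before passing it to a model
batch = image.unsqueeze(0)            # shape (1, 3, 8, 8)
```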

TL;DR:

The general form to get a single example from a DataLoader is:

example = [x[0] for x in next(iter(trainloader))]

For the case in the question, where minibatches of images and labels are returned:

image, label = [x[0] for x in next(iter(trainloader))]
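A runnable sketch of the one-liner above, assuming a synthetic TensorDataset with MNIST-like shapes stands in for the real trainloader data (the tensors are made up):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for trainloader: 16 fake 1x28x28 images with labels 0..9
images = torch.randn(16, 1, 28, 28)
labels = torch.randint(0, 10, (16,))
trainloader = DataLoader(TensorDataset(images, labels), batch_size=4, shuffle=True)

# next(iter(...)) returns one minibatch as a pair (images_batch, labels_batch);
# x[0] keeps the first element of each, i.e. one image and its label
image, label = [x[0] for x in next(iter(trainloader))]
# image has shape (1, 28, 28); label is a 0-dimensional tensor
```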

Possibly interesting information:

To get a single minibatch from the DataLoader, use:

next(iter(trainloader))

When you run something like `for images, labels in dataloader:`, what happens under the hood is that an iterator is created via `iter(dataloader)`, and then `next()` is called on that iterator on each loop iteration.
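The loop mechanics described above can be written out by hand. A sketch, assuming a tiny synthetic TensorDataset and shuffle=False so the batch order is predictable:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in dataset: 8 scalar "images" with labels 0..7
dataset = TensorDataset(torch.arange(8).float(), torch.arange(8))
loader = DataLoader(dataset, batch_size=4, shuffle=False)

# Roughly what `for images, labels in loader:` does under the hood
it = iter(loader)                   # calls loader.__iter__()
manual = []
while True:
    try:
        images, labels = next(it)   # calls it.__next__()
    except StopIteration:           # raised once every batch has been yielded
        break
    manual.append(labels.tolist())

# An ordinary for loop visits the same batches
looped = [labels.tolist() for images, labels in loader]
```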


To get a single image from a DataLoader that returns images and labels, use:

image = next(iter(trainloader))[0][0]

This is the same as doing:

images, labels = next(iter(trainloader))
image = images[0]
Tom Hale
  • **Note:** for this to produce a random sample you must ensure `shuffle=True` in the DataLoader construction. – iacob Mar 15 '21 at 14:36

Random sample from DataLoader

Assuming DataLoader(shuffle=True) was used in its construction, a single random example can be drawn from the DataLoader with:

images, labels = next(iter(dataloader))
example = images[0], labels[0]

Random sample from Dataset

If that is not the case, you can draw a single random example from the Dataset with:

idx = torch.randint(len(dataset), (1,)).item()
example = dataset[idx]
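A sketch of why converting the random index to a plain int can matter, assuming a synthetic TensorDataset (made-up data). TensorDataset forwards the index straight to its tensors, so a 1-element index tensor keeps an extra leading dimension, and other Dataset implementations may expect an int outright:

```python
import torch
from torch.utils.data import TensorDataset

# Hypothetical stand-in dataset: 10 fake 3x8x8 "images", label i for image i
dataset = TensorDataset(torch.randn(10, 3, 8, 8), torch.arange(10))

idx = torch.randint(len(dataset), (1,))  # a 1-element tensor such as tensor([7])
image, label = dataset[idx.item()]       # .item() turns it into a plain Python int

# Indexing with the tensor itself returns a batch-like result of size 1
batched_image, batched_label = dataset[idx]
```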
iacob

The key to getting a random sample is to set shuffle=True on the DataLoader, and the key to getting a single image is to set the batch size to 1.

Here is an example, assuming the MNIST data has already been loaded into the tensors x_train and y_train:

from torch.utils.data import DataLoader, Dataset, TensorDataset
bs = 1
train_ds = TensorDataset(x_train, y_train)
train_dl = DataLoader(train_ds, batch_size=bs, shuffle=True)

for xb, yb in train_dl:
    print(xb.shape)
    x = xb.view(28,28) 
    print(x.shape)
    print(yb)
    break #just once

from matplotlib import pyplot as plt
plt.imshow(x, cmap="gray")
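The loop with an immediate break can also be replaced by a single next(iter(...)) call. A loop-free sketch, assuming synthetic stand-ins for the MNIST tensors x_train and y_train (random data, flattened to 784 values per image as the view(28, 28) call implies):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-ins for the MNIST tensors: 100 flattened 28x28 images
x_train = torch.rand(100, 784)
y_train = torch.randint(0, 10, (100,))

train_dl = DataLoader(TensorDataset(x_train, y_train), batch_size=1, shuffle=True)

# Take one shuffled batch of size 1 without a for/break loop
xb, yb = next(iter(train_dl))  # xb: (1, 784), yb: (1,)
x = xb.view(28, 28)            # reshape the single flattened image for plotting
```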


prosti

You can simply convert trainloader to an iterator and then get the next batch with this code:

dataiter = iter(trainloader)
images, labels = next(dataiter)

Here is an example:

import torch
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

batch_size = 4

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')


# functions to show an image


def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()


# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)

# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join(f'{classes[labels[j]]:5s}' for j in range(batch_size)))

Reference: https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html

Mohamed Fathallah