Creating custom dataset in PyTorch

Question

Problem

In PyTorch, I am trying to write a class that exposes the entire data and the labels separately through attributes such as dataset.data and dataset.label. The code skeleton looks like:

class MyDataset(object):
  data = _get_data()
  label = _get_label()
  def __init__(self, dir, transforms):
    self.img_list = ... # all image paths loaded from dir
    # do something 

  def __getitem__(self):
    # do something
    return data, label

  def __len__(self):
    return len(self.img_list)

  def _get_data():
    # do something

  def _get_label():
    # do something

However, when I use dataset.data and dataset.label to access the corresponding variables, nothing is returned.

I am wondering why this is the case and how I can fix this.

Edit

Thank you for all of your attention.

I have solved this problem myself. The solution is straightforward: it simply uses class variables.

import torch
from PIL import Image

class FaceDataset(object):
    # class variable
    data = None
    label = None

    def __init__(self, root, transforms=None):
        # read img_list from root
        self.img_list = ...  # stored on the instance; used below and in __getitem__ / __len__
        self.transforms = ...
        FaceDataset.data = FaceDataset._get_data(self.img_list, self.transforms)
        FaceDataset.label = FaceDataset._get_label(self.img_list)

    @classmethod
    def _get_data(cls, img_list, transforms):
        data_list = []
        for img_path in img_list:
            data_list.append(transforms(Image.open(img_path)).unsqueeze(0))
        return torch.stack(data_list, dim=0)

    @classmethod
    def _get_label(cls, img_list):
        label = torch.zeros(len(img_list))
        for i, img_path in enumerate(img_list):
            label[i] = ...
        return label

    def __getitem__(self, index):
        img_path = self.img_list[index]
        label = ...

        # read image from file
        data = Image.open(img_path)
        # apply transform defined in __init__
        data = self.transforms(data)

        return data, label

    def __len__(self):
        return len(self.img_list)
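
One caveat with the class-variable approach above: `data` and `label` live on the class, so every instance shares them, and constructing a second dataset (say, a validation split) overwrites what the first one exposes. Below is a minimal pure-Python sketch of that effect, with no PyTorch involved. `SharedDataset`, `PerInstanceDataset` and the list of `(data, label)` pairs are hypothetical stand-ins for illustration, not the original code.

```python
class SharedDataset(object):
    # class variables: a single copy shared by all instances
    data = None
    label = None

    def __init__(self, samples):
        # samples is a hypothetical list of (data, label) pairs
        SharedDataset.data = [d for d, _ in samples]
        SharedDataset.label = [l for _, l in samples]


class PerInstanceDataset(object):
    def __init__(self, samples):
        # instance attributes: each dataset keeps its own copy
        self.data = [d for d, _ in samples]
        self.label = [l for _, l in samples]


shared_train = SharedDataset([("a", 0), ("b", 1)])
shared_val = SharedDataset([("c", 1)])
print(shared_train.data)  # ['c'], overwritten by the second instance

own_train = PerInstanceDataset([("a", 0), ("b", 1)])
own_val = PerInstanceDataset([("c", 1)])
print(own_train.data)  # ['a', 'b']
```

If only one dataset object ever exists, the two variants behave the same. Assigning to `self.data` and `self.label` in `__init__` is probably the safer default.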

This cannot be answered as we can't simply guess what's in `_get_data()` and `_get_label()`. Moreover, in PyTorch you should always be subclassing the [Dataset](https://pytorch.org/docs/stable/_modules/torch/utils/data/dataset.html) class for your custom datasets. – Mat, Oct 24 '19 at 16:25
@Mat They return the pixel values of an image and the corresponding label for the image (in my case, whether there is a human face in the image). I was not aware I could directly subclass `torch.utils.data.Dataset`; currently I just create an iterable (say, `mydataset`) and then build the loader using syntax like `dataloader = torch.utils.data.DataLoader(mydataset)`. Thank you for pointing that out. – Mr.Robot, Oct 24 '19 at 17:56
A few things here: firstly, if you find an answer to your question, you should **post** that answer and not edit it into the question. Secondly, I voted to close this as unclear; this is not how one should define a custom `Dataset` and therefore guessing behavior is not doable. – Mat, Oct 24 '19 at 18:00
Possible duplicate of [PyTorch: How to use DataLoaders for custom Datasets](https://stackoverflow.com/questions/41924453/pytorch-how-to-use-dataloaders-for-custom-datasets) – Blupon, Oct 25 '19 at 09:12

Blupon · Answer 1 · 2019-10-26T17:37:34.433

1

The "normal" way to create custom datasets in PyTorch has already been answered here on SO. There is also an official PyTorch tutorial for this.

For a simple example, you can read the PyTorch MNIST dataset code here (this dataset is used in this PyTorch example code for further illustration). Finally, you can find other dataset implementations in this torchvision datasets list (click on the dataset name, then on the "source" button in the dataset documentation, to access the dataset's PyTorch implementation).
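
As a rough illustration of the pattern those resources describe, here is a minimal sketch of a map-style dataset that subclasses `torch.utils.data.Dataset` and also exposes the full tensors as `dataset.data` and `dataset.label`, as the question asks. The class name `TensorPairDataset`, the optional `transform` argument, and the random tensors standing in for loaded images are assumptions made for illustration. They are not taken from the question or from the linked code.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class TensorPairDataset(Dataset):
    def __init__(self, data, label, transform=None):
        # instance attributes, so dataset.data / dataset.label work directly
        self.data = data
        self.label = label
        self.transform = transform

    def __getitem__(self, index):
        # return one (sample, label) pair; DataLoader batches these
        sample = self.data[index]
        if self.transform is not None:
            sample = self.transform(sample)
        return sample, self.label[index]

    def __len__(self):
        return len(self.data)


# random tensors stand in for images loaded from disk (assumption)
images = torch.rand(8, 3, 32, 32)
labels = torch.randint(0, 2, (8,))
dataset = TensorPairDataset(images, labels)

loader = DataLoader(dataset, batch_size=4, shuffle=False)
batch_data, batch_label = next(iter(loader))
print(dataset.data.shape, batch_data.shape)
```

Holding everything in memory like this only suits small datasets. For larger ones, the usual approach is to store file paths in `__init__` and load each image lazily in `__getitem__`.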

edited Oct 26 '19 at 17:37

answered Oct 25 '19 at 09:17

Blupon


Thank you for the pointers. I did it the "abnormal" way because I had been reading a PyTorch book published 2 years ago, which is clearly outdated. – Mr.Robot Oct 26 '19 at 00:05
2 years is certainly old when it comes to ML libraries! To avoid spending effort on well-documented PyTorch use cases, I advise you to have a look at the PyTorch tutorials [page](https://pytorch.org/tutorials/). Each tutorial is linked by its title on the left side of the page; for example, the second tutorial in this list is the one relevant to this question. Enjoy! – Blupon Oct 26 '19 at 17:48
