Questions tagged [pytorch-datapipe]

7 questions
1 vote, 1 answer

How to add custom labels to a torchdata datapipe?

I am trying to load image data for model training from self-hosted S3 storage (MinIO). PyTorch provides new DataPipes with this functionality in the torchdata library, so within my function that creates the DataPipe I have these lines: dp_s3 =…
Roland Deschain
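A minimal plain-Python sketch of one approach, assuming the label can be read from the object key (for example `s3://<bucket>/train/<class>/<file>.png`). The key layout, `CLASS_TO_IDX`, `label_from_key` and `with_labels` are all hypothetical. With torchdata, a function like `label_from_key` could be passed to `.map()` on a DataPipe of keys instead of using the explicit loop shown here.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical mapping from the class directory name to an integer label.
CLASS_TO_IDX: Dict[str, int] = {"cat": 0, "dog": 1}


def label_from_key(key: str) -> Tuple[str, int]:
    """Return (key, label), assuming keys end in .../<class>/<file>."""
    class_name = key.rsplit("/", 2)[-2]  # directory directly above the file
    return key, CLASS_TO_IDX[class_name]


def with_labels(keys: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Plain-Python stand-in for applying label_from_key with .map()."""
    for key in keys:
        yield label_from_key(key)


keys = ["s3://images/train/cat/001.png", "s3://images/train/dog/002.png"]
print(list(with_labels(keys)))
```

Keeping the labelling logic in a single pure function makes it easy to test on its own and to reuse in whichever pipeline step yields the keys.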

1 vote, 2 answers

How to handle a PyTorch Dataset with a transform function that returns more than one output per row of data?

Given a myfile.csv file that looks like:

    imagefile,label
    train/0/16585.png,0
    train/0/56789.png,0

the goal is to create a PyTorch DataLoader that, when looped over, returns 2x the data points, e.g.

    >>> dp = MyDataPipe(csvfile)
    >>> for row in…
alvas
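torchdata provides a `flatmap` DataPipe (`FlatMapper`) for one-to-many transforms. The plain-Python sketch below illustrates the idea without torchdata: each CSV row goes through a transform that returns a list, and the lists are flattened so the loop sees twice as many items. `two_views` and its `"original"`/`"flipped"` tags are hypothetical stand-ins for whatever the real transform produces.

```python
import csv
import io
from typing import Callable, Iterable, Iterator, List, Tuple

Row = Tuple[str, int]
Sample = Tuple[str, int, str]


def read_rows(lines: Iterable[str]) -> Iterator[Row]:
    """Parse 'imagefile,label' CSV lines; DictReader consumes the header."""
    for rec in csv.DictReader(lines):
        yield rec["imagefile"], int(rec["label"])


def two_views(row: Row) -> List[Sample]:
    """Hypothetical transform that returns two samples per input row."""
    path, label = row
    return [(path, label, "original"), (path, label, "flipped")]


def flat_map(fn: Callable[[Row], List[Sample]], rows: Iterable[Row]) -> Iterator[Sample]:
    """Plain-Python equivalent of a flatmap step: yield every item fn returns."""
    for row in rows:
        yield from fn(row)


csv_text = "imagefile,label\ntrain/0/16585.png,0\ntrain/0/56789.png,0\n"
samples = list(flat_map(two_views, read_rows(io.StringIO(csv_text))))
print(samples)
```

Because flattening happens in the pipeline rather than in `__getitem__`, the number of samples no longer has to equal the number of CSV rows.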

0 votes, 1 answer

Repeat batched elements in-epoch during training

I am training a (Siamese) neural network with PyTorch on a very big dataset. Loading data is the biggest bottleneck, and my dataset doesn't fit in RAM, so I can't speed loading up by keeping everything in memory. What I would like to do is basically cache part of the data and repeat it inside…
rmeertens
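A minimal plain-Python sketch of the caching idea: read a chunk of items into memory once, then replay it several times before loading the next chunk, so each slow load is reused within the epoch. `repeat_in_chunks` is a hypothetical helper, not a PyTorch or torchdata API, and it assumes the repeated samples may be emitted back to back.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def repeat_in_chunks(items: Iterable[T], chunk_size: int, repeats: int) -> Iterator[T]:
    """Yield each chunk of `chunk_size` items `repeats` times in a row.

    Only one chunk is held in RAM at a time.
    """
    if chunk_size < 1 or repeats < 1:
        raise ValueError("chunk_size and repeats must be >= 1")
    it = iter(items)
    while True:
        chunk: List[T] = list(islice(it, chunk_size))  # the expensive load
        if not chunk:
            return
        for _ in range(repeats):
            yield from chunk


print(list(repeat_in_chunks(range(5), chunk_size=2, repeats=2)))
```

For a Siamese setup, the chunk could be shuffled before each repetition (e.g. with `random.shuffle`) so the same cached samples are paired differently each time.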

0 votes, 1 answer

Exception: Unable to add DataPipe function name sharding_filter as it is already taken

torchdata.datapipes is not working in Google Colab. Even after installing the torchdata library, importing the DataPipe functions raises an exception. I installed the dependencies with !pip install torchdata or !pip install --pre torchdata -f…
SSK
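The message comes from DataPipe registration: a functional name such as `sharding_filter` can be registered only once. One plausible cause, assumed here and not confirmed by the question, is a mismatch between the installed `torch` and `torchdata` releases. A first diagnostic is to print both versions and check them against the pairs that torchdata documents as compatible. `installed_versions` is a hypothetical helper built on the standard library's `importlib.metadata`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str] = ("torch", "torchdata")) -> Dict[str, Optional[str]]:
    """Return the installed version of each package, or None if it is missing."""
    result: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


print(installed_versions())
```

Reading package metadata avoids importing torchdata, so this check still runs even when the import itself raises the exception.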

0 votes, 0 answers

Control `__getitem__` in a custom dataset class based on a sampling vector

I have custom datasets that each implement a __getitem__ method. I created the following MUXDataset class, which is supposed to select a random dataset and get the item from that dataset:

    class MUXDataset(Dataset):
        """ Defines a dataset…
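A minimal sketch of a map-style mux driven by a precomputed sampling vector, where global index `i` is served by the dataset `vector[i]`. `SamplingVectorMux` is a hypothetical class, not the asker's `MUXDataset`. It is written without torch, but since it implements `__len__` and `__getitem__`, it follows the same map-style protocol that `DataLoader` uses. The vector could be drawn with, for example, `random.choices(range(len(datasets)), weights=..., k=n)`.

```python
from typing import List, Sequence, Tuple


class SamplingVectorMux:
    """Hypothetical mux: index i is served by datasets[vector[i]].

    Each dataset keeps its own running position, so successive indices
    routed to the same dataset return different items (wrapping around).
    """

    def __init__(self, datasets: Sequence[Sequence], vector: Sequence[int]):
        if any(len(ds) == 0 for ds in datasets):
            raise ValueError("datasets must be non-empty")
        if any(not 0 <= d < len(datasets) for d in vector):
            raise ValueError("vector contains an invalid dataset id")
        self.datasets = datasets
        # Precompute (dataset_id, local_index) for every global index.
        counts = [0] * len(datasets)
        self.index_map: List[Tuple[int, int]] = []
        for d in vector:
            self.index_map.append((d, counts[d] % len(datasets[d])))
            counts[d] += 1

    def __len__(self) -> int:
        return len(self.index_map)

    def __getitem__(self, i: int):
        d, local = self.index_map[i]
        return self.datasets[d][local]


# A fixed vector keeps the demo deterministic.
mux = SamplingVectorMux([["a0", "a1"], ["b0", "b1", "b2"]], [0, 1, 1, 0, 0])
print([mux[i] for i in range(len(mux))])
```

Precomputing the index map makes `__getitem__` deterministic for a given vector, which keeps it compatible with shuffling samplers and multiple workers.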

0 votes, 0 answers

How to create a DataPipe that best optimizes a map-transform function that supports batching?

Given a transformation function that I can't change, e.g. autobot_vectorize, which takes a list of N inputs and outputs a tensor of N x 3 dimensions:

    def autobot_vectorize(imgfiles):
        # This vectorizer takes N imgfiles and returns the
        # a…
alvas
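One way to do this with DataPipes is to chain `.batch(n)`, `.map(fn)` and `.unbatch()`, so the vectorizer runs once per batch instead of once per file. Below is a plain-Python sketch of the same batch, map, unbatch pipeline. `fake_vectorize` is a hypothetical stand-in for autobot_vectorize that returns N rows of 3 floats instead of a tensor. Because unbatching iterates over lists, a tensor result may need to be split into rows (e.g. `list(t)`) before that step.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batched(items: Iterable[T], n: int) -> Iterator[List[T]]:
    """Group items into lists of length n (the last one may be shorter)."""
    if n < 1:
        raise ValueError("n must be >= 1")
    it = iter(items)
    while True:
        chunk = list(islice(it, n))
        if not chunk:
            return
        yield chunk


def batch_map_unbatch(items: Iterable[T], n: int, fn: Callable[[List[T]], Sequence[R]]) -> Iterator[R]:
    """Call fn once per batch of n inputs, then yield its outputs row by row."""
    for batch in batched(items, n):
        rows = fn(batch)
        if len(rows) != len(batch):
            raise ValueError("fn must return one row per input")
        yield from rows


def fake_vectorize(imgfiles: Sequence[str]) -> List[List[float]]:
    """Hypothetical stand-in for autobot_vectorize: N files -> N x 3 values."""
    return [[float(len(f)), 0.0, 1.0] for f in imgfiles]


files = ["a.png", "bb.png", "ccc.png"]
vectors = list(batch_map_unbatch(files, 2, fake_vectorize))
print(vectors)
```

The batch size controls the trade-off between fewer vectorizer calls and how much data must be held at once.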