Questions tagged [pytorch-datapipe]
7 questions
1 vote, 1 answer
How to add custom labels to a torchdata datapipe?
I am trying to load image data for model training from self-hosted S3 storage (MinIO). PyTorch provides new DataPipes with this functionality in the torchdata library.
So within my function to create the datapipe, I have these lines:
dp_s3 =…

Roland Deschain
1 vote, 2 answers
How to handle a PyTorch Dataset with a transform function that returns >1 output per row of data?
Given a myfile.csv file that looks like:
imagefile,label
train/0/16585.png,0
train/0/56789.png,0
The goal is to create a PyTorch DataLoader that, when looped over, returns 2x the data points, e.g.
>>> dp = MyDataPipe(csvfile)
>>> for row in…

alvas
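One way to picture the "2x the data points" behaviour asked about above is a flat-map step: each CSV row is expanded into a list of samples, and the lists are chained into one stream. The following is only a minimal sketch in plain Python with no torch dependency; `augment_row` and its "original"/"flipped" tags are hypothetical stand-ins for the real transform. In torchdata, the analogous operation would presumably be the `flatmap` functional form of an IterDataPipe.

```python
import csv
import io
from itertools import chain

# Same layout as the myfile.csv excerpt in the question.
CSV_TEXT = """imagefile,label
train/0/16585.png,0
train/0/56789.png,0
"""


def augment_row(row):
    """Hypothetical transform: return two samples for one CSV row."""
    label = int(row["label"])
    return [
        (row["imagefile"], label, "original"),
        (row["imagefile"], label, "flipped"),
    ]


def flat_map(fn, rows):
    """Apply fn to every row and flatten the resulting lists into one stream."""
    return chain.from_iterable(fn(row) for row in rows)


def load_samples(csv_text):
    """Parse the CSV text and expand every row into its samples."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return list(flat_map(augment_row, rows))


samples = load_samples(CSV_TEXT)
```

Here two input rows yield four samples, so a loader built on top of such a stream sees twice as many data points as there are rows.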
0 votes, 1 answer
Repeat batched elements in-epoch during training
I am training a (Siamese) neural network with PyTorch on a very big dataset. Loading data is the biggest bottleneck, and my dataset is too large to fit in RAM, so I cannot simply keep it in memory to speed things up.
What I would like to do is basically cache part of the data, and repeat it inside…

rmeertens
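The idea of caching part of the data and replaying it within an epoch can be sketched as a generator that pulls a small chunk of batches from the slow source, yields that chunk several times, and only then loads the next chunk. This is a plain-Python illustration under stated assumptions, not the asker's code: `repeat_cached`, `cache_size`, and `repeats` are hypothetical names, and the batches are toy lists rather than tensors.

```python
from itertools import islice


def repeat_cached(batches, cache_size, repeats):
    """Replay each chunk of `cache_size` batches `repeats` times.

    Only one chunk is held in memory at a time, so the slow source
    is read once per batch while training sees each batch `repeats` times.
    """
    it = iter(batches)
    while True:
        cache = list(islice(it, cache_size))  # load the next chunk from the slow source
        if not cache:
            return  # source exhausted
        for _ in range(repeats):
            yield from cache


# Toy "slow" source producing three batches: [0, 1], [2, 3], [4, 5].
slow_batches = ([i, i + 1] for i in range(0, 6, 2))
out = list(repeat_cached(slow_batches, cache_size=2, repeats=2))
```

Shuffling the cached chunk between replays would be a natural extension, at the cost of batches within a chunk still being seen close together in time.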
0 votes, 1 answer
Exception: Unable to add DataPipe function name sharding_filter as it is already taken
torchdata.datapipes is not working in Google Colab.
Even after installing the torchdata library, it raises an exception when DataPipe functions are imported.
I installed the dependencies
!pip install torchdata
or
!pip install --pre torchdata -f…

SSK
0 votes, 0 answers
Control the `__getitem__` in custom dataset class based on sampling vector
I have custom datasets that implement the __getitem__ method.
I created the following MUXDataset class, which is supposed to select a random dataset and get the item from that dataset:
class MUXDataset(Dataset):
"""
Defines a dataset…

David
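Since the question's class body is cut off, the following is only a guess at the general shape of such a multiplexing dataset: `__getitem__` draws one of the wrapped datasets according to a sampling vector and then indexes into it. Everything here is an assumption for illustration, including the `weights` and `seed` parameters and the wrap-around indexing. The sketch does not subclass `torch.utils.data.Dataset` so that it runs without torch.

```python
import random


class MUXDataset:
    """Hypothetical multiplexer: pick a dataset by weight, then fetch an item."""

    def __init__(self, datasets, weights, seed=None):
        if len(datasets) != len(weights):
            raise ValueError("need exactly one weight per dataset")
        self.datasets = list(datasets)
        self.weights = list(weights)      # the sampling vector
        self.rng = random.Random(seed)    # private RNG for reproducibility

    def __len__(self):
        # Assumption: total length is the sum of the wrapped datasets.
        return sum(len(d) for d in self.datasets)

    def __getitem__(self, index):
        # Draw one dataset according to the sampling vector.
        (chosen,) = self.rng.choices(self.datasets, weights=self.weights, k=1)
        # Wrap the index so it is always valid for the chosen dataset.
        return chosen[index % len(chosen)]


mux = MUXDataset([["a0", "a1"], ["b0", "b1", "b2"]], weights=[1.0, 0.0], seed=0)
```

With a weight of zero on the second dataset, every lookup is served from the first one, which makes the effect of the sampling vector easy to check.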
0 votes, 0 answers
How to create a DataPipe that best optimizes a map-transform function that supports batching?
Given a transformation function that I can't change, e.g. autobot_vectorize, which takes in a list of N inputs and outputs a tensor of shape N x 3.
def autobot_vectorize(imgfiles):
# This vectorizer takes N imgfiles and return the
# a…

alvas
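A common pattern for a function that only accepts batches is to group the inputs, call the function once per group, and then flatten the results back into per-item pairs. The sketch below shows that pattern in plain Python. The `autobot_vectorize` body is a made-up stand-in (the real one is truncated in the question and returns a tensor), and `batched` and `vectorize_in_batches` are hypothetical helpers. With torchdata, the same shape would presumably be expressed with the `batch`, `map`, and `unbatch` functional forms.

```python
from itertools import islice


def autobot_vectorize(imgfiles):
    """Stand-in for the fixed function: N inputs in, N rows of 3 values out."""
    return [[float(len(name)), 0.0, 1.0] for name in imgfiles]


def batched(iterable, batch_size):
    """Yield consecutive lists of up to batch_size items."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield chunk


def vectorize_in_batches(imgfiles, batch_size):
    """Call the batch-only function once per chunk, then emit per-item pairs."""
    for chunk in batched(imgfiles, batch_size):
        vectors = autobot_vectorize(chunk)   # one call for the whole chunk
        yield from zip(chunk, vectors)       # "unbatch" back to single items


files = ["a.png", "bb.png", "ccc.png"]
pairs = list(vectorize_in_batches(files, batch_size=2))
```

The batch size trades memory for fewer calls to the vectorizer; the last chunk may be smaller than `batch_size`, which the stand-in handles without padding.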