Questions tagged [dataloader]

DataLoader is a generic utility to be used as part of your application's data fetching layer to provide a consistent API over various backends and reduce requests to those backends via batching and caching.

GitHub: dataloader

430 questions
2 votes, 1 answer

PyTorch: `DataLoader()` for aggregated/clustered/panel data

Say I have a data set with multiple observations per individual (also known as panel data). Hence, I want to sample them together; that is to say I want to sample my dataset at the level of the individuals, not at the level of the observations (or…
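One way to sample at the level of individuals (a sketch, assuming a hypothetical `individual_ids` list that maps each observation to its individual) is to group observation indices by individual and pass the groups to `DataLoader` as a `batch_sampler`:

```python
from collections import defaultdict

import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy panel data: 6 observations belonging to 3 individuals (ids are made up).
features = torch.arange(12, dtype=torch.float32).reshape(6, 2)
individual_ids = [0, 0, 1, 1, 1, 2]
dataset = TensorDataset(features)

# Group observation indices by individual, so each individual's observations
# are always drawn together.
groups = defaultdict(list)
for idx, ind in enumerate(individual_ids):
    groups[ind].append(idx)

# batch_sampler accepts any iterable of index lists: one list -> one batch.
loader = DataLoader(dataset, batch_sampler=list(groups.values()))

sizes = [batch.shape[0] for (batch,) in loader]
print(sizes)  # [2, 3, 1] -- one individual's observations per batch
```

Each yielded batch then contains every observation of exactly one individual; to get mini-batches of several individuals, chunk the grouped index lists before handing them to the loader.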
2 votes, 0 answers

PyTorch DataLoader: how to deal efficiently with a dataset larger than RAM

I have a large number of numpy files that surpass the size of the RAM. I create a DataLoader that reads the files using memmap (solution from Load multiple .npy files (size > 10GB) in pytorch). Surprisingly, I still have memory issues while loading the…
Simon Madec
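A common pattern for datasets larger than RAM (a sketch; the path, shape, and dtype below are placeholders for the real files) is to open the `np.memmap` lazily inside `__getitem__` rather than in `__init__`, so only the requested rows ever touch memory:

```python
import numpy as np
import torch
from torch.utils.data import Dataset

class MemmapDataset(Dataset):
    """Serves rows of a large on-disk array without loading it into RAM.
    path/shape/dtype are hypothetical placeholders."""

    def __init__(self, path, shape, dtype=np.float32):
        self.path, self.shape, self.dtype = path, shape, dtype
        self.mmap = None  # opened lazily: each DataLoader worker gets its own handle

    def __len__(self):
        return self.shape[0]

    def __getitem__(self, i):
        if self.mmap is None:
            self.mmap = np.memmap(self.path, mode="r",
                                  dtype=self.dtype, shape=self.shape)
        # np.array(...) copies the row out of the memmap, so the returned
        # tensor owns its memory and is safe to keep around.
        return torch.from_numpy(np.array(self.mmap[i]))
```

Opening the memmap lazily matters: a handle created before the DataLoader forks its workers is shared across processes, which is one common source of the memory growth described in the question.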
2 votes, 0 answers

PyTorch DataLoaders: OSError: [Errno 9] Bad file descriptor

Description of the problem: the error occurs if num_workers > 0, but when I set num_workers = 0 the error disappears, though this slows down the training speed. I think the multiprocessing really matters here. How can I solve this…
ZhaoAlpha
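A workaround often reported for this error (not guaranteed to apply here, but cheap to try) is to switch PyTorch's tensor-sharing strategy from file descriptors to the file system, which avoids exhausting the per-process descriptor limit when `num_workers > 0`:

```python
import torch.multiprocessing as mp

# Share tensors between DataLoader workers via files on disk instead of
# file descriptors, sidestepping "[Errno 9] Bad file descriptor" caused by
# hitting the open-descriptor limit.
mp.set_sharing_strategy("file_system")
print(mp.get_sharing_strategy())  # file_system
```

Raising the OS limit (`ulimit -n`) is the other commonly suggested avenue when the default `file_descriptor` strategy must be kept.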
2 votes, 1 answer

Why does batching with a dataloader not work in a test?

Problem: I am trying to test the performance of the following query: query { classes { teachers { user_id } } } When I start up my server and run the query through the GraphQL playground, the dataloader works as expected and…
raphael-p
2 votes, 1 answer

DataLoader - what does "wrapping an iterable around the Dataset" mean?

I am reading the official documentation of DataLoader: https://pytorch.org/tutorials/beginner/basics/data_tutorial.html and there is this sentence: "DataLoader wraps an iterable around the Dataset..". I know that DataLoaders are used to iterate over…
Nephophile
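The sentence in the tutorial just means: a `Dataset` supports random access (`dataset[i]`), while the `DataLoader` turns it into something you can loop over, yielding collated batches. A minimal sketch:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# The Dataset answers "give me sample i"; the DataLoader wraps it in an
# iterable that handles batching, optional shuffling, and collation.
dataset = TensorDataset(torch.arange(10).float())
loader = DataLoader(dataset, batch_size=4)

batches = [batch for (batch,) in loader]
print([len(b) for b in batches])  # [4, 4, 2]
```

So "wraps an iterable around the Dataset" is literal: `iter(loader)` is what a `for` loop consumes, while the `Dataset` itself only knows how to return individual samples.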
2 votes, 0 answers

BlockingIOError: [Errno 11] Resource temporarily unavailable in pytorch dataloader

Update: I have found that this problem only occurs when validation starts and the validation dataloader is used; I would be happy to provide any relevant information to solve the problem. I am currently running a neural network model with video…
tangolin
2 votes, 0 answers

DataLoader pytorch num_workers

I'm currently looking at this tutorial: https://deeplizard.com/learn/video/kWVgvsejXsE about the ideal value for num_workers (an optional attribute of the DataLoader class). If I understand correctly, if you have 2 CPUs, one can be used to load the…
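There is no universally ideal `num_workers`; it depends on CPU count, storage speed, and how expensive `__getitem__` is, so the usual advice is to benchmark a few values. A sketch of the knob (the dataset is a tiny stand-in; `os.cpu_count()` is only a common starting point to try, and `num_workers=0` keeps this example portable):

```python
import os

import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(64, 3))

# num_workers=0: data is loaded in the main process.
# num_workers=N: N worker processes prefetch batches in parallel.
suggested = os.cpu_count() or 1  # a starting point for benchmarking, not a rule
loader = DataLoader(dataset, batch_size=8, num_workers=0)

print(sum(1 for _ in loader))  # 8 batches of 8 samples
```

In practice one times a full epoch for `num_workers` in, say, `{0, 2, 4, suggested}` and keeps the fastest; more workers than cores usually just adds overhead.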
2 votes, 1 answer

big data in pytorch, help for tuning steps

I've previously split my big data: # X_train.shape : 4M samples x 2K features # X_test.shape : 2M samples x 2K features I've prepared the dataloaders: target = torch.tensor(y_train.to_numpy()) features = torch.tensor(X_train.values) train =…
Zal78
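The setup in the excerpt can be sketched end-to-end like this (tiny random stand-ins for `X_train`/`y_train`; the real shapes are 4M x 2K):

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-ins for the question's pre-split data.
X_train = np.random.rand(100, 5).astype(np.float32)
y_train = np.random.randint(0, 2, size=100)

# from_numpy shares memory with the numpy array: no extra copy of big data.
features = torch.from_numpy(X_train)
target = torch.from_numpy(y_train)

train = TensorDataset(features, target)
train_loader = DataLoader(train, batch_size=32, shuffle=True)

xb, yb = next(iter(train_loader))
print(xb.shape, yb.shape)  # torch.Size([32, 5]) torch.Size([32])
```

For data this size, `torch.from_numpy` is preferable to `torch.tensor(...)`, since the latter copies; and if even the tensors do not fit in RAM, a memmap- or chunk-based `Dataset` is the next step.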
2 votes, 1 answer

torch dataloader for large csv file - incremental loading

I am trying to write a custom torch data loader so that large CSV files can be loaded incrementally (by chunks). I have a rough idea of how to do that. However, I keep getting some PyTorch error that I do not know how to solve. import numpy as…
Petr
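One way to load a large CSV incrementally (a sketch using the stdlib `csv` module, assuming all columns are numeric and the first row is a header) is an `IterableDataset` that streams rows, so the file is never fully resident:

```python
import csv

import torch
from torch.utils.data import DataLoader, IterableDataset

class CsvIterableDataset(IterableDataset):
    """Streams a large CSV row by row; the DataLoader batches the stream.
    Assumes numeric columns and a single header row."""

    def __init__(self, path):
        self.path = path

    def __iter__(self):
        with open(self.path, newline="") as f:
            reader = csv.reader(f)
            next(reader)  # skip the header
            for row in reader:
                yield torch.tensor([float(v) for v in row])

# Usage (path is hypothetical):
# loader = DataLoader(CsvIterableDataset("big.csv"), batch_size=256)
```

One caveat: with `num_workers > 0`, every worker replays the whole file unless you shard it by worker id using `torch.utils.data.get_worker_info()` inside `__iter__`.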
2 votes, 0 answers

Does the sequence length of my data determine my batch size in time-series classification with LSTM?

The data I collect consists of 3 features/signals (columns) over some period of time. To make this easy, let's say 100 time steps = 100 seconds. For example, like…
patrick823
2 votes, 2 answers

How to iterate over Dataloader until a number of samples is seen?

I'm learning PyTorch, and I'm trying to implement a paper about the progressive growing of GANs. The authors train the networks on a given number of images, instead of for a given number of epochs. My question is: is there a way to do this in…
aurelia
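One way to train on a fixed number of samples rather than epochs (a sketch; `target_images` is a hypothetical budget standing in for the paper's image counts) is to count samples as batches arrive and keep restarting the loader until the budget is met:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(10, 3))
loader = DataLoader(dataset, batch_size=4, shuffle=True)

target_images = 25  # hypothetical sample budget
seen = 0
while seen < target_images:
    for (batch,) in loader:
        # a training step on `batch` would go here
        seen += batch.shape[0]
        if seen >= target_images:
            break
```

Re-entering the `for` loop starts a fresh pass (with a fresh shuffle), so the loop naturally spans as many partial epochs as the budget requires; the final count overshoots by at most one batch.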
2 votes, 1 answer

PyTorch - discard dataloader batch

I have a custom Dataset that loads data from large files. Sometimes, the loaded data are empty and I don't want to use them for training. In Dataset I have: def __getitem__(self, i): (x, y) = self.getData(i) #getData loads data and handles…
Martin Perry
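A common pattern for skipping bad samples (a sketch; the dataset below is a toy stand-in for one whose `getData` sometimes returns empty data) is to return `None` from `__getitem__` and filter it out in a custom `collate_fn`, so the batch simply shrinks instead of being discarded wholesale:

```python
import torch
from torch.utils.data import DataLoader, Dataset
from torch.utils.data.dataloader import default_collate

class SometimesEmpty(Dataset):
    """Toy dataset where every third sample pretends to be empty."""

    def __len__(self):
        return 6

    def __getitem__(self, i):
        if i % 3 == 0:
            return None  # signal "skip me" to the collate_fn
        return torch.tensor([float(i)]), torch.tensor(i % 2)

def skip_none_collate(batch):
    # Drop failed samples before collating; the batch may come out smaller.
    batch = [b for b in batch if b is not None]
    return default_collate(batch) if batch else None

loader = DataLoader(SometimesEmpty(), batch_size=3, collate_fn=skip_none_collate)
sizes = [x.shape[0] for b in loader if b is not None for x in (b[0],)]
print(sizes)  # [2, 2] -- each batch of 3 lost its empty sample
```

The training loop still has to skip the `None` case (when an entire batch was bad); the alternative of resampling a replacement index inside `__getitem__` keeps batch sizes constant but biases sampling slightly.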
2 votes, 1 answer

How can I get a batch of samples from a dataset given a list of idxs in pytorch?

I have a torch.utils.data.Dataset object, and I would like a DataLoader or similar object that accepts a list of idxs and returns a batch of samples with the corresponding idxs. For example, given list_idxs = [10, 109, 7, 12] I would like to do…
thebesttony
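Two ways to do this (a sketch over a toy `TensorDataset`): index the dataset directly and stack the results, or hand the index list to a `DataLoader` as a single-batch `batch_sampler` so its collation machinery does the stacking:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.arange(200).float())
list_idxs = [10, 109, 7, 12]

# Option 1: index the dataset directly and collate by hand.
batch = torch.stack([dataset[i][0] for i in list_idxs])

# Option 2: a batch_sampler yielding exactly one batch of those indices.
loader = DataLoader(dataset, batch_sampler=[list_idxs])
(batch2,) = next(iter(loader))

print(batch.tolist())  # [10.0, 109.0, 7.0, 12.0]
```

Option 1 is the lighter choice for a one-off lookup; option 2 pays off when the samples need the loader's `collate_fn`, pinned memory, or worker processes.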
2 votes, 0 answers

Can the batch order of a PyTorch data loader vary across devices when shuffle=False?

I'm instantiating a PyTorch data loader with shuffle=False within a Colab notebook (GPU runtime), like so: image_data_loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=False) When I iterate through the data loader, the…
2 votes, 2 answers

Shuffling along a given axis in PyTorch

I have a dataset that gets loaded in with the following dimensions: [batch_size, seq_len, n_features] (e.g. torch.Size([16, 600, 130])). I want to be able to shuffle this data along the sequence-length axis (axis=1) without altering the batch ordering…
NeuralNew
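Shuffling along `axis=1` without touching batch order comes down to indexing with a permutation (a sketch on a small tensor; the first variant applies one permutation to every batch element, the second draws an independent permutation per element):

```python
import torch

x = torch.arange(16 * 6 * 3, dtype=torch.float32).reshape(16, 6, 3)

# Variant 1: the same sequence permutation for every batch element.
perm = torch.randperm(x.size(1))
shuffled = x[:, perm, :]

# Variant 2: an independent permutation per batch element, via argsort of
# random keys and a gather along the sequence dimension.
perms = torch.argsort(torch.rand(x.size(0), x.size(1)), dim=1)
independent = torch.gather(
    x, 1, perms.unsqueeze(-1).expand(-1, -1, x.size(2))
)
```

Both variants leave `axis=0` (the batch) untouched: each output row is still the same batch element, just with its time steps reordered.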