Questions tagged [dataloader]

DataLoader is a generic utility to be used as part of your application's data fetching layer to provide a consistent API over various backends and reduce requests to those backends via batching and caching.

GitHub: dataloader

430 questions
3 votes · 1 answer

GraphQL nested cursor-based pagination, resolver and SQL query

Is there a way to implement GraphQL cursor-based pagination with nested pagination queries in a performant way? Let's say we have 3 pseudo GraphQL types: type User { id: ID! books: [Book!]! } type Book { isbn: ID! pages: [Page!]! } type…
ZiiMakc
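A common performant shape for this (a sketch, not the asker's code: the table and column names are hypothetical, and it assumes SQLite ≥ 3.25 for window functions) is to rank children per parent with ROW_NUMBER and keep one page per parent, so a whole batch of users is served in one round trip instead of one query per user:

```python
import sqlite3

# Hypothetical schema standing in for the User/Book types above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (isbn TEXT, user_id INTEGER);
    INSERT INTO books VALUES ('a', 1), ('b', 1), ('c', 1), ('d', 2), ('e', 2);
""")

PAGE_SIZE = 2
query = """
    SELECT isbn, user_id FROM (
        SELECT isbn, user_id,
               ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY isbn) AS rn
        FROM books
        WHERE user_id IN (?, ?)
    )
    WHERE rn <= ?
"""
for isbn, user_id in conn.execute(query, (1, 2, PAGE_SIZE)):
    print(user_id, isbn)  # first PAGE_SIZE books per user, one round trip
```

The same query shape works one level deeper for pages-per-book, partitioned by isbn instead of user_id.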
3 votes · 2 answers

How to create batches using PyTorch DataLoader such that each example in a given batch has the same value for an attribute?

Suppose I have a list, datalist, which contains several examples (which are of type torch_geometric.data.Data for my use case). Each example has an attribute num_nodes. For demo purposes, such a datalist can be created using the following snippet of…
Arun
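One way to get this behaviour (a sketch under assumptions: plain tensors stand in for the torch_geometric Data objects, and column 1 plays the role of num_nodes) is a custom batch_sampler that only ever draws a batch from indices sharing one attribute value:

```python
import random
from collections import defaultdict
import torch
from torch.utils.data import DataLoader, Sampler, TensorDataset

class SameValueBatchSampler(Sampler):
    """Yield index batches in which every sample shares one attribute value."""
    def __init__(self, values, batch_size):
        groups = defaultdict(list)
        for idx, v in enumerate(values):
            groups[v].append(idx)
        self.batches = [idxs[i:i + batch_size]
                        for idxs in groups.values()
                        for i in range(0, len(idxs), batch_size)]
    def __iter__(self):
        random.shuffle(self.batches)      # shuffle batches, not their contents
        return iter(self.batches)
    def __len__(self):
        return len(self.batches)

# Toy stand-in for the datalist; column 1 plays the role of num_nodes.
data = torch.tensor([[0, 4], [1, 4], [2, 7], [3, 7], [4, 4]])
loader = DataLoader(TensorDataset(data),
                    batch_sampler=SameValueBatchSampler(data[:, 1].tolist(), 2))
for (batch,) in loader:
    print(batch)                          # each batch is uniform in column 1
```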
3 votes · 0 answers

How to use torch custom dataset with fastai data loaders

I created a custom torch dataset and specified the 2 required methods: __getitem__ and __len__. Then I created two torch data loaders: train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=True, num_workers=2) val_loader =…
jakes
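A minimal sketch, assuming fastai v2: plain torch DataLoaders can be wrapped in fastai's DataLoaders container and then handed to a Learner (this mirrors fastai's migrating-from-PyTorch guide; the datasets below are toy stand-ins):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from fastai.data.core import DataLoaders

train_dataset = TensorDataset(torch.randn(64, 3), torch.randint(0, 2, (64,)))
val_dataset = TensorDataset(torch.randn(16, 3), torch.randint(0, 2, (16,)))

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True,
                          num_workers=2)
val_loader = DataLoader(val_dataset, batch_size=32)

# Wrap the plain torch loaders in fastai's container.
dls = DataLoaders(train_loader, val_loader)
# learn = Learner(dls, model, loss_func=...)   # then train with fastai as usual
```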
3 votes · 0 answers

Couldn't open shared file mapping error in Pytorch Dataset

When training a model using a custom dataset in PyTorch 1.4, the following error is thrown after a seemingly random number of epochs: RuntimeError: Couldn't open shared file mapping: , error code: <1455> The dataset is…
Mark wijkhuizen
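Windows error code 1455 is ERROR_COMMITMENT_LIMIT ("the paging file is too small for this operation to complete"), and each DataLoader worker process commits its own memory. A hedged mitigation sketch, not a confirmed fix for this question, is simply to reduce the worker count:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.randn(100, 3))   # stand-in for the custom dataset
# Fewer workers commit less memory; num_workers=0 loads in the main process
# and typically avoids error 1455, at the cost of data-loading speed.
loader = DataLoader(ds, batch_size=32, num_workers=0)
```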
3 votes · 2 answers

PyTorch DataLoader uses identical random transformation across each epoch

There is a bug in PyTorch/NumPy where, when loading batches in parallel with a DataLoader (i.e. setting num_workers > 1), the same NumPy random seed is used for each worker, resulting in any random functions applied being identical across…
iacob
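The standard remedy, matching the worker_init_fn pattern in the PyTorch documentation (the dataset below is a toy stand-in), is to re-seed NumPy inside each worker from torch's per-worker, per-epoch seed:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset

class NoisyDataset(Dataset):
    def __len__(self):
        return 8
    def __getitem__(self, i):
        return np.random.rand()           # stand-in for a random augmentation

def seed_worker(worker_id):
    # Derive a distinct NumPy seed from torch's per-worker, per-epoch seed.
    np.random.seed(torch.initial_seed() % 2**32)

loader = DataLoader(NoisyDataset(), num_workers=2, worker_init_fn=seed_worker)
for epoch in range(2):        # draws now differ across workers and epochs
    print([round(float(v), 3) for v in torch.cat(list(loader))])
```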
3 votes · 1 answer

Pytorch 1.7.0 | DataLoader Error - TypeError: 'module' object is not callable

This is my code; I am using PyCharm! Imports: import torch import torch.nn as nn import torch.optim as optim import torch.nn.functional as F import torch.utils.data as DataLoader import torchvision.datasets as Datasets import torchvision.transforms…
Valadanchik
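The excerpt already contains the culprit: import torch.utils.data as DataLoader binds the name DataLoader to the module, so DataLoader(...) later tries to call a module object, hence TypeError: 'module' object is not callable. A minimal corrected sketch:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset  # the class, not a module alias

ds = TensorDataset(torch.randn(8, 2))
loader = DataLoader(ds, batch_size=4)     # calls the class; no TypeError
```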
3 votes · 2 answers

Examples or explanations of PyTorch dataloaders?

I am fairly new to PyTorch (and have never done advanced coding). I am trying to learn the basics of deep learning using the d2l.ai textbook but am having trouble understanding the logic behind the code for dataloaders. I read the…
Merry
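For readers landing here, the whole contract fits in a few lines (a toy sketch, not taken from the d2l.ai book): a Dataset provides random access plus a length, and a DataLoader turns it into an iterator over shuffled mini-batches:

```python
import torch
from torch.utils.data import DataLoader, Dataset

class Squares(Dataset):
    """A Dataset is just random access (__getitem__) plus a length (__len__)."""
    def __len__(self):
        return 10
    def __getitem__(self, i):
        return torch.tensor(float(i)), torch.tensor(float(i * i))

# The DataLoader turns that into an iterator over (shuffled) mini-batches.
for x, y in DataLoader(Squares(), batch_size=4, shuffle=True):
    print(x, y)                  # up to 4 (input, target) pairs per iteration
```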
3 votes · 1 answer

KeyError when enumerating over dataloader - why?

I am writing a binary classification model that consists of audio files of 40 participants and classifies them according to whether they have a speech disorder or not. The audio files have been divided into 5-second segments, and to avoid subject…
19905EJones
3 votes · 1 answer

How do I fix the Dataset to return the desired output (PyTorch)?

I am trying to use information from outside functions to decide which data to return. Here, I have added simplified code to demonstrate the problem. When I use num_workers = 0, I get the desired behavior (the output after 3 epochs is 18). But,…
deep s. pandey
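With num_workers > 0, each worker receives a snapshot of the dataset when the iterator is created, so attributes mutated later in the main process are invisible to the workers. One workaround sketch (the names here are illustrative, not from the question) keeps the mutable state in shared memory:

```python
import torch
import torch.multiprocessing as mp
from torch.utils.data import DataLoader, Dataset

class SharedStateDataset(Dataset):
    def __init__(self):
        self.offset = mp.Value('i', 0)    # lives in shared memory
    def __len__(self):
        return 3
    def __getitem__(self, i):
        return i + self.offset.value      # reads the shared value, not a copy

ds = SharedStateDataset()
loader = DataLoader(ds, num_workers=2)
for epoch in range(3):
    ds.offset.value = epoch       # visible to workers, unlike a plain attribute
    print([int(v) for v in torch.cat(list(loader))])
```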
3 votes · 0 answers

NumPy memmap throttles with PyTorch DataLoader when available RAM is less than file size

I'm working on a dataset that is too big to fit into RAM. The solution I'm currently trying is to use numpy memmap to load one sample/row at a time using the DataLoader. The solution looks something like this: class MMDataset(torch.utils.data.Dataset): …
Kevin
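A pattern that often helps here (a sketch with a toy file, not the asker's code) is to open the memmap lazily on first access rather than in __init__, so each DataLoader worker holds its own handle and only the touched pages are faulted in:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset

class MMDataset(Dataset):
    def __init__(self, path, shape, dtype=np.float32):
        self.path, self.shape, self.dtype = path, shape, dtype
        self.mm = None                    # opened on first access per process
    def __len__(self):
        return self.shape[0]
    def __getitem__(self, i):
        if self.mm is None:               # lazy open: one handle per worker
            self.mm = np.memmap(self.path, dtype=self.dtype, mode='r',
                                shape=self.shape)
        return torch.from_numpy(np.array(self.mm[i]))  # copy one row out

# Toy on-disk file standing in for the asker's huge dataset.
shape = (100, 8)
np.memmap('toy.dat', dtype=np.float32, mode='w+', shape=shape)[:] = 1.0
loader = DataLoader(MMDataset('toy.dat', shape), batch_size=16, num_workers=2)
```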
3 votes · 1 answer

Load multiple .npy files (size > 10 GB) in PyTorch

I'm looking for an optimized solution to load multiple huge .npy files using a PyTorch data loader. I'm currently using the following method, which creates a new dataloader for each file in each epoch. My data loader is something like: class…
prime130392
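One optimized shape for this (a sketch with toy files standing in for the huge ones) is a lazily-memmapped Dataset per file glued together with ConcatDataset, so a single DataLoader spans every file in every epoch without reading any file fully into RAM:

```python
import numpy as np
import torch
from torch.utils.data import ConcatDataset, DataLoader, Dataset

class NpyDataset(Dataset):
    def __init__(self, path):
        self.path = path
        self.length = np.load(path, mmap_mode='r').shape[0]  # header read only
        self.arr = None
    def __len__(self):
        return self.length
    def __getitem__(self, i):
        if self.arr is None:              # open lazily, once per worker
            self.arr = np.load(self.path, mmap_mode='r')
        return torch.from_numpy(np.array(self.arr[i]))

files = ['part0.npy', 'part1.npy']        # toy stand-ins for the huge files
for f in files:
    np.save(f, np.zeros((50, 8), dtype=np.float32))

loader = DataLoader(ConcatDataset([NpyDataset(f) for f in files]),
                    batch_size=32, shuffle=True)
```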
3 votes · 2 answers

NOT using multiprocessing but getting a CUDA error on Google Colab while using PyTorch DataLoader

I've cloned my GitHub repo into Google Colab and am trying to load data using PyTorch's DataLoader. global gpu, device if torch.cuda.is_available(): gpu = True device = 'cuda:0' torch.set_default_tensor_type('torch.cuda.FloatTensor') …
3venthoriz0n
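A plausible culprit, judging only from the snippet: making CUDA the default tensor type forces the DataLoader's forked worker processes to create CUDA tensors, and CUDA cannot be re-initialised in a forked subprocess. The usual pattern (a sketch with toy data) keeps loading on the CPU and moves each batch to the device in the loop:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
ds = TensorDataset(torch.randn(64, 3), torch.randint(0, 2, (64,)))  # toy data
loader = DataLoader(ds, batch_size=16, num_workers=2, pin_memory=True)
for x, y in loader:                       # batches arrive as CPU tensors
    x, y = x.to(device), y.to(device)     # move to the GPU inside the loop
```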
3 votes · 2 answers

KeyError when enumerating over dataloader

I'm trying to iterate over a pytorch dataloader initialized as follows: trainDL = torch.utils.data.DataLoader(X_train, batch_size=BATCH_SIZE, shuffle=True, **kwargs) where X_train is a pandas dataframe like this one: So I'm not able to do the…
sooaran
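The DataLoader indexes its dataset with integer positions, but X_train[3] on a DataFrame is a column-label lookup, hence the KeyError. A minimal sketch of the usual fix, converting the frame to tensors first (toy data below):

```python
import pandas as pd
import torch
from torch.utils.data import DataLoader, TensorDataset

X_train = pd.DataFrame({'a': [1.0, 2.0, 3.0, 4.0], 'b': [5.0, 6.0, 7.0, 8.0]})
ds = TensorDataset(torch.tensor(X_train.values, dtype=torch.float32))
trainDL = DataLoader(ds, batch_size=2, shuffle=True)
for (batch,) in trainDL:
    print(batch)                          # 2 rows of the frame per batch
```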
3 votes · 1 answer

PyTorch DataLoader returns the batch as a list with the batch as the only entry. What is the best way to get a tensor from my DataLoader?

I currently have the following situation where I want to use DataLoader to batch a numpy array: import numpy as np import torch import torch.utils.data as data_utils # Create toy data x = np.linspace(start=1, stop=10, num=10) x =…
Auss
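Default collation wraps each TensorDataset item in a tuple, so every batch arrives as a one-element sequence; unpacking it in the loop yields the plain tensor (a sketch reconstructed from the excerpt):

```python
import numpy as np
import torch
import torch.utils.data as data_utils

# Create toy data, as in the question.
x = np.linspace(start=1, stop=10, num=10)
ds = data_utils.TensorDataset(torch.from_numpy(x))
loader = data_utils.DataLoader(ds, batch_size=5)
for (batch,) in loader:       # unpack the 1-tuple to get the tensor itself
    print(batch)              # a tensor of 5 values, not [tensor]
```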
2 votes · 1 answer

Best way to use a Python iterator as a dataset in PyTorch

The PyTorch DataLoader turns datasets into iterables. I already have a generator which yields data samples that I want to use for training and testing. The reason I use a generator is that the total number of samples is too large to store in…
bja
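A minimal sketch of the usual answer: wrap the generator in an IterableDataset, the DataLoader-compatible form for data that cannot be randomly indexed (the generator below is a stand-in for the asker's):

```python
import torch
from torch.utils.data import DataLoader, IterableDataset

class GeneratorDataset(IterableDataset):
    def __init__(self, make_gen):
        self.make_gen = make_gen   # a factory, so each epoch gets a fresh generator
    def __iter__(self):
        return self.make_gen()

def samples():                     # stand-in for the asker's generator
    for i in range(10):
        yield torch.tensor([float(i)])

loader = DataLoader(GeneratorDataset(samples), batch_size=4)
for batch in loader:
    print(batch.shape)             # batches of up to 4 samples
```

Note that with num_workers > 0 every worker would replay the same generator, duplicating samples, unless the stream is sharded per worker via torch.utils.data.get_worker_info().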