I have a 2D array of size (20000000, 500) in a txt file. It is too large to fit in memory, so I will have to prefetch it and train my model with PyTorch. I think I need to use DataLoader with the prefetch_factor parameter. Does anyone know how I would do this? Thank you.
1 Answer
Just pass the prefetch_factor parameter to the DataLoader class. For example, if you set prefetch_factor=3 with multiple workers, each worker loads 3 batches in advance, so 3 * num_workers batches are prefetched across all workers. Its default value is 2.
Refer to this for a detailed explanation: https://pytorch.org/docs/stable/data.html
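A minimal sketch of how those arguments fit together (the toy TensorDataset, shapes, and numbers below are placeholders, not your actual data):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy in-memory dataset just to illustrate the DataLoader arguments;
# replace with your own Dataset.
dataset = TensorDataset(torch.randn(1000, 500), torch.randint(0, 2, (1000,)))

loader = DataLoader(
    dataset,
    batch_size=64,
    num_workers=4,      # prefetch_factor only takes effect when num_workers > 0
    prefetch_factor=3,  # each worker loads 3 batches ahead -> 3 * 4 in flight
)

for features, labels in loader:
    pass  # training step would go here
```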

Nakul
- I understand how prefetch_factor works. But the dataset is a txt file; is the dataset parameter of DataLoader compatible with a txt file? If I read the txt file into a NumPy array and then pass it as the dataset, it won't fit in memory. So there are two problems: one is prefetching, the other is how to read the txt file into the DataLoader. – G-09 Jun 19 '21 at 21:21
- Refer to this: https://medium.com/swlh/how-to-use-pytorch-dataloaders-to-work-with-enormously-large-text-files-bbd672e955a0 – Nakul Jun 20 '21 at 10:18
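For completeness, here is a rough sketch of the lazy-loading idea the linked article describes: a map-style Dataset that indexes the byte offset of every line once, then reads a single row per __getitem__, so only one pass and one row at a time ever touch memory. The file name data.txt and the whitespace-separated float format are assumptions about your file:

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class TxtLineDataset(Dataset):
    """Reads one row of a large txt file per __getitem__ without
    loading the whole file into memory."""

    def __init__(self, path):
        self.path = path
        # One pass over the file to record each line's starting byte offset.
        # For 20M lines this takes a while, but uses little memory.
        offsets = []
        with open(path, "rb") as f:
            offset = 0
            for line in f:
                offsets.append(offset)
                offset += len(line)
        # Store as a compact int64 array; 20M Python ints would waste memory.
        self.offsets = np.asarray(offsets, dtype=np.int64)

    def __len__(self):
        return len(self.offsets)

    def __getitem__(self, idx):
        # Re-opening per item is worker-safe; caching one handle per
        # worker is a common optimization.
        with open(self.path, "rb") as f:
            f.seek(int(self.offsets[idx]))
            line = f.readline()
        row = np.array(line.decode().split(), dtype=np.float32)
        return torch.from_numpy(row)

dataset = TxtLineDataset("data.txt")  # path is a placeholder
loader = DataLoader(dataset, batch_size=256, num_workers=4, prefetch_factor=2)
```

With a dataset like this, the prefetch_factor answer above applies unchanged: the workers read rows from disk in the background while the main process trains on the current batch.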