I have a 2D array of size (20000000, 500) in a txt file. It is too large to fit in memory, so I will have to prefetch it and train my model with PyTorch. I think I need to use DataLoader with the prefetch_factor parameter. Does anyone know how I would do this? Thank you.
1 Answer
Just pass the prefetch_factor parameter to the DataLoader class. For example, if you set prefetch_factor=3 with multiple workers, each worker loads 3 batches in advance, so 3 * num_workers batches are prefetched across all workers. Its default value is 2.
Refer to this for a detailed explanation: https://pytorch.org/docs/stable/data.html
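A minimal sketch of how those arguments fit together (the toy TensorDataset, shapes, and numbers below are placeholders, not your actual data):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy in-memory dataset just to illustrate the DataLoader arguments;
# replace with your own Dataset.
dataset = TensorDataset(torch.randn(1000, 500), torch.randint(0, 2, (1000,)))

loader = DataLoader(
    dataset,
    batch_size=64,
    num_workers=4,      # prefetch_factor only takes effect when num_workers > 0
    prefetch_factor=3,  # each worker loads 3 batches ahead -> 3 * 4 in flight
)

for features, labels in loader:
    pass  # training step would go here
```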

Nakul
- I understand how prefetch_factor works. But the dataset is a txt file; is the dataset parameter of DataLoader compatible with a txt file? If I read the txt file into a NumPy array and then pass it as the dataset, it won't fit in memory. So there are two problems: one is prefetching, the other is how to read the txt file into the DataLoader. – G-09 Jun 19 '21 at 21:21
- Refer to this: https://medium.com/swlh/how-to-use-pytorch-dataloaders-to-work-with-enormously-large-text-files-bbd672e955a0 – Nakul Jun 20 '21 at 10:18
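For completeness, here is a rough sketch of the lazy-loading idea the linked article describes: a map-style Dataset that indexes the byte offset of every line once, then reads a single row per __getitem__, so only one pass and one row at a time ever touch memory. The file name data.txt and the whitespace-separated float format are assumptions about your file:

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class TxtLineDataset(Dataset):
    """Reads one row of a large txt file per __getitem__ without
    loading the whole file into memory."""

    def __init__(self, path):
        self.path = path
        # One pass over the file to record each line's starting byte offset.
        # For 20M lines this takes a while, but uses little memory.
        offsets = []
        with open(path, "rb") as f:
            offset = 0
            for line in f:
                offsets.append(offset)
                offset += len(line)
        # Store as a compact int64 array; 20M Python ints would waste memory.
        self.offsets = np.asarray(offsets, dtype=np.int64)

    def __len__(self):
        return len(self.offsets)

    def __getitem__(self, idx):
        # Re-opening per item is worker-safe; caching one handle per
        # worker is a common optimization.
        with open(self.path, "rb") as f:
            f.seek(int(self.offsets[idx]))
            line = f.readline()
        row = np.array(line.decode().split(), dtype=np.float32)
        return torch.from_numpy(row)

dataset = TxtLineDataset("data.txt")  # path is a placeholder
loader = DataLoader(dataset, batch_size=256, num_workers=4, prefetch_factor=2)
```

With a dataset like this, the prefetch_factor answer above applies unchanged: the workers read rows from disk in the background while the main process trains on the current batch.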