
I am trying to train my model using 2 dataloaders from 2 different datasets.

Because my datasets are not the same length, I found how to set this up using cycle() and zip(), following this question: How to iterate over two dataloaders simultaneously using pytorch?
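
For reference, the two loaders are combined roughly like this (a minimal sketch; the dataset objects and batch size are placeholders inferred from the traceback below):

  from itertools import cycle
  from torch.utils.data import DataLoader

  # Placeholder datasets; in my code these are two different Dataset classes.
  train_loader_1 = DataLoader(train_dataset_1, batch_size=8)
  train_loader_2 = DataLoader(train_dataset_2, batch_size=8)

  # The shorter loader is wrapped in cycle() so that zip() runs over the
  # full length of train_loader_2.
  for i, (x1, x2) in enumerate(zip(cycle(train_loader_1), train_loader_2)):
      ...  # training step

With this setup, training fails with the following error: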

  File "/home/Desktop/example/train.py", line 229, in train_2
    for i, (x1, x2) in enumerate(zip(cycle(train_loader_1), train_loader_2)):
  File "/home/.conda/envs/3dcnn/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 346, in __next__
    data = self.dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/.conda/envs/3dcnn/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "/home/.conda/envs/3dcnn/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 80, in default_collate
    return [default_collate(samples) for samples in transposed]
  File "/home/.conda/envs/3dcnn/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 80, in <listcomp>
    return [default_collate(samples) for samples in transposed]
  File "/home/.conda/envs/3dcnn/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 56, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: [enforce fail at CPUAllocator.cpp:64] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 154140672 bytes. Error code 12 (Cannot allocate memory)

I tried to solve this by setting num_workers=0, decreasing the batch size, and using pin_memory=False and shuffle=False, but none of it worked. I have 256 GB of RAM and 4 NVIDIA Tesla V100 GPUs.
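
The settings I ended up trying look roughly like this (a sketch; the dataset variables and batch size are placeholders):

  from torch.utils.data import DataLoader

  # Conservative settings tried on both loaders: single-process loading,
  # small batches, no pinned memory, no shuffling.
  train_loader_1 = DataLoader(train_dataset_1, batch_size=2, shuffle=False,
                              num_workers=0, pin_memory=False)
  train_loader_2 = DataLoader(train_dataset_2, batch_size=2, shuffle=False,
                              num_workers=0, pin_memory=False)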

When I run training with each dataloader individually rather than with both simultaneously, it works. However, for my project I need this parallel training with the 2 datasets.

afroditi
  • You must accidentally call something that materializes all your data in the memory. Can you share a code snippet? – Jindřich Sep 11 '19 at 13:18
  • Thanks for the response, I found the solution to this one: cycle() and zip() might create memory leakage problems. This solves it: https://github.com/pytorch/pytorch/issues/1917#issuecomment-433698337 – afroditi Sep 11 '19 at 13:29
  • Then, post an answer to your question, so other people can find it more easily ;-) And you'll get a bronze badge for it as well. – Jindřich Sep 11 '19 at 13:33
  • Did so, thanks again for trying to help! – afroditi Sep 11 '19 at 13:40

1 Answer


Based on this discussion (https://github.com/pytorch/pytorch/issues/1917#issuecomment-433698337), I avoid the error by dropping cycle() and zip(): cycle() keeps a copy of every element it has yielded, so wrapping a DataLoader in it gradually accumulates every batch in memory. Instead, I restart the shorter loader by hand:

  try:
      # Get the next batch from the other loader.
      data, target = next(dataloader_iterator)
  except StopIteration:
      # The loader is exhausted: restart it by creating a fresh iterator.
      dataloader_iterator = iter(dataloader)
      data, target = next(dataloader_iterator)

Kudos to @srossi93 from this PyTorch post!
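
A fuller sketch of how the pattern fits into a loop over two loaders of different lengths (variable names and the training step are placeholders):

  # Iterate over the longer loader directly and restart the shorter one
  # whenever it runs out, instead of wrapping it in cycle().
  dataloader_iterator = iter(dataloader_1)

  for i, (x2, y2) in enumerate(dataloader_2):
      try:
          x1, y1 = next(dataloader_iterator)
      except StopIteration:
          # dataloader_1 is exhausted: rebuild its iterator and keep going.
          dataloader_iterator = iter(dataloader_1)
          x1, y1 = next(dataloader_iterator)

      # ... forward/backward pass using (x1, y1) and (x2, y2) ...

Because the iterator is simply re-created, no already-consumed batches are kept around, so memory usage stays bounded.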

afroditi