4

I can run all the cells of the PyTorch data loading tutorial notebook (pytorch tutorial). But when I use OpenCV instead of skimage to resize the image, the dataloader hangs, i.e. nothing happens.

In the Rescale class:

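class Rescale(object):
    .....
    def __call__(self, sample):
        ....
        # original skimage call, which takes the output shape as (rows, cols):
        # img = transform.resize(image, (new_h, new_w))
        # cv2.resize expects dsize as (width, height), so the order is swapped
        img = cv2.resize(image, (new_w, new_h))
        .....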

The dataloader and the for loop are defined with:

dataloader = DataLoader(transformed_dataset, batch_size=4,
                        shuffle=True, num_workers=4)

for i_batch, sample_batched in enumerate(dataloader):
    print(i_batch, sample_batched['image'].size(),
          sample_batched['landmarks'].size())

The iterator does print output if I set num_workers=0. It looks like OpenCV does not play well with PyTorch's multiprocessing. I would strongly prefer to use the same package to transform the images at training time and at test time (I am already using OpenCV to rescale images at test time). Any suggestions would be greatly appreciated.

JMarc

2 Answers

8

I had a very similar problem, and this is how I solved it:

Call cv2.setNumThreads(0) right after importing cv2; you can then set num_workers > 0 in the PyTorch DataLoader.

It seems that OpenCV tries to multithread on its own and ends up in a deadlock somewhere.

Hope it helps.

2

Besides cv2.setNumThreads(0),

import multiprocessing
multiprocessing.set_start_method('spawn')

can also solve this problem.
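As a hedged variation on the snippet above: `multiprocessing.set_start_method` can only be called once per program and is best placed under the `if __name__ == '__main__':` guard. A context object obtained with `get_context` avoids changing the global setting. The helper name `spawn_context` is hypothetical.

```python
import multiprocessing


def spawn_context():
    """Return a context whose workers start in a fresh interpreter
    instead of being forked from the current process."""
    return multiprocessing.get_context("spawn")


if __name__ == "__main__":
    ctx = spawn_context()
    print(ctx.get_start_method())
    # With PyTorch, the same choice can be made per loader, e.g.
    #   DataLoader(dataset, num_workers=4, multiprocessing_context="spawn")
    # (not executed here, since this sketch does not depend on torch).
```

Keep in mind that with "spawn" the dataset and its transforms must be picklable, and worker startup is slower than with "fork".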

The following five workarounds, collected from the PyTorch issues, may also help (not recommended):

1. time.sleep(0.003)
2. pin_memory = True/False
3. num_workers = 0/1
4. from torch.utils.data.dataloader import DataLoader
5. writing 8192 to /proc/sys/kernel/shmmni

OpenCV and PyTorch multiprocessing sometimes don't play well together. When OpenCV function calls are embedded in a custom function that is parallelized with a "multiprocessing" pool, the code eventually ends up with idle processors after several calls to the pool; the number of calls this takes can vary from run to run.

Forking could be the problem: forking clones only the current thread. In this case, the thread pool may wrongly assume that it has more threads than it actually has.

The code then waits forever for a condition that is never signaled, because the number of threads is not what it expects after forking.

Spawning avoids this by starting a fresh interpreter process instead of forking, which forces all data structures within OpenCV (such as the thread pool) to be re-initialized.
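The claim that a forked child keeps only the forking thread can be checked with plain Python threads. This is only an analogy under stated assumptions: OpenCV's pool uses native threads rather than Python threads, the function name `count_threads_after_fork` is hypothetical, and `os.fork` is available on Unix-like systems only.

```python
import os
import threading


def count_threads_after_fork(n_background=3):
    """Start idle background threads, fork, and return
    (threads alive in the parent, threads alive in the child)."""
    stop = threading.Event()
    workers = [threading.Thread(target=stop.wait, daemon=True)
               for _ in range(n_background)]
    for w in workers:
        w.start()
    parent_count = threading.active_count()

    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: report how many threads survived the fork.
        os.write(write_fd, str(threading.active_count()).encode())
        os._exit(0)  # leave immediately, skipping interpreter cleanup

    # Parent process: read the child's answer and clean up.
    os.close(write_fd)
    child_count = int(os.read(read_fd, 16).decode())
    os.close(read_fd)
    os.waitpid(pid, 0)
    stop.set()
    for w in workers:
        w.join()
    return parent_count, child_count


if __name__ == "__main__":
    print(count_threads_after_fork())
```

The child reports a single thread while the parent still has its background threads. A thread pool that was sized in the parent would therefore be waiting on workers that do not exist in the child.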

More discussion can be found in the GitHub issues of OpenCV and PyTorch.

thunder