I am trying to execute a retrained PyTorch FasterRCNN in multiple threads on an Nvidia Jetson Xavier.
The main thread adds the image paths to a Queue. Four worker threads then do the following (a condensed sketch of the worker loop follows this list):
- load the image with PIL
img = Image.open(imgPath)
- transform it into a tensor with
img = to_tensor(img)
from torchvision.transforms.functional
- put it on the GPU
img = img.to(device)
- execute the RCNN network
pred = model([img])
- save the results in a normal list
resultList.append(pred)
- delete the variable holding the image with
del img
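To make this more concrete, each worker thread runs roughly the loop below (a simplified sketch; imgQueue, model, device and resultList are defined elsewhere in my project, and the stop sentinel is only an example):

from PIL import Image
from torchvision.transforms.functional import to_tensor

def worker():
    while True:
        imgPath = imgQueue.get()
        if imgPath is None:        # sentinel value used to stop the worker
            break
        img = Image.open(imgPath)  # load the image with PIL
        img = to_tensor(img)       # convert it to a tensor
        img = img.to(device)       # move it to the GPU
        pred = model([img])        # run the Faster R-CNN
        resultList.append(pred)    # keep only the predictions
        del img                    # drop the reference to the image
        imgQueue.task_done()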
However, the process runs out of memory after around 10,000 images and gets killed by the operating system.
I tried to do the following steps after 1000 images (a rough sketch of this cleanup follows the list):
- stop all threads
- do garbage collection by
gc.collect()
- clear GPU Memory by
torch.cuda.empty_cache()
- restart threads
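Roughly, this cleanup step looks like the following (stop_workers and start_workers are just placeholders for the thread handling in my project):

import gc
import torch

def periodic_cleanup():
    stop_workers()            # placeholder: join all worker threads
    gc.collect()              # force Python garbage collection
    torch.cuda.empty_cache()  # release cached GPU memory back to the driver
    start_workers()           # placeholder: spawn the worker threads again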
However, as expected, it does not solve the problem.
I know PyTorch provides the DataLoader for this kind of parallel loading. However, since I use the RCNN inside a larger project, I tried it without the DataLoader, directly within my own execution task.
I'm pretty sure there is no list that stores the images, because then the memory would run out much faster. The results of the network are just bounding boxes, so they should not consume that much memory either. Additionally, the memory consumption does not grow slowly; instead it sometimes jumps by around 1 GB.
I hope someone has an idea how to solve the problem or how to debug it further.
Thanks, Peter