I'm training a model using PyTorch. To load the data, I'm using `torch.utils.data.DataLoader` with a custom dataset I've implemented. A strange problem occurs: every time the second `for` in the following code executes, the number of threads/processes increases and a huge amount of memory is allocated:
```python
for epoch in range(start_epoch, opt.niter + opt.niter_decay + 1):
    epoch_start_time = time.time()
    if epoch != start_epoch:
        epoch_iter = epoch_iter % dataset_size
    for i, item in tqdm(enumerate(dataset, start=epoch_iter)):
        ...  # training step omitted
```
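To make the growth concrete, this is roughly how I've been measuring it around the inner loop (a quick sketch; it assumes `psutil` is installed, and `log_workers` is just a helper name I made up):

```python
import os

import psutil  # assumption: psutil is available for inspecting child processes

main_proc = psutil.Process(os.getpid())

def log_workers(tag: str) -> None:
    """Print the resident memory of the main process and of every live worker."""
    workers = main_proc.children(recursive=True)
    main_gb = main_proc.memory_info().rss / 1024 ** 3
    worker_gb = [w.memory_info().rss / 1024 ** 3 for w in workers]
    print(f"{tag}: main={main_gb:.2f} GB, workers={len(workers)}, "
          f"worker RSS (GB)={[round(g, 2) for g in worker_gb]}")

# Called right before and right after the inner `for` loop of each epoch,
# i.e. around the point where DataLoader.__iter__() spawns its workers.
log_workers("before epoch")
```

Calling it before and after the inner loop shows how many workers are alive at that point and how big each one is.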
I suspect that the worker processes and the memory of the previous iterators are not released after each `__iter__()` call on the data loader.
The memory allocated to each worker is close to the memory used by the main process at the moment the workers are created. That is, in the initial epoch the main process uses 2 GB of memory, so two workers of 2 GB each are created. In the following epochs the main process uses 5 GB, and two 5 GB workers are constructed (`num_workers` is 2).
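One thing I'm considering is keeping the same two workers alive across epochs instead of letting every `__iter__()` fork a fresh pair from the (now larger) main process. A minimal sketch, assuming PyTorch 1.7+ where `DataLoader` accepts `persistent_workers`, and using a made-up placeholder dataset instead of my real one:

```python
import torch
from torch.utils.data import DataLoader, Dataset

class PlaceholderDataset(Dataset):
    """Hypothetical stand-in for my wrapper dataset, just for illustration."""
    def __len__(self):
        return 1000

    def __getitem__(self, index):
        return torch.zeros(3, 256, 256)

if __name__ == "__main__":
    loader = DataLoader(
        PlaceholderDataset(),
        batch_size=1,
        shuffle=True,
        num_workers=2,
        persistent_workers=True,  # the two workers are created once and reused
    )
    for epoch in range(3):
        # __iter__() no longer forks a fresh pair of workers every epoch
        for batch in loader:
            pass
```

If the memory stops growing with persistent workers, that would confirm the duplication comes from re-forking an ever larger main process each epoch.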
I suspect that the `fork()` call copies most of the parent process's context into the new worker processes.
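If that's the case, another option would be to switch the workers to the `spawn` start method via the `multiprocessing_context` argument, so they start from a fresh interpreter and only receive a pickled copy of the dataset rather than the parent's whole memory image. A sketch (again with a placeholder dataset; the trade-offs are slower worker start-up and the dataset having to be picklable):

```python
import torch
from torch.utils.data import DataLoader, Dataset

class PlaceholderDataset(Dataset):
    """Hypothetical stand-in for my wrapper dataset, just for illustration."""
    def __len__(self):
        return 1000

    def __getitem__(self, index):
        return torch.zeros(3, 256, 256)

if __name__ == "__main__":
    # Spawned workers start from a clean interpreter and only receive a
    # pickled copy of the dataset, not the parent's whole fork image.
    loader = DataLoader(
        PlaceholderDataset(),
        batch_size=1,
        num_workers=2,
        multiprocessing_context="spawn",
    )
    for batch in loader:
        pass
```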
The following is the Activity Monitor output showing the processes created by Python; the ZMQbg/1 entries are processes related to Python.
The dataset used by the data loader consists of 100 sub-datasets; each `__getitem__` call randomly selects one of them (ignoring the `index` argument). The sub-datasets are `AlignedDataset` instances from the pix2pixHD GitHub repository: