I want to understand how pin_memory in DataLoader works.
According to the documentation:
pin_memory (bool, optional) – If True, the data loader will copy tensors into CUDA pinned memory before returning them.
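As far as I understand, "pinned" here means page-locked host memory, not GPU memory. A minimal check using the tensor methods pin_memory() and is_pinned() (both part of the public tensor API):

import torch

t = torch.empty(3)
print(t.is_pinned())   # False: ordinary pageable host memory
t = t.pin_memory()     # returns a copy in page-locked host memory
print(t.is_pinned())   # True
print(t.device)        # still cpu, not cuda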
Below is a self-contained code example.
import torchvision
import torch
print('torch.cuda.is_available()', torch.cuda.is_available())
train_dataset = torchvision.datasets.CIFAR10(root='cifar10_pytorch', download=True, transform=torchvision.transforms.ToTensor())
train_dataloader = torch.utils.data.DataLoader(train_dataset, batch_size=64, pin_memory=True)
x, y = next(iter(train_dataloader))
print('x.device', x.device)
print('y.device', y.device)
This produces the following output:
torch.cuda.is_available() True
x.device cpu
y.device cpu
But I was expecting something like this, because I specified pin_memory=True in the DataLoader:
torch.cuda.is_available() True
x.device cuda:0
y.device cuda:0
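If pinning really leaves the batch on the CPU, I assume I am expected to move it to the GPU myself, roughly like this (continuing from the snippet above; non_blocking is a documented argument of Tensor.cuda()):

x, y = next(iter(train_dataloader))
# pin_memory=True only places the batch in page-locked host memory;
# the copy to the GPU is still an explicit step
x = x.cuda(non_blocking=True)  # async copy is possible because x is pinned
y = y.cuda(non_blocking=True)
print('x.device', x.device)    # expected: cuda:0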
I also ran a benchmark:
import torchvision
import torch
import time
import numpy as np
pin_memory = True
train_dataset = torchvision.datasets.CIFAR10(root='cifar10_pytorch', download=True, transform=torchvision.transforms.ToTensor())
train_dataloader = torch.utils.data.DataLoader(train_dataset, batch_size=64, pin_memory=pin_memory)
print('pin_memory:', pin_memory)
times = []
n_runs = 10
for i in range(n_runs):
    st = time.time()
    for bx, by in train_dataloader:
        # move every batch to the GPU with a plain (blocking) copy
        bx, by = bx.cuda(), by.cuda()
    times.append(time.time() - st)
print('average time:', np.mean(times))
I got the following results:
pin_memory: False
average time: 6.5701503753662
pin_memory: True
average time: 7.0254474401474
So pin_memory=True only makes things slower.
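My only guess is that pinning pays off when combined with asynchronous copies via non_blocking=True, which my benchmark above does not use. This is the variant I would compare against (a sketch; torch.cuda.synchronize() is added so the timing actually includes the copies):

import torchvision
import torch
import time
import numpy as np

train_dataset = torchvision.datasets.CIFAR10(root='cifar10_pytorch', download=True, transform=torchvision.transforms.ToTensor())
train_dataloader = torch.utils.data.DataLoader(train_dataset, batch_size=64, pin_memory=True)
times = []
n_runs = 10
for i in range(n_runs):
    st = time.time()
    for bx, by in train_dataloader:
        # non_blocking=True lets the host-to-device copy overlap with other
        # work; it only has an effect when the source tensor is pinned
        bx, by = bx.cuda(non_blocking=True), by.cuda(non_blocking=True)
    torch.cuda.synchronize()  # wait for all asynchronous copies to finish
    times.append(time.time() - st)
print('average time:', np.mean(times))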
Can someone explain this behaviour?