I am trying to figure out the rounding differences between NumPy and PyTorch, GPU and CPU, and float16 and float32 numbers, and what I'm finding confuses me.
The basic version is:
import torch

a = torch.rand(3, 4, dtype=torch.float32)
b = torch.rand(4, 5, dtype=torch.float32)
print(a.numpy()@b.numpy() - a@b)
The result is all zeros, as expected. However,
print((a.cuda()@b.cuda()).cpu() - a@b)
gives non-zero results. Why is PyTorch float32 matmul executed differently on the GPU and on the CPU?
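One way to see how large the discrepancy is, and which result is closer to an exact answer, would be to compare both against a float64 reference. This is just a sketch, reusing a and b from above:

ref = a.double() @ b.double()   # float64 reference, computed on the CPU
cpu_err = ((a @ b).double() - ref).abs().max()
gpu_err = ((a.cuda() @ b.cuda()).cpu().double() - ref).abs().max()
print(cpu_err, gpu_err)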
An even more confusing experiment involves float16, as follows:
a = torch.rand(3, 4, dtype=torch.float16)
b = torch.rand(4, 5, dtype=torch.float16)
print(a.numpy()@b.numpy() - a@b)
print((a.cuda()@b.cuda()).cpu() - a@b)
These two results are both non-zero. Why are float16 numbers handled differently by NumPy and torch? I know the CPU can only do float32 operations and that NumPy converts float16 to float32 before computing, but the torch calculation is also executed on the CPU.
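If that upcasting explanation is right, then emulating it by hand, i.e. computing in float32 and rounding the result back to float16, should reproduce the NumPy result. A quick sketch of that check (I'm not certain this is exactly what NumPy does internally):

# a and b are the float16 tensors from above
upcast = (a.float() @ b.float()).half()   # compute in float32, round back to float16
print((upcast.numpy() - a.numpy()@b.numpy()).any())   # does it match numpy?
print((upcast - a@b).numpy().any())                   # does it match torch on the cpu?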
And guess what,
print((a.cuda()@b.cuda()).cpu() - a.numpy()@b.numpy())
gives an all-zero result! This completely baffles me...
The environment is as follows (a snippet to query these versions programmatically follows the list):
- python: 3.8.5
- torch: 1.7.0
- numpy: 1.21.2
- cuda: 11.1
- gpu: GeForce RTX 3090
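Something like this should print the same information (the GPU name string may be formatted a bit differently):

import platform
import numpy as np
import torch

print("python:", platform.python_version())
print("torch:", torch.__version__)
print("numpy:", np.__version__)
print("cuda:", torch.version.cuda)
print("gpu:", torch.cuda.get_device_name(0))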
On the advice of some of the commenters, I added the following equality tests
(a.numpy()@b.numpy() - (a@b).numpy()).any()
((a.cuda()@b.cuda()).cpu() - a@b).numpy().any()
(a.numpy()@b.numpy() - (a@b).numpy()).any()
((a.cuda()@b.cuda()).cpu() - a@b).numpy().any()
((a.cuda()@b.cuda()).cpu().numpy() - a.numpy()@b.numpy()).any()
each placed directly after the corresponding print statement above, and the results are:
False
True
True
True
False
As for the last one, I've tried it several times, so I think I can rule out luck.
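In case it helps anyone reproduce this, the whole experiment boils down to roughly the following script, which reports maximum absolute differences instead of just .any():

import numpy as np
import torch

for dtype in (torch.float32, torch.float16):
    a = torch.rand(3, 4, dtype=dtype)
    b = torch.rand(4, 5, dtype=dtype)

    np_res  = a.numpy() @ b.numpy()                 # numpy matmul
    cpu_res = (a @ b).numpy()                       # torch matmul on the cpu
    gpu_res = (a.cuda() @ b.cuda()).cpu().numpy()   # torch matmul on the gpu

    print(dtype)
    print("  numpy vs torch-cpu:", np.abs(np_res - cpu_res).max())
    print("  torch-gpu vs torch-cpu:", np.abs(gpu_res - cpu_res).max())
    print("  torch-gpu vs numpy:", np.abs(gpu_res - np_res).max())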