
I'm trying to get an in-depth understanding of how the PyTorch tensor memory model works.

# input numpy array
In [91]: arr = np.arange(10, dtype=np.float32).reshape(5, 2)

# input tensors in two different ways
In [92]: t1, t2 = torch.Tensor(arr), torch.from_numpy(arr)

# their types
In [93]: type(arr), type(t1), type(t2)
Out[93]: (numpy.ndarray, torch.FloatTensor, torch.FloatTensor)

# ndarray 
In [94]: arr
Out[94]: 
array([[ 0.,  1.],
       [ 2.,  3.],
       [ 4.,  5.],
       [ 6.,  7.],
       [ 8.,  9.]], dtype=float32)

I know that PyTorch tensors share the memory buffer of NumPy ndarrays, so changing one will be reflected in the other. Here I'm slicing and updating some values in the tensor t2

In [98]: t2[:, 1] = 23.0

And as expected, it's updated in t2 and arr since they share the same memory buffer.

In [99]: t2
Out[99]: 

  0  23
  2  23
  4  23
  6  23
  8  23
[torch.FloatTensor of size 5x2]


In [101]: arr
Out[101]: 
array([[  0.,  23.],
       [  2.,  23.],
       [  4.,  23.],
       [  6.,  23.],
       [  8.,  23.]], dtype=float32)

But t1 is also updated. Remember that t1 was constructed using torch.Tensor(), whereas t2 was constructed using torch.from_numpy()

In [100]: t1
Out[100]: 

  0  23
  2  23
  4  23
  6  23
  8  23
[torch.FloatTensor of size 5x2]

So, no matter whether we use torch.from_numpy() or torch.Tensor() to construct a tensor from an ndarray, all such tensors and ndarrays share the same memory buffer.
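A quick way to sanity-check this (the np.shares_memory call is my own addition, not something from the PyTorch docs) is to compare the buffers directly; .numpy() on a CPU tensor returns an ndarray over the tensor's storage. Both checks printed True in my session, consistent with the updates above.

# reusing arr, t1 and t2 from the session above
print(np.shares_memory(arr, t1.numpy()))  # t1 was built with torch.Tensor(arr)
print(np.shares_memory(arr, t2.numpy()))  # t2 was built with torch.from_numpy(arr)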

Based on this understanding, my question is: why does a dedicated function torch.from_numpy() exist when torch.Tensor() can simply do the job?

I looked at the PyTorch documentation, but it doesn't mention anything about this. Any ideas/suggestions?

kmario23
    very interesting question. I do not know the answer, but I suspect `torch.Tensor()` may accept other forms of input (for example, a list), whereas `torch.from_numpy()` only operates on NumPy arrays. – Wasi Ahmad Jan 28 '18 at 04:52

5 Answers


from_numpy() automatically inherits the dtype of the input array. On the other hand, torch.Tensor is an alias for torch.FloatTensor.

Therefore, if you pass an int64 array to torch.Tensor, the output tensor is a float tensor, and they won't share the same storage. torch.from_numpy gives you a torch.LongTensor, as expected.

a = np.arange(10)
ft = torch.Tensor(a)  # same as torch.FloatTensor
it = torch.from_numpy(a)

a.dtype  # == dtype('int64')
ft.dtype  # == torch.float32
it.dtype  # == torch.int64
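
A quick check of the "wouldn't share the storage" claim (the in-place modification below is my own illustration, not part of the original snippet):

import numpy as np
import torch

a = np.arange(10)           # int64 on most platforms
ft = torch.Tensor(a)        # cast to float32, so the data is copied
it = torch.from_numpy(a)    # int64 tensor wrapping the same buffer

a[0] = 100                  # modify the NumPy array in place
print(ft[0])                # tensor(0.)  -> the float copy does not see the change
print(it[0])                # tensor(100) -> the from_numpy tensor does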
Viacheslav Kroilov

The recommended way to build tensors in PyTorch is to use the following two factory functions: torch.tensor and torch.as_tensor.

torch.tensor always copies the data. For example, torch.tensor(x) is equivalent to x.clone().detach().

torch.as_tensor always tries to avoid copying the data. One of the cases where as_tensor avoids a copy is when the original data is a NumPy array.
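
A minimal sketch of the difference (the variable names are mine):

import numpy as np
import torch

x = np.zeros(3, dtype=np.float32)
t_copy = torch.tensor(x)      # always copies the data
t_view = torch.as_tensor(x)   # reuses the NumPy buffer when the dtype and device allow it

x[0] = 7.0                    # modify the source array in place
print(t_copy)                 # tensor([0., 0., 0.]) -> the copy is unaffected
print(t_view)                 # tensor([7., 0., 0.]) -> shares memory with x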

Jadiel de Armas
    This answer is about `torch.tensor` and `torch.as_tensor`, which is good to know, but it does not address the OP's question about `torch.from_numpy`. – user650654 May 27 '20 at 17:40

This comes from `_torch_docs.py`; there is also some discussion of the "why" here.

def from_numpy(ndarray): # real signature unknown; restored from __doc__
    """
    from_numpy(ndarray) -> Tensor

    Creates a :class:`Tensor` from a :class:`numpy.ndarray`.

    The returned tensor and `ndarray` share the same memory. 
    Modifications to the tensor will be reflected in the `ndarray` 
    and vice versa. The returned tensor is not resizable.

    Example::

        >>> a = numpy.array([1, 2, 3])
        >>> t = torch.from_numpy(a)
        >>> t
        torch.LongTensor([1, 2, 3])
        >>> t[0] = -1
        >>> a
        array([-1,  2,  3])
    """
    pass

Taken from the numpy docs:

Different ndarrays can share the same data, so that changes made in one ndarray may be visible in another. That is, an ndarray can be a “view” to another ndarray, and the data it is referring to is taken care of by the “base” ndarray.
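
For example, a minimal NumPy-only sketch of the view/base relationship described in that quote:

import numpy as np

base = np.array([[0, 1, 2],
                 [3, 4, 5]])
view = base[:, 1]             # slicing returns a view, not a copy

view[:] = -1                  # writing through the view modifies the base array
print(base)                   # [[ 0 -1  2]
                              #  [ 3 -1  5]]
print(view.base is base)      # True: the view keeps a reference to its "base" ndarray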

PyTorch docs:

If a numpy.ndarray, torch.Tensor, or torch.Storage is given, a new tensor that shares the same data is returned. If a Python sequence is given, a new tensor is created from a copy of the sequence.
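
That paragraph is from the old torch.Tensor constructor documentation. Below is a hedged sketch of the distinction it draws; whether the ndarray path still shares memory can depend on the dtype and the PyTorch version, so the code only prints the check instead of asserting it:

import numpy as np
import torch

arr = np.arange(4, dtype=np.float32)
from_array = torch.Tensor(arr)               # built from an ndarray
from_list = torch.Tensor([0., 1., 2., 3.])   # built from a Python sequence

# .numpy() on a CPU tensor returns an ndarray over the tensor's own storage,
# so np.shares_memory tells us whether the constructor copied the data.
print(np.shares_memory(arr, from_array.numpy()))  # True if the ndarray buffer was reused
print(np.shares_memory(arr, from_list.numpy()))   # False: a sequence is always copied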

eric

The thing that nobody mentioned in the other answers is that the "share the same memory" behavior between a NumPy array and a torch tensor only happens if you run your code on the CPU. If you run on the GPU, as in the code below, the NumPy array and the tensor each have their own memory location.

device = "cuda" if torch.cuda.is_available() else "cpu"
device #will be cuda means it run in GPU
a = np.ones(5)
print(a)
b = torch.from_numpy(a)
print(b)
a = a * 5
print(a)
print(b)

Output

[1. 1. 1. 1. 1.]
tensor([1., 1., 1., 1., 1.], dtype=torch.float64)
[5. 5. 5. 5. 5.]
tensor([1., 1., 1., 1., 1.], dtype=torch.float64)

Edit

To make it clear, memory sharing between NumPy arrays and PyTorch tensors happens only when they are both in CPU memory, but this sharing does not extend to GPU memory.

When data is moved to the GPU, new memory is allocated for the tensor on the GPU, and changes made to the original NumPy array on the CPU will not be reflected in the GPU tensor, and vice versa.

CPU Memory:

Memory sharing can occur between NumPy arrays and PyTorch tensors when both are in CPU memory.

Changes made to one will be reflected in the other because they share memory.

GPU Memory:

Memory sharing does not occur between NumPy arrays and PyTorch tensors when they are on the GPU.

Any changes made to the original array will not affect the tensor on the GPU, and vice versa.

CPU example

import numpy as np
import torch

device = "cpu"
print("The device is :", device)

a_np = np.array([1, 2, 3])
print("Initial NumPy array 'a_np':", a_np)

a_torch = torch.from_numpy(a_np).to(device)
print("Initial PyTorch tensor 'a_torch':", a_torch)



a_np *= 2
print("\n")
print("After modifying NumPy array:")
print("NumPy array 'a_np':", a_np)
print("PyTorch tensor 'a_torch':", a_torch)
print("")
#print(np.shares_memory(a_torch, a_np))
#print("as you see, a_np *= 2 here also changes a_torch because it shares memory with a_np")

Output CPU

The device is : cpu 
Initial NumPy array 'a_np': [1 2 3] 
Initial PyTorch tensor 'a_torch': tensor([1, 2, 3], dtype=torch.int32)


After modifying NumPy array:
NumPy array 'a_np': [2 4 6]
PyTorch tensor 'a_torch': tensor([2, 4, 6], dtype=torch.int32)

GPU Example

device = "cuda"
print("The device is :", device)

a_np = np.array([1, 2, 3])
print("Initial NumPy array 'a_np':", a_np)

a_torch = torch.from_numpy(a_np).to(device)
print("Initial PyTorch tensor 'a_torch':", a_torch)

a_np *= 2
print("\n")
print("After modifying NumPy array:")
print("NumPy array 'a_np':", a_np)
print("PyTorch tensor 'a_torch':", a_torch)

Output GPU

The device is : cuda 
Initial NumPy array 'a_np': [1 2 3] 
Initial PyTorch tensor 'a_torch': tensor([1, 2, 3], device='cuda:0', dtype=torch.int32)


After modifying NumPy array:
NumPy array 'a_np': [2 4 6]
PyTorch tensor 'a_torch': tensor([1, 2, 3], device='cuda:0', dtype=torch.int32)
Mohamed Fathallah
  • I'm afraid this answer is **misleading**. This has nothing to do with the device. `a = a * 5` creates a new object located at a new memory address. This can be verified by using `np.shares_memory(a, a * 5)`. If we use the same method as the example in the [document](https://pytorch.org/docs/stable/generated/torch.from_numpy.html), for example changing the first element of `a` to -1 via `a[0] = -1`, we can see that the value of `b` also changes. – user343233 Aug 30 '23 at 15:15
  • @user343233 Regarding the example: the default behavior of `torch.from_numpy()` is to share memory, so `b` will share memory with `a` as long as both are in the same memory space (CPU). That doesn't hold true in GPU space; that's it. I will add a new example to cover this for you. – Mohamed Fathallah Aug 31 '23 at 02:33
  • @user343233 Just change `a = a * 5` to `a *= 5`, which doesn't create a new object, and you get the same thing: the memory location is shared on `cpu` but not on `cuda`, so the answer is not misleading at all. `device='cpu'` `a = np.array([1, 2, 3])` `a_torch = torch.from_numpy(a).to(device)` `a *= 5` – Mohamed Fathallah Aug 31 '23 at 09:05
  • I think what you mean is the behavior of the `to` method, which copies a tensor created on the CPU onto the GPU. Therefore, they do not share the same memory. A tensor created by `from_numpy` **itself** can only be located on the CPU, even if the default device is set using `torch.set_default_device('cuda')` or `with torch.device('cuda')` (let me know if there are ways to change this behavior of `from_numpy`). As an ndarray is always on the CPU, they **always share the memory**. This is not related to the type of device (with/without CUDA) you are using. – user343233 Aug 31 '23 at 15:36
  • @user343233 No, I don't mean the behavior of `to`. The answer is clear: if you train on the `cpu` there will be memory sharing, and if you train on the `GPU` there will not. `device` and `to` are not the focus, since all the data has to be moved to the GPU if you want to train on it, even if you set it as the default. – Mohamed Fathallah Sep 01 '23 at 06:56

I tried doing what you said and it's working as expected (Torch 1.8.1, NumPy 1.20.1, Python 3.8.5):

x = np.arange(8, dtype=np.float64).reshape(2,4)
y_4mNp = torch.from_numpy(x)
y_t = torch.tensor(x)
print(f"x={x}\ny_4mNp={y_4mNp}\ny_t={y_t}") 

All variables have the same values right now, as expected:

x=[[0. 1. 2. 3.]
 [4. 5. 6. 7.]]
y_4mNp=tensor([[0., 1., 2., 3.],
        [4., 5., 6., 7.]], dtype=torch.float64)
y_t=tensor([[0., 1., 2., 3.],
        [4., 5., 6., 7.]], dtype=torch.float64)

from_numpy() does use the same underlying memory that the NumPy variable uses, so changing either the NumPy variable or the from_numpy variable affects the other, but NOT the torch.tensor variable. Likewise, changes to y_t affect only itself, not the NumPy or from_numpy variables.

x[0,1] = 111       ## changed the numpy variable itself directly
y_4mNp[1,:] = 500  ## changed the .from_numpy variable
y_t[0,:] = 999     ## changed the tensor variable
print(f"x={x}\ny_4mNp={y_4mNp}\ny_t={y_t}")

Output now:

x=[[  0. 111.   2.   3.]
 [500. 500. 500. 500.]]
y_4mNp=tensor([[  0., 111.,   2.,   3.],
        [500., 500., 500., 500.]], dtype=torch.float64)
y_t=tensor([[999., 999., 999., 999.],
        [  4.,   5.,   6.,   7.]], dtype=torch.float64)

I don't know if this was an issue with earlier versions.

rbewoor