
Suppose I have an algorithm that does the following in Python with PyTorch. Please ignore whether the steps are efficient; this is just a silly toy example.

    import torch
    import torch.nn.functional as F

    def foo(input_list):
        # input_list is a list of N 2-D PyTorch tensors of shape (h, w)
        tensor = torch.stack(input_list, dim=-1)              # shape (h, w, N)
        tensor1 = torch.transpose(tensor, 0, 2).unsqueeze(1)  # swap dims 0 and 2, add channel dim -> (N, 1, w, h)
        tensor2 = F.interpolate(tensor1, size=(500, 500))     # upsample to (N, 1, 500, 500)
        return tensor2

    def bar(input_list):
        tensor = torch.stack(input_list, dim=-1)              # shape (h, w, N)
        tensor = torch.transpose(tensor, 0, 2).unsqueeze(1)   # swap dims 0 and 2, add channel dim -> (N, 1, w, h)
        tensor = F.interpolate(tensor, size=(500, 500))       # upsample to (N, 1, 500, 500)
        return tensor

My question is whether it makes more sense to use method foo() or bar(), or whether it doesn't matter. My thought was that I save memory by overwriting the same variable name (bar), since I never actually need the intermediate results. But if the CUDA backend allocates new memory for each of these calls anyway, then I'm spending the same amount of memory with both methods.
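One way to check this empirically, assuming a CUDA device and inputs that already live on the GPU, is to compare PyTorch's peak-memory counters for the two versions (the sizes below are arbitrary, just for the comparison):

    import torch

    h, w, N = 100, 120, 8  # arbitrary sizes for the comparison
    input_list = [torch.randn(h, w, device="cuda") for _ in range(N)]

    for fn in (foo, bar):
        torch.cuda.empty_cache()
        torch.cuda.reset_peak_memory_stats()
        fn(input_list)
        torch.cuda.synchronize()
        print(fn.__name__, torch.cuda.max_memory_allocated(), "bytes at peak")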

  • It doesn't matter which function you use, because in both cases new variables are being created. If you don't want to create new variables, look at in-place functions like transpose_ – Aditya Jan 28 '20 at 22:06
  • 1
  • If you were really running into memory issues you could use torch.cuda.empty_cache(), which will free unused GPU memory. There should be some difference if you call this function at the end of the function, since having references to that memory should prevent it from being cleared. (I'm assuming you're not tracking gradients, since that produces references even if the intermediate tensors aren't explicitly referenced with a variable.) – jodag Jan 29 '20 at 02:15
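A minimal sketch of the two suggestions in these comments (in-place ops and releasing cached memory), assuming a CUDA device; the tensor here is purely illustrative:

    import torch

    x = torch.randn(8, 100, 120, device="cuda")

    # in-place variant of transpose: swaps dims of x without binding a new name
    # (still only a metadata change; the underlying storage is unchanged)
    x.transpose_(0, 2)

    # once nothing references the tensor, its memory goes back to PyTorch's caching
    # allocator; empty_cache() hands unused cached blocks back to the GPU driver
    del x
    torch.cuda.empty_cache()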

1 Answer


tensor and tensor1 in your example are just different views of the same data in memory, so the memory difference of potentially maintaining two slightly different references to it should be negligible. The relevant part would only be tensor1 vs tensor2.
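As an illustrative check (with small toy shapes, not from the original post), the transpose and unsqueeze calls only create new views, so both names point at the same underlying storage:

    import torch

    t = torch.randn(3, 4, 5)
    t1 = torch.transpose(t, 0, 2).unsqueeze(1)

    # same storage, different shape/stride metadata
    print(t.data_ptr() == t1.data_ptr())  # True
    print(t1.shape)                       # torch.Size([5, 1, 4, 3])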

You might want to see this similar question: Python: is the "old" memory free'd when a variable is assigned new content?

Since the reassignment to tensor that actually allocates new memory is also the final call in bar, I suspect that in this particular example the total memory wouldn't be impacted (tensor1 would be unreferenced once the function returns anyway).

With a longer chain of operations, rebinding the same name does let CPython's reference counting release each intermediate one step earlier, which can lower peak memory a bit (PyTorch's caching allocator can then reuse those blocks). Still, I'd probably prefer the style in foo just because it's easier to later change the order of operations in the chain. Keeping track of different names adds overhead for the programmer, not just the interpreter.
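For what it's worth, a fully chained sketch (baz is just an illustrative name, using the same fixes as above) shows the other extreme, with no intermediate names at all; foo's style is still easier to reorder or debug:

    import torch
    import torch.nn.functional as F

    def baz(input_list):
        # same pipeline as foo/bar, written as one expression
        return F.interpolate(
            torch.transpose(torch.stack(input_list, dim=-1), 0, 2).unsqueeze(1),
            size=(500, 500),
        )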

nairbv