
I have two (large) arrays. For illustration purposes I'm using a simple example below:

In [14]: arr1 = np.arange(32*512).reshape(32, 512)
In [15]: arr2 = np.arange(512).reshape(1, 512)

I want to concatenate these arrays horizontally (i.e. along axis 1). I came up with the following approach:

In [16]: np.hstack([arr1, np.tile(arr2, (arr1.shape[0], 1))]).shape
Out[16]: (32, 1024)

This works as intended. However, I would like to know whether there are other efficient ways of doing this concatenation without using numpy.tile. I'm afraid I would blow up my memory requirements, since the arrays are really huge.

If it's possible to avoid this duplication of rows (to match the first dimension of arr1), perhaps using broadcasting, that would be great!
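For what it's worth, one way to express the repetition without materializing a tiled copy is np.broadcast_to, which returns a read-only, zero-strided view of arr2 (this is a sketch of the idea, not something from the original post; np.concatenate still has to allocate the final (32, 1024) output, so only the intermediate copy is avoided):

```python
import numpy as np

arr1 = np.arange(32 * 512).reshape(32, 512)
arr2 = np.arange(512).reshape(1, 512)

# A zero-strided view: the single row of arr2 is "repeated" 32 times
# without allocating a 32x512 intermediate array.
view = np.broadcast_to(arr2, (arr1.shape[0], arr2.shape[1]))

# The output array itself is still allocated here, as it must be.
out = np.concatenate([arr1, view], axis=1)
print(out.shape)  # (32, 1024)
```

Because the view's first stride is 0, it costs no extra memory beyond arr2 itself; the result is identical to the hstack + tile version.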


P.S. The reason I want to avoid this copying is the linear growth in memory requirements:

In [20]: arr2.nbytes
Out[20]: 4096

In [19]: np.tile(arr2, (arr1.shape[0], 1)).nbytes
Out[19]: 131072

In [22]: arr1.shape[0] * arr2.nbytes
Out[22]: 131072
kmario23

1 Answer


You can preallocate and use broadcasting, but it won't save much (I'd expect peak memory usage to go down by roughly a quarter):

arr1 = np.arange(32*512).reshape(32, 512)
arr2 = np.arange(512).reshape(1, 512)
out = np.empty((32, 1024), arr1.dtype)
out[:, :512] = arr1
out[:, 512:] = arr2
out
#array([[    0,     1,     2, ...,   509,   510,   511],
#       [  512,   513,   514, ...,   509,   510,   511],
#       [ 1024,  1025,  1026, ...,   509,   510,   511],
#       ...,
#       [14848, 14849, 14850, ...,   509,   510,   511],
#       [15360, 15361, 15362, ...,   509,   510,   511],
#       [15872, 15873, 15874, ...,   509,   510,   511]])
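As a quick sanity check (my addition, not part of the original answer), the preallocate-and-broadcast result is bit-for-bit identical to the hstack + tile version:

```python
import numpy as np

arr1 = np.arange(32 * 512).reshape(32, 512)
arr2 = np.arange(512).reshape(1, 512)

# Preallocate the output and fill the two halves; the assignment of
# arr2's single row broadcasts over all 32 rows of the right half.
out = np.empty((32, 1024), arr1.dtype)
out[:, :512] = arr1
out[:, 512:] = arr2

# Same result as the explicit-tile approach from the question.
assert np.array_equal(out, np.hstack([arr1, np.tile(arr2, (arr1.shape[0], 1))]))
```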
Paul Panzer
  • Nice! I just checked, and the end memory requirement for both approaches is the same, but this approach might be far faster than mine since we avoid the explicit copy. – kmario23 Mar 19 '19 at 10:36