I want to combine several (say 15) long arrays of shape (3072,) into one np.array of shape (15, 3072). I have found a solution, but it involves an if-clause nested in a for-loop, which seems inefficient to me. Is there a more efficient way to end up with a numpy array of the required shape (and not a list)? Here is the code:

import numpy as np

# Creating sample array
arr1 = np.array([8, 19, 43], dtype=int)

# What I do at the moment (and it works)
arr_out = np.array([])
for i in range(3):
    if i == 0:
        arr_out = np.hstack((arr_out, arr1))
    else:
        arr_out = np.vstack((arr_out, arr1))
        
arr_out # Output is correct shape (Each "new" array gets a new row.)

array([[ 8., 19., 43.], [ 8., 19., 43.], [ 8., 19., 43.]])

What happens when I use np.append:

# How to do the above without the if-clause in the for-loop?
arr_out = np.array([])
for i in range(3):
    arr_out = np.append(arr_out, arr1, axis=0)

arr_out # Output is not in correct shape

array([ 8., 19., 43., 8., 19., 43., 8., 19., 43.])

Do you see an efficient way to get a numpy array with the shape of the first example without using a list (or at least without ending up with a list)?

  • Why do you feel you need to `vstack` them one at a time? Consider using `vstack((a, tuple, of, many, arrays, ...))` – donkopotamus Mar 26 '21 at 20:31
  • `np.append` uses `np.concatenate`, same as the `stack` functions. Look at their code. List append is faster. – hpaulj Mar 26 '21 at 20:43
  • Thanks for your quick replies. `vstack` one at a time is necessary because I have a long for-loop and cannot keep all the individual arrays in between. List append is indeed faster, but it does not serve the purpose because it produces output similar to `np.append` and `np.concatenate`, namely a 1D array of all features, whereas I want a 2D array with n rows, where n is the number of observations (basically the number of iterations performed in the for-loop). – Thilo Sander Mar 27 '21 at 09:40
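
To illustrate the list-based approach suggested in the comments: a minimal sketch, assuming the individual arrays can be kept in a Python list until the loop ends. `rows` is a hypothetical name; `list.append` only stores a reference to each array, and a single `np.vstack` call at the end yields a 2D array rather than a flat one.

```python
import numpy as np

arr1 = np.array([8, 19, 43], dtype=int)

rows = []  # plain Python list that collects one 1-D array per iteration
for i in range(3):
    rows.append(arr1)  # list.append, not np.append

# A single vstack call turns the list of (3,) arrays into one 2-D array
arr_out = np.vstack(rows)
print(arr_out.shape)  # (3, 3)
```

The list exists only during the loop; the end result is a regular numpy array with one row per iteration.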

1 Answer

I solved it myself by initializing the array arr_out with zero rows and the number of columns I need (three in the mini-example above). Then you can get rid of the if-clause and perform the np.vstack directly. However, when the array has many columns (more than 3,000 in my real case), it seems to me that getting rid of the if-clause comes at the cost of initializing a wide empty array. Getting rid of the if-clause will therefore only improve run time when you loop many times (true in my case, as I will run through it about 60,000 times). Here is the code:

# Creating sample array
arr1 = np.array([8, 19, 43], dtype=int)

# Solution
arr_out = np.empty((0, len(arr1)), dtype=arr1.dtype)  # shape (0, 3): zero rows, three columns
for i in range(3): #I run through the loop a couple of thousand times
    arr_out = np.vstack((arr_out, arr1))
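
A note on run time: `np.vstack` builds a new array and copies all previously stacked rows on every call, so the cost of each iteration grows with the number of rows collected so far. If the number of iterations is known before the loop starts, one alternative is to allocate the full result once and fill it row by row. This is only a sketch under that assumption; `n_iter` is a hypothetical name for the loop count.

```python
import numpy as np

arr1 = np.array([8, 19, 43], dtype=int)
n_iter = 3  # assumed to be known before the loop starts

# Allocate the full (n_iter, n_columns) result once
arr_out = np.empty((n_iter, len(arr1)), dtype=arr1.dtype)
for i in range(n_iter):
    arr_out[i] = arr1  # write row i in place; earlier rows are not copied

print(arr_out.shape)  # (3, 3)
```

This needs neither the if-clause nor a list, and the array is allocated exactly once.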