
In a situation like the one below, how do I vstack the two matrices?

import numpy as np 

a = np.array([[3,3,3],[3,3,3],[3,3,3]])
b = np.array([[2,2],[2,2],[2,2]])

a = np.vstack([a, b])

Output:   
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 3 and the array at index 1 has size 2

The output I would like would look like this:

a = array([[[3, 3, 3],
            [3, 3, 3],
            [3, 3, 3]],
           [[2, 2],
            [2, 2],
            [2, 2]]])

My goal is then to loop over the stacked matrices, index each matrix, and call a function on a specific row.

for matrix in a:
   row = matrix[1]
   print(row)

Output: 
[3, 3, 3]
[2, 2]
C.L.

2 Answers


Be careful with those "NumPy is faster" claims. If you already have arrays and make full use of array methods, NumPy is indeed faster. But if you start with lists, or have to use Python-level iteration (as you do in Pack...), the NumPy version might well be slower.

Just doing a time test on the Pack step:

In [12]: timeit Pack_Matrices_with_NaN([a,b,c],5)
221 µs ± 9.02 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Compare that with fetching row 1 (the second row) of each array with a simple list comprehension:

In [13]: [row[1] for row in [a,b,c]]
Out[13]: [array([3., 3., 3.]), array([2., 2.]), array([4., 4., 4., 4.])]
In [14]: timeit [row[1] for row in [a,b,c]]
808 ns ± 2.17 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

200 µs compared to less than 1 µs!
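For reference, the list-based approach applied to the two arrays from the question can be sketched as follows. This is a minimal illustration rather than the only way to do it; the names `matrices` and `rows` are chosen here for the example. Because a plain list can hold arrays of different widths, no padding or stacking is needed.

```python
import numpy as np

a = np.array([[3, 3, 3], [3, 3, 3], [3, 3, 3]])
b = np.array([[2, 2], [2, 2], [2, 2]])

# A plain Python list holds arrays of different widths without any padding.
matrices = [a, b]

# Take row 1 (the second row) of each matrix.
rows = [matrix[1] for matrix in matrices]

for row in rows:
    print(row)
```

Each element of `rows` is still a NumPy array, so array methods remain available on the individual rows.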

And timing your Unpack:

In [21]: [Unpack_Matrix_with_NaN(packed_matrices.reshape(3,3,5),i)[1,:] for i in range(3)]
    ...: 
Out[21]: [array([3., 3., 3.]), array([2., 2.]), array([4., 4., 4., 4.])]
In [22]: timeit [Unpack_Matrix_with_NaN(packed_matrices.reshape(3,3,5),i)[1,:] for i in range(3)]
199 µs ± 10.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
hpaulj

I was able to solve this using only NumPy. Since NumPy can be significantly faster than plain Python lists for array-wide operations (https://towardsdatascience.com/how-fast-numpy-really-is-e9111df44347), I wanted to share my answer in case it is useful to others.

I started by padding b with np.nan so that the two arrays have the same shape.

import numpy as np 

a = np.array([[3,3,3],[3,3,3],[3,3,3]]).astype(float)
b = np.array([[2,2],[2,2],[2,2]]).astype(float)

# Append a column of NaN to b so that it has the same shape as a
b = np.insert(b, 2, np.nan, axis=1)

# Now vstack the arrays 
a = np.vstack([[a], [b]])
print(a)

Output: 
[[[ 3.  3.  3.]
  [ 3.  3.  3.]
  [ 3.  3.  3.]]

 [[ 2.  2. nan]
  [ 2.  2. nan]
  [ 2.  2. nan]]]
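As an alternative to np.insert, the same padding can be sketched with np.pad, which adds all missing columns in one call. This is an illustrative variant, not part of the original approach; `target_width`, `pad_cols`, and `b_padded` are names introduced here for the example.

```python
import numpy as np

b = np.array([[2, 2], [2, 2], [2, 2]], dtype=float)
target_width = 3  # assumed width of the widest matrix

# Number of NaN columns needed on the right-hand side.
pad_cols = target_width - b.shape[1]

# Pad nothing along the rows, and pad_cols columns after the last column.
b_padded = np.pad(b, ((0, 0), (0, pad_cols)), mode="constant",
                  constant_values=np.nan)
print(b_padded)
```

The array must have a float dtype for this to work, because NaN cannot be stored in an integer array.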

Then I wrote a function that extracts one matrix from a and removes the NaN padding.

def Unpack_Matrix_with_NaN(Matrix_with_nan, matrix_of_interest):
    for first_row in Matrix_with_nan[matrix_of_interest,:1]:
        # find shape of matrix row without nan 
        first_row_without_nan = first_row[~np.isnan(first_row)]
        shape = first_row_without_nan.shape[0]
        matrix_without_nan = np.arange(shape)
        for row in Matrix_with_nan[matrix_of_interest]:
            row_without_nan = row[~np.isnan(row)]
            matrix_without_nan = np.vstack([matrix_without_nan, row_without_nan])
        # Remove vector specifying shape 
        matrix_without_nan = matrix_without_nan[1:]
        return matrix_without_nan

I could then loop through the matrices, find my desired row, and print it.

Matrix_with_nan = a

for matrix in range(len(Matrix_with_nan)):
    matrix_of_interest = Unpack_Matrix_with_NaN(a, matrix)
    row = matrix_of_interest[1]
    print(row)

Output: 
[3. 3. 3.]
[2. 2.]
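A shorter way to strip the padding is to drop the columns that consist entirely of NaN, using a boolean mask. The sketch below assumes that NaN appears only as padding (no genuine all-NaN columns in the data); `unpack_matrix` and `stacked` are hypothetical names used for illustration.

```python
import numpy as np

def unpack_matrix(stacked, index):
    """Return matrix `index` from a NaN-padded 3-D stack, without padding columns."""
    matrix = stacked[index]
    # Keep only the columns that are not entirely NaN (assumes NaN is padding only).
    keep = ~np.isnan(matrix).all(axis=0)
    return matrix[:, keep]

stacked = np.array([
    [[3, 3, 3], [3, 3, 3], [3, 3, 3]],
    [[2, 2, np.nan], [2, 2, np.nan], [2, 2, np.nan]],
], dtype=float)

# Row 1 of each unpacked matrix.
rows = [unpack_matrix(stacked, i)[1] for i in range(len(stacked))]
for row in rows:
    print(row)
```

This avoids the row-by-row vstack, at the cost of relying on the all-NaN-column assumption stated above.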

I also made a function to pack matrices when more than one nan needs to be added per row:

import numpy as np 

a = np.array([[3,3,3],[3,3,3],[3,3,3]]).astype(float)
b = np.array([[2,2],[2,2],[2,2]]).astype(float)
c = np.array([[4,4,4,4],[4,4,4,4],[4,4,4,4]]).astype(float)

# Pad the rows of every matrix with NaN until each has Matrix_size columns
def Pack_Matrices_with_NaN(List_of_matrices, Matrix_size):
    Matrix_with_nan = np.arange(Matrix_size)
    for array in List_of_matrices:
        start_position = len(array[0])
        for x in range(start_position,Matrix_size):
            array = np.insert(array, (x), np.nan, axis=1)
        Matrix_with_nan = np.vstack([Matrix_with_nan, array])
    Matrix_with_nan = Matrix_with_nan[1:]
    return Matrix_with_nan

arrays = [a,b,c]
packed_matrices = Pack_Matrices_with_NaN(arrays, 5)
print(packed_matrices) 

Output:
[[ 3.  3.  3. nan nan]
 [ 3.  3.  3. nan nan]
 [ 3.  3.  3. nan nan]
 [ 2.  2. nan nan nan]
 [ 2.  2. nan nan nan]
 [ 2.  2. nan nan nan]
 [ 4.  4.  4.  4. nan]
 [ 4.  4.  4.  4. nan]
 [ 4.  4.  4.  4. nan]]
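The packing step can also be sketched by preallocating a NaN-filled array and copying each matrix into it with slice assignment, which avoids the repeated np.insert and np.vstack calls. This is an illustrative variant that assumes all matrices have the same number of rows; `pack_matrices` is a hypothetical name, and the result is 3-D rather than the 2-D layout shown above.

```python
import numpy as np

def pack_matrices(matrices, width):
    """Stack equally tall 2-D arrays into one (n, rows, width) array padded with NaN."""
    n_rows = matrices[0].shape[0]  # assumes every matrix has this many rows
    packed = np.full((len(matrices), n_rows, width), np.nan)
    for i, m in enumerate(matrices):
        # Copy the matrix into the left-hand columns; the rest stays NaN.
        packed[i, :, :m.shape[1]] = m
    return packed

a = np.full((3, 3), 3.0)
b = np.full((3, 2), 2.0)
c = np.full((3, 4), 4.0)

packed = pack_matrices([a, b, c], 5)
print(packed.shape)
```

Calling `packed.reshape(-1, 5)` gives the same 2-D layout as the output printed above.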
C.L.