Restructuring a 2-D numpy array into a 3-D numpy array according to values in a column of a dataframe

Question

I have a 2-D numpy array, say like this:

matrix([[1., 0., 0., ..., 1., 0., 0.],
        [1., 0., 0., ..., 0., 1., 1.],
        [1., 0., 0., ..., 1., 0., 0.],
        [1., 1., 0., ..., 1., 0., 0.],
        [1., 1., 0., ..., 1., 0., 0.],
        [1., 1., 0., ..., 1., 0., 0.]])

I want to transform it into a 3-D numpy array based on the values of a column of a dataframe. Let's say the column is like this:

df = pd.DataFrame({"Case":[1,1,2,2,3,4]})

The final 3-D array should look like this:

 matrix([
           [ 
              [1., 0., 0., ..., 1., 0., 0.], [1., 0., 0., ..., 0., 1., 1.] 
           ],
           [
              [1., 0., 0., ..., 1., 0., 0.], [1., 1., 0., ..., 1., 0., 0.]
           ],
           [
              [1., 1., 0., ..., 1., 0., 0.]
           ],
           [
              [1., 1., 0., ..., 1., 0., 0.]
           ]
        ])

The first two rows of the initial 2-D array become one 2-D array of the final 3-D array, because the first and second rows of the dataframe column both have the value 1. Similarly, the next two rows become another 2-D array of two rows, because the next two values in the column are 2, so they belong together. The values 3 and 4 each occur only once, so the last two 2-D arrays of the 3-D array contain only one row each.

So, basically, if two or more values in the dataframe column are the same, then the rows of the initial 2-D matrix at those indices belong together: they are combined into a 2-D matrix, which becomes one element of the final 3-D matrix.
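The grouping rule above amounts to mapping each Case value to the row positions that share it. A minimal pandas sketch of just that mapping, assuming the dataframe has the default index so positions line up with the matrix rows (it does not build the arrays yet):

```python
import pandas as pd

df = pd.DataFrame({"Case": [1, 1, 2, 2, 3, 4]})

# GroupBy.indices is a dict {group value -> numpy array of row positions}.
# Each entry lists the matrix rows that belong in the same 2-D block.
row_groups = df.groupby("Case").indices

for case, rows in row_groups.items():
    print(case, rows.tolist())
```

With the example column, Case 1 maps to rows 0 and 1, Case 2 to rows 2 and 3, and Cases 3 and 4 to one row each.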

How do I do this?

numpy does not allow non-rectangular arrays. Your example matrix cannot be a 3-D array in numpy. Do you mean to have a list of arrays? — Ehsan, Jun 09 '20 at 12:26
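To illustrate the comment: a single ndarray needs every block to have the same shape, and the blocks here have 2, 2, 1 and 1 rows. A small sketch with hypothetical all-ones data (4 features per row) shows `np.stack` refusing them, while a plain Python list holds them fine:

```python
import numpy as np

# Hypothetical stand-in blocks with the question's group sizes 2, 2, 1, 1.
groups = [np.ones((2, 4)), np.ones((2, 4)), np.ones((1, 4)), np.ones((1, 4))]

try:
    stacked = np.stack(groups)  # requires all inputs to have the same shape
    stack_failed = False
except ValueError as err:
    stack_failed = True
    print("cannot stack:", err)

# A list of 2-D arrays has no such restriction.
print([g.shape for g in groups])
```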

score 0 · Answer 1 · answered Jun 09 '20 at 12:25


Numpy doesn't have very good support for arrays with rows of different length, but you can make it a list of 2D arrays instead:

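import numpy as np
import pandas as pd

# np.array builds an array from data (np.ndarray expects a shape instead).
# "..." stands for the columns elided in the question; use the real values.
M = np.array(
[[1., 0., 0., ..., 1., 0., 0.],
 [1., 0., 0., ..., 0., 1., 1.],
 [1., 0., 0., ..., 1., 0., 0.],
 [1., 1., 0., ..., 1., 0., 0.],
 [1., 1., 0., ..., 1., 0., 0.],
 [1., 1., 0., ..., 1., 0., 0.]]
)

df = pd.DataFrame({"Case":[1,1,2,2,3,4]})

# One 2-D array per distinct Case value, in order of first appearance:
# unique() preserves that order, while iterating over a set() does not guarantee it.
# df.index labels are used as row positions, so this assumes the default RangeIndex.
M_per_case = [
    np.stack([M[index] for index in df.index[df['Case'] == case]])
    for case in df['Case'].unique()
]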
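If the rows of each Case are contiguous, as in the question, the same grouping can also be done with a single `np.split` at the positions where the Case value changes. A sketch with hypothetical 4-column data standing in for the elided matrix:

```python
import numpy as np
import pandas as pd

# Hypothetical small data; the question's real columns are elided with "...".
M = np.array([
    [1., 0., 0., 1.],
    [1., 0., 1., 1.],
    [1., 0., 0., 1.],
    [1., 1., 0., 1.],
    [1., 1., 0., 0.],
    [1., 1., 1., 0.],
])
df = pd.DataFrame({"Case": [1, 1, 2, 2, 3, 4]})

# Assumes rows with the same Case value are contiguous.
case = df["Case"].to_numpy()
# A new group starts wherever the Case value differs from the previous row.
boundaries = np.flatnonzero(np.diff(case) != 0) + 1
M_per_case = np.split(M, boundaries)

print(boundaries.tolist())            # [2, 4, 5]
print([g.shape for g in M_per_case])  # [(2, 4), (2, 4), (1, 4), (1, 4)]
```

If equal Case values can be scattered through the column, this split would produce extra groups; sort the rows by Case first or use the boolean-mask approach of the answer.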


ybnd


Hi! Thank you for your answer! I understand that numpy does not have good support for arrays with rows of different lengths. Your answer gives me a list of 2-D arrays. Now I need to pad those 2-D arrays to equal length and then convert them into a 3-D numpy array, because I need it to train an LSTM model. How do I do this? – Indy Jun 09 '20 at 12:58
Wouldn't padding your data affect the model you train? – ybnd Jun 09 '20 at 13:25
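One common way to get a rectangular 3-D array is to pad every block with filler rows up to the longest block. The sketch below uses a hypothetical helper `pad_groups` that appends rows of `pad_value` (0.0 by default, an assumption) after the real rows, and also returns a boolean mask marking the real rows. Because the data itself contains zeros, the padded rows cannot be told apart from real ones without that mask; whether the model can use such a mask to ignore padded steps depends on the framework and layers, which is the concern raised in the last comment.

```python
import numpy as np

def pad_groups(groups, pad_value=0.0):
    """Hypothetical helper: post-pad ragged (rows, features) arrays into one 3-D array.

    Returns the padded array, shape (n_groups, max_rows, n_features), and a
    boolean mask, shape (n_groups, max_rows), that is True for real rows.
    """
    max_rows = max(g.shape[0] for g in groups)
    n_features = groups[0].shape[1]
    padded = np.full((len(groups), max_rows, n_features), pad_value, dtype=float)
    mask = np.zeros((len(groups), max_rows), dtype=bool)
    for i, g in enumerate(groups):
        padded[i, : g.shape[0]] = g  # copy real rows; the rest keep pad_value
        mask[i, : g.shape[0]] = True
    return padded, mask

# Hypothetical groups with the question's sizes (2, 2, 1, 1) and 4 features.
groups = [np.ones((2, 4)), np.ones((2, 4)), np.ones((1, 4)), np.ones((1, 4))]
padded, mask = pad_groups(groups)
print(padded.shape)   # (4, 2, 4)
print(mask.tolist())  # [[True, True], [True, True], [True, False], [True, False]]
```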
