22

I have a list of numpy arrays that I'm trying to convert to a DataFrame. Each array should be a row of the DataFrame.

Using pd.DataFrame() isn't working. It always gives the error: ValueError: Must pass 2-d input.

Is there a better way to do this?

This is my current code:

import numpy as np
import pandas as pd

list_arrays = [np.array([[0, 0, 0, 1, 0, 0, 0, 0, 0]], dtype='uint8'),
               np.array([[0, 0, 3, 2, 0, 0, 0, 0, 0]], dtype='uint8')]

d = pd.DataFrame(list_arrays)

ValueError: Must pass 2-d input
C8H10N4O2
Marcos Santana

4 Answers

28

Option 1:

In [143]: pd.DataFrame(np.concatenate(list_arrays))
Out[143]:
   0  1  2  3  4  5  6  7  8
0  0  0  0  1  0  0  0  0  0
1  0  0  3  2  0  0  0  0  0

Option 2:

In [144]: pd.DataFrame(list(map(np.ravel, list_arrays)))
Out[144]:
   0  1  2  3  4  5  6  7  8
0  0  0  0  1  0  0  0  0  0
1  0  0  3  2  0  0  0  0  0

Why do I get:

ValueError: Must pass 2-d input

I think pd.DataFrame() tries to convert the list to an ndarray first, as follows, and the result has three dimensions rather than two:

In [148]: np.array(list_arrays)
Out[148]:
array([[[0, 0, 0, 1, 0, 0, 0, 0, 0]],

       [[0, 0, 3, 2, 0, 0, 0, 0, 0]]], dtype=uint8)

In [149]: np.array(list_arrays).shape
Out[149]: (2, 1, 9)     # <----- NOTE: 3D array
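
The shape difference can be checked directly. The following is a minimal sketch, assuming the two uint8 arrays from the question; the names `stacked` and `flat` are illustrative only and do not appear in the original post.

```python
import numpy as np
import pandas as pd

list_arrays = [np.array([[0, 0, 0, 1, 0, 0, 0, 0, 0]], dtype='uint8'),
               np.array([[0, 0, 3, 2, 0, 0, 0, 0, 0]], dtype='uint8')]

# Each element is already 2-D with shape (1, 9), so wrapping the list
# in np.array adds a third axis.
stacked = np.array(list_arrays)
print(stacked.shape)   # (2, 1, 9)

# Joining along the existing first axis keeps the result 2-D.
flat = np.concatenate(list_arrays)
print(flat.shape)      # (2, 9)

df = pd.DataFrame(flat)
print(df.shape)        # (2, 9)
```

Because `flat` is 2-D, pd.DataFrame() accepts it and each original array becomes one row.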
MaxU - stand with Ukraine
  • Thank you! All of them work. But I wonder why I was getting that 2-d error. – Marcos Santana Mar 28 '18 at 17:58
  • For me, `pd.DataFrame(np.concatenate(list_arrays))` just caused all my arrays to flatten into one dimension instead of "row stacking" them. Therefore, I recommend just using `pd.DataFrame(np.row_stack(list_arrays))`. It took a couple of seconds on 140k rows x 17k columns. – Corey Levinson Mar 04 '20 at 20:28
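
The flattening described in the comment above is what np.concatenate does when the inputs are 1-D rather than the (1, 9) arrays in the question. Here is a minimal sketch of the difference, using hypothetical 1-D inputs (`rows_1d` is an assumed name, not from the post). It uses np.vstack because np.row_stack is an alias for it, and to my knowledge recent NumPy releases deprecate the alias.

```python
import numpy as np
import pandas as pd

# Hypothetical 1-D inputs: shape (3,) each, unlike the (1, 9) arrays
# in the question.
rows_1d = [np.array([1, 2, 3]), np.array([4, 5, 6])]

# concatenate joins 1-D arrays end to end along their only axis.
joined = np.concatenate(rows_1d)
print(joined.shape)    # (6,)

# vstack first promotes each 1-D array to a row, then stacks the rows.
stacked = np.vstack(rows_1d)
print(stacked.shape)   # (2, 3)

df = pd.DataFrame(stacked)
```

For inputs that are already 2-D with one row each, both functions return the same 2-D result, so the choice only matters when the arrays are 1-D.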
8

Alt 1

pd.DataFrame(sum(map(list, list_arrays), []))

   0  1  2  3  4  5  6  7  8
0  0  0  0  1  0  0  0  0  0
1  0  0  3  2  0  0  0  0  0
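
Alt 1 is compact, so it may help to see it unpacked. This is a minimal sketch, assuming the arrays from the question; the intermediate names `as_lists` and `rows` are illustrative only.

```python
import numpy as np
import pandas as pd

list_arrays = [np.array([[0, 0, 0, 1, 0, 0, 0, 0, 0]], dtype='uint8'),
               np.array([[0, 0, 3, 2, 0, 0, 0, 0, 0]], dtype='uint8')]

# list(a) iterates over the first axis of a (1, 9) array, giving a
# one-element list that holds the 1-D row.
as_lists = list(map(list, list_arrays))

# sum with a [] start value concatenates those lists into a single
# list of 1-D rows.
rows = sum(as_lists, [])

df = pd.DataFrame(rows)
print(df.shape)   # (2, 9)
```

Concatenating lists with sum copies the accumulated list at every step, so for a long list of arrays np.vstack is likely to be the faster choice.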

Alt 2

pd.DataFrame(np.row_stack(list_arrays))

   0  1  2  3  4  5  6  7  8
0  0  0  0  1  0  0  0  0  0
1  0  0  3  2  0  0  0  0  0
piRSquared
4

You can use pd.Series:

pd.Series(list_arrays).apply(lambda x: pd.Series(x[0]))
Out[294]: 
   0  1  2  3  4  5  6  7  8
0  0  0  0  1  0  0  0  0  0
1  0  0  3  2  0  0  0  0  0
BENY
3

Here is one way.

import numpy as np
import pandas as pd

lst = [np.array([[0, 0, 0, 1, 0, 0, 0, 0, 0]], dtype=int),
       np.array([[0, 0, 3, 2, 0, 0, 0, 0, 0]], dtype=int)]

df = pd.DataFrame(np.vstack(lst))

#    0  1  2  3  4  5  6  7  8
# 0  0  0  0  1  0  0  0  0  0
# 1  0  0  3  2  0  0  0  0  0
jpp