I have a list of NumPy arrays that looks like this:

[400.31865662]
[401.18514808]
[404.84015554]
[405.14682194]
[405.67735105]
[273.90969447]
[274.0894528]

When I try to convert it to a Pandas Dataframe with the following code

y = pd.DataFrame(data)
print(y)

I get the following output when printing it. Why do I get all those zeros?

            0
0  400.318657
            0
0  401.185148
            0
0  404.840156
            0
0  405.146822
            0
0  405.677351
            0
0  273.909694
            0
0  274.089453

I would like to get a single-column DataFrame that looks like this:

400.31865662
401.18514808
404.84015554
405.14682194
405.67735105
273.90969447
274.0894528
mkrieger1
Yannick
  • You must be doing something else, because I get exactly what you'd expect. What exactly does `data` look like before you create the `DataFrame`? It looks like each item is its own `DataFrame` – Andrew Dec 17 '18 at 13:18
  • I cannot reproduce your error; can you post the output of `print(data)`? A dataframe needs to have an index (row indicator) and a column name (column indicator). If you do not provide them, pandas will create them automatically: you should see 0, 1, 2, ... in rows and 0 in columns when calling `print(df)`. If you want to see only the data, use `y.values` – Tarifazo Dec 17 '18 at 13:21
  • the issue is with your array: `array = np.array(np.random.randn(5))` then `pd.DataFrame(array)`. Works as one would expect. – It_is_Chris Dec 17 '18 at 13:39
  • You are right, Andrew: `data` is indeed a list of arrays, which I had not realized. So how can I aggregate them into a single array so that I can convert it to a Pandas dataframe? – Yannick Dec 17 '18 at 13:59
  • As `data` is actually a list of arrays, I tried the following code: `newdf = pd.DataFrame(data); newdf.to_csv('test.csv', mode='w', sep=',', header=False, index=False)`. The result I get is only the last array of the list, which is `274.08945279667057`. How can I concatenate the list of arrays into the same file? – Yannick Dec 17 '18 at 14:34
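
The code that produced the output in the question was never posted, so the following is only a guess at how it could arise: building and printing a separate DataFrame for each array inside a loop. The loop and the names `frames` and `combined` are assumptions for illustration. If a `to_csv(..., mode='w')` call sat inside such a loop, each write would overwrite the previous one, which would be consistent with only the last value ending up in the file.

```python
import numpy as np
import pandas as pd

# Assumed input: a list of one-element arrays (values taken from the question).
data = [np.array([400.31865662]), np.array([401.18514808]), np.array([404.84015554])]

# Hypothetical loop: each array becomes its own 1x1 DataFrame,
# so every printout repeats the column label 0 and the row label 0.
frames = []
for item in data:
    frame = pd.DataFrame(item)
    print(frame)
    frames.append(frame)

# Passing the whole list at once gives one DataFrame with three rows.
combined = pd.DataFrame(data)
print(combined)
```

In this sketch, `combined` has one row per array and a running index, which is the shape the question asks for.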

4 Answers

21

You could flatten the NumPy array:

import numpy as np
import pandas as pd

data = [[400.31865662],
        [401.18514808],
        [404.84015554],
        [405.14682194],
        [405.67735105],
        [273.90969447],
        [274.0894528]]

arr = np.array(data)

df = pd.DataFrame(data=arr.flatten())

print(df)

Output

            0
0  400.318657
1  401.185148
2  404.840156
3  405.146822
4  405.677351
5  273.909694
6  274.089453
Dani Mesejo
  • 1
    This doesn't really address the issue, because `pd.DataFrame(data)` works even if you don't flatten the data. The problem is something else, and this may or may not have solved OP's problem in the end. – cs95 Dec 17 '18 at 13:24
  • 1
    All great answers above, one other thing one can do is to add a column name if that helps `df = pd.DataFrame(data=arr.flatten(), columns=['Values'])` – Pramit Mar 28 '19 at 01:39
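
To illustrate the point raised in the first comment, here is a small check, assuming a regular 2-D array of shape (n, 1) like the one in this answer. It compares the DataFrame built from the 2-D array with the one built from the flattened array. The names `df_2d`, `df_flat` and `same` are illustrative.

```python
import numpy as np
import pandas as pd

arr = np.array([[400.31865662], [401.18514808], [404.84015554]])  # shape (3, 1)

df_2d = pd.DataFrame(arr)              # built from the (3, 1) array
df_flat = pd.DataFrame(arr.flatten())  # built from the flattened (3,) array

# DataFrame.equals compares shape, labels, dtypes and values.
same = df_2d.equals(df_flat)
print(arr.shape, arr.flatten().shape, same)
```

If `same` is true, flattening is not what makes the difference for an array of this shape, which suggests the original `data` was structured differently.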
16

Since I assume the many visitors of this post aren't here for OP's specific and un-reproducible issue, here's a general answer:

df = pd.DataFrame(array)

One of the strengths of pandas is readable, labeled output (much like a spreadsheet), so it's worth giving your columns and rows meaningful names.

import numpy as np
import pandas as pd

array = np.random.rand(5, 5)
# Example values (random, shown rounded to 3 decimals):
# array([[0.723, 0.177, 0.659, 0.573, 0.476],
#        [0.77 , 0.311, 0.533, 0.415, 0.552],
#        [0.349, 0.768, 0.859, 0.273, 0.425],
#        [0.367, 0.601, 0.875, 0.109, 0.398],
#        [0.452, 0.836, 0.31 , 0.727, 0.303]])
columns = [f'col_{num}' for num in range(5)]
index = [f'index_{num}' for num in range(5)]

Here's where the magic happens:

df = pd.DataFrame(array, columns=columns, index=index)
            col_0     col_1     col_2     col_3     col_4
index_0  0.722791  0.177427  0.659204  0.572826  0.476485
index_1  0.770118  0.311444  0.532899  0.415371  0.551828
index_2  0.348923  0.768362  0.858841  0.273221  0.424684
index_3  0.366940  0.600784  0.875214  0.108818  0.397671
index_4  0.451682  0.836315  0.310480  0.727409  0.302597
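
As a rough sketch of why the labels help, the example below looks up values by name instead of by position. It swaps the random array for a deterministic one (`np.arange`) purely so the result is predictable; the names `value`, `column` and `back` are illustrative.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random array: values 0.0 .. 24.0, row by row.
array = np.arange(25, dtype=float).reshape(5, 5)
columns = [f'col_{num}' for num in range(5)]
index = [f'index_{num}' for num in range(5)]

df = pd.DataFrame(array, columns=columns, index=index)

value = df.loc['index_1', 'col_2']  # same cell as array[1, 2]
column = df['col_0']                # one column as a Series, keeping the row labels
back = df.to_numpy()                # back to a plain NumPy array

print(value)
```

Here `value` is 7.0, the entry in the second row and third column of the array.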
Nicolas Gervais
5

There is another way that the other answers don't mention. If you have a one-dimensional NumPy array, i.e. one with shape (n,) (what you might think of as a row or column vector), you can do the following:

import numpy as np
import pandas as pd

# sample 1-D array of shape (20,)
x = np.zeros(20)
# empty dataframe
df = pd.DataFrame()
# add the array to df as a column
df['column_name'] = x

This way you can add multiple arrays as separate columns.
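
A minimal sketch of adding several arrays this way, assuming they all have the same length (assigning an array whose length differs from the existing index raises a `ValueError`). The column names `zeros` and `counts` and the variable `df_alt` are made up for the example; the dictionary form at the end is an equivalent one-step alternative.

```python
import numpy as np
import pandas as pd

a = np.zeros(20)
b = np.arange(20)

# Column-by-column, as in the answer above.
df = pd.DataFrame()
df['zeros'] = a
df['counts'] = b

# One-step alternative: a dict mapping column names to arrays.
df_alt = pd.DataFrame({'zeros': a, 'counts': b})

print(df.shape, list(df.columns))
```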

akshayk07
4

I just figured out my mistake. (data) was a list of arrays:

[array([400.0290173]), array([400.02253235]), array([404.00252113]), array([403.99466754]), array([403.98681395]), array([271.97896036]), array([271.97110677])]

So I used np.vstack(data) to stack the arrays into a single array:

conc = np.vstack(data)

[[400.0290173 ]
 [400.02253235]
 [404.00252113]
 [403.99466754]
 [403.98681395]
 [271.97896036]
 [271.97110677]]

Then I converted the concatenated array into a Pandas DataFrame:

newdf = pd.DataFrame(conc)


            0
0  400.029017
1  400.022532
2  404.002521
3  403.994668
4  403.986814
5  271.978960
6  271.971107

Et voilà!
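
For anyone who also wants the CSV mentioned in the comments, here is a sketch of one possible variant. It uses `np.concatenate` instead of `np.vstack` (a substitution, giving a 1-D array rather than an (n, 1) one) and writes everything in a single `to_csv` call. The column name `value` is illustrative, and the in-memory `io.StringIO` buffer stands in for a file path such as `'test.csv'`.

```python
import io

import numpy as np
import pandas as pd

# Assumed input: a list of one-element arrays, as in this answer.
data = [np.array([400.0290173]), np.array([400.02253235]), np.array([404.00252113])]

flat = np.concatenate(data)            # 1-D array of shape (3,)
newdf = pd.DataFrame({'value': flat})  # single named column

# Write all rows in one call; a buffer is used here instead of a file on disk.
buffer = io.StringIO()
newdf.to_csv(buffer, header=False, index=False)
csv_text = buffer.getvalue()

print(csv_text)
```

Writing the whole DataFrame once avoids the situation where repeated writes in `mode='w'` leave only the last value in the file.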

Yannick