
Say I have two pandas Series in Python:

import pandas as pd
h = pd.Series(['g',4,2,1,1])
g = pd.Series([1,6,5,4,"abc"])

I can create a DataFrame with just h and then append g to it:

df = pd.DataFrame([h])
df1 = df.append(g, ignore_index=True)

I get:

>>> df1
   0  1  2  3    4
0  g  4  2  1    1
1  1  6  5  4  abc

But now suppose that I have an empty DataFrame and I try to append h to it:

df2 = pd.DataFrame([])
df3 = df2.append(h, ignore_index=True)

This does not work. I think the problem is the line `df2 = pd.DataFrame([])`: I need to somehow define the blank DataFrame to have the proper number of columns.
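If you do want to predefine the columns of the blank DataFrame, one option is a sketch like the following. It creates an empty frame whose columns match the Series index, then adds rows with `.loc` "setting with enlargement". It assumes the default integer row index, so `len(df)` is always the next unused label:

```python
import pandas as pd

h = pd.Series(['g', 4, 2, 1, 1])
g = pd.Series([1, 6, 5, 4, "abc"])

# Empty DataFrame whose columns match the Series' index (0..4).
df = pd.DataFrame(columns=h.index)

# Assigning to a label that does not exist yet appends a new row.
# len(df) is the next label only because the index is 0, 1, 2, ...
df.loc[len(df)] = h.tolist()
df.loc[len(df)] = g.tolist()
print(df)
```

Each enlargement rebuilds the frame's storage, so this is fine for a handful of rows but can get slow when adding many rows one at a time.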

By the way, the reason I am trying to do this is that I am scraping text from the internet using requests and BeautifulSoup, processing it, and trying to write it to a DataFrame one row at a time.
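For a row-at-a-time scraping loop, a common alternative is to accumulate plain lists and build the DataFrame once at the end. In this sketch, `scraped_rows` is a hypothetical stand-in for whatever your requests/BeautifulSoup loop produces; here it just reuses the two Series from the question:

```python
import pandas as pd

h = pd.Series(['g', 4, 2, 1, 1])
g = pd.Series([1, 6, 5, 4, "abc"])

# Hypothetical stand-in for the scraping loop: each item is one parsed row.
scraped_rows = [h, g]

rows = []
for row in scraped_rows:
    rows.append(row.tolist())  # plain Python lists are cheap to accumulate

# Build the DataFrame once, after all rows are collected.
df = pd.DataFrame(rows)
print(df)
```

This sidesteps the empty-DataFrame question entirely, and it avoids copying the whole frame on every new row.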

bill999
  • If you want an empty DataFrame, you can just do `df = pd.DataFrame()`; then your code works. – EdChum May 31 '14 at 22:05
  • Interestingly, when you pass an empty list the DataFrame index defaults to `Int64` (although it is still empty), whereas when you pass nothing it defaults to `object`. It is unclear why this should matter. – EdChum May 31 '14 at 22:08
  • Why do you think that mine is different? The code still doesn't work if I try this fix. – bill999 May 31 '14 at 23:26
  • What version of pandas are you using? Mine is 0.13.1 64-bit. – EdChum Jun 01 '14 at 06:08
  • It is 0.12.0. Not sure whether it is 64-bit or not. – bill999 Jun 01 '14 at 19:39
  • I don't think this fault is due to bitness. I would upgrade pandas and numpy to the latest versions and try again; if this works, let me know or edit your question, as others may run into the same issue. I'm running numpy `1.8.1`. Actually, after revisiting this, your original code passing an empty list also works in pandas `0.13.1`, so I think this is a bug in `0.12.0`. – EdChum Jun 01 '14 at 19:42
  • I'm trying to upgrade it, but I haven't been able to so far. – bill999 Jun 01 '14 at 20:38
  • Ok. I have upgraded python and it works like a charm. Thanks! – bill999 Jun 02 '14 at 03:08


If you don't pass an empty list to the DataFrame constructor, then it works:

In [16]:

df = pd.DataFrame()
h = pd.Series(['g',4,2,1,1])
df = df.append(h,ignore_index=True)
df
Out[16]:
   0  1  2  3  4
0  g  4  2  1  1

[1 rows x 5 columns]

The difference between the two constructor calls appears to be that the index dtype is set differently: with an empty list it is `int64`, and with no argument it is `object`:

In [21]:

df = pd.DataFrame()
print(df.index.dtype)
df = pd.DataFrame([])
print(df.index.dtype)
object
int64

It is unclear to me why this difference should affect the behaviour (I'm guessing here).

UPDATE

After revisiting this, I believe this is a bug in pandas 0.12.0, because your original code works fine in 0.13.1:

In [13]:

import pandas as pd
df = pd.DataFrame([])
h = pd.Series(['g',4,2,1,1])
df.append(h,ignore_index=True)

Out[13]:
   0  1  2  3  4
0  g  4  2  1  1

[1 rows x 5 columns]

I am running pandas 0.13.1 and numpy 1.8.1 (64-bit) on Python 3.3.5.0. I think the problem is in pandas, but to be safe I would upgrade both pandas and numpy. I don't think this is a 32-bit versus 64-bit Python issue.
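Since the fix came down to versions, here is a small sketch for checking which Python, pandas and numpy you are actually running, and whether the interpreter is 32-bit or 64-bit. It uses the pointer size reported by the standard `struct` module:

```python
import struct
import sys

import numpy as np
import pandas as pd

print("Python:", sys.version.split()[0])
print("pandas:", pd.__version__)
print("numpy: ", np.__version__)
# Size of a C pointer in bits: 64 on a 64-bit interpreter, 32 on a 32-bit one.
print("Interpreter bits:", struct.calcsize("P") * 8)
```

For a fuller report, including the versions of optional dependencies, pandas also provides `pd.show_versions()`.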

EdChum
  • I just tried it and, for some reason, it still doesn't work even if I define the DataFrame like you say. – bill999 May 31 '14 at 22:28