I need to create a DataFrame whose columns contain DataFrames. The DataFrames that go in the column have different sizes, and I am getting a StopIteration exception. This doesn't happen when the DataFrames are the same size. I know a Panel is more suitable for this, but I need a DataFrame in this case.

import pandas as pd

a = pd.DataFrame({'cat1': ['one', 'two', 'three'], 'cat2': ['four', 'five', 'six']})
b = pd.DataFrame({'cat1': ['ten', 'eleven'], 'cat2': ['twelve', 'thirteen']})
pd.DataFrame({'col1': {'row1': a, 'row2': b}})  # raises StopIteration

If I remove the 'three' and 'six' items from 'cat1' and 'cat2' respectively, this works fine. Any idea how I can achieve this with DataFrames of different sizes?

Beryllium
jorge.santos
  • I haven't seen any mention of a DataFrame of DataFrames in the pandas author's "Python for Data Analysis" book. What is your end goal, please? – Maxim Egorushkin Jul 30 '13 at 18:55
  • I have a list of securities going down and a bunch of fields going across. Some of these fields result in a table (e.g. a holders list or dividend history), and I wanted to combine these with scalar values (price, pct change, name, etc.). I already have a panel view, but wanted a single view of the entire table. This is merely with the intention of generalizing the approach within the code, i.e. I can always take DF.ix['security','field'] regardless of the field shape. I guess the only right way is to do this with a panel[security][field]. I was just trying my luck at generalization. – jorge.santos Jul 30 '13 at 21:14

1 Answer

This is not a good idea: you lose all efficiency because the cells are stored as object dtype, so operations will be quite slow (they cannot be done via C-level base types such as float/int). A better option is a multi-level index, which can easily encompass what I think you want:

In [20]: a
Out[20]: 
    cat1  cat2
0    one  four
1    two  five
2  three   six

In [21]: b
Out[21]: 
     cat1      cat2
0     ten    twelve
1  eleven  thirteen

In [22]: pd.concat([a, b], keys=['row1', 'row2'])
Out[22]: 
          cat1      cat2
row1 0     one      four
     1     two      five
     2   three       six
row2 0     ten    twelve
     1  eleven  thirteen
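
To get back to the "one table per row label" access pattern from the question, the outer index level can be used as the selector. The following is only a minimal sketch that rebuilds the frames above; the variable names `combined`, `row1`, `row2` and `cell` are made up for illustration.

```python
import pandas as pd

a = pd.DataFrame({'cat1': ['one', 'two', 'three'], 'cat2': ['four', 'five', 'six']})
b = pd.DataFrame({'cat1': ['ten', 'eleven'], 'cat2': ['twelve', 'thirteen']})

# Stack both frames under an outer index level labelled by `keys`.
combined = pd.concat([a, b], keys=['row1', 'row2'])

# Selecting on the outer level returns the rows of the original sub-frame,
# whatever its length.
row1 = combined.loc['row1']
row2 = combined.loc['row2']

# A single cell is addressed with an (outer, inner) tuple plus the column name.
cell = combined.loc[('row2', 1), 'cat2']
```

The sub-frames keep their own lengths (three rows for `row1`, two for `row2`), so the size mismatch that broke the nested construction does not matter here.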
Jeff
  • There's also the option to create a hierarchically indexed `DataFrame` using `Panel.to_frame(filter_observations=False)`. – Phillip Cloud Jul 30 '13 at 19:59
  • Thank you, Jeff. The reason for doing this is that I need to combine these DataFrames with a bunch of scalar values. For example, row1: DF_a, np.nan, 104, 105 | row2: np.nan, DF_b, 234, 213, assuming I have the columns Cat1, Cat2, Scalar1, Scalar2. I guess this is still possible using the multi-index approach; would I just need to broadcast the scalar value across all items of cat1/cat2? Thanks again. – jorge.santos Jul 30 '13 at 21:06
  • You don't need to be that fancy; ``df['scalar1'] = 234`` will work. – Jeff Jul 31 '13 at 00:42
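
A rough sketch of the scalar question from these comments, assuming the multi-indexed frame from the answer. The whole-frame assignment is what Jeff suggests; the per-row lookup is an illustrative extension, and the dict names `scalar1` and `scalar2` are hypothetical (the values are the ones from the comment above).

```python
import pandas as pd

a = pd.DataFrame({'cat1': ['one', 'two', 'three'], 'cat2': ['four', 'five', 'six']})
b = pd.DataFrame({'cat1': ['ten', 'eleven'], 'cat2': ['twelve', 'thirteen']})
combined = pd.concat([a, b], keys=['row1', 'row2'])

# Assigning a scalar broadcasts it to every row of the frame.
combined['constant'] = 234

# Hypothetical per-row scalars, keyed by the outer index label.
scalar1 = {'row1': 104, 'row2': 234}
scalar2 = {'row1': 105, 'row2': 213}

# Repeat each row's scalar across all inner rows of that outer label.
outer = combined.index.get_level_values(0)
combined['scalar1'] = [scalar1[key] for key in outer]
combined['scalar2'] = [scalar2[key] for key in outer]
```

Because the scalars are numeric columns rather than objects nested in cells, they keep an integer dtype and can be used in ordinary vectorized operations.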