How do I create a pandas DataFrame (with an index or MultiIndex) from a list of namedtuple instances?

Question

Simple example:

from collections import namedtuple
import pandas

Price = namedtuple('Price', 'ticker date price')
a = Price('GE', '2010-01-01', 30.00)
b = Price('GE', '2010-01-02', 31.00)
l = [a, b]
df = pandas.DataFrame.from_records(l, index='ticker')
Traceback (most recent call last)
...
KeyError: 'ticker'

Harder example:

df2 = pandas.DataFrame.from_records(l, index=['ticker', 'date'])
df2

         0           1   2
ticker  GE  2010-01-01  30
date    GE  2010-01-02  31

Now it thinks that ['ticker', 'date'] is the index itself, rather than the columns I want to use as the index.

Is there a way to do this without resorting to an intermediate numpy ndarray or using set_index after the fact?

Andy Hayden · Accepted Answer · 2013-06-09T11:23:26.057

30

To get a Series from a namedtuple you could use the _fields attribute:

In [11]: pd.Series(a, a._fields)
Out[11]:
ticker            GE
date      2010-01-01
price             30
dtype: object

Similarly you can create a DataFrame like this:

In [12]: df = pd.DataFrame(l, columns=l[0]._fields)

In [13]: df
Out[13]:
  ticker        date  price
0     GE  2010-01-01     30
1     GE  2010-01-02     31

You have to set_index after the fact, but you can do this inplace:

In [14]: df.set_index(['ticker', 'date'], inplace=True)

In [15]: df
Out[15]:
                   price
ticker date
GE     2010-01-01     30
       2010-01-02     31
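For readers who want to run the steps above outside an IPython session, here is a minimal, self-contained sketch. The names `records`, `flat` and `indexed` are illustrative and not part of the original answer, and it uses the non-inplace form of `set_index`, which returns a new frame rather than modifying the original.

```python
from collections import namedtuple

import pandas as pd

Price = namedtuple('Price', ['ticker', 'date', 'price'])
records = [
    Price('GE', '2010-01-01', 30.00),
    Price('GE', '2010-01-02', 31.00),
]

# _fields is defined on the namedtuple class, so it can supply the column names.
flat = pd.DataFrame(records, columns=Price._fields)

# set_index returns a new frame; `flat` keeps its default integer index.
indexed = flat.set_index(['ticker', 'date'])
print(indexed)
```

Keeping `flat` around can be handy if the unindexed columns are still needed later; otherwise the two calls can simply be chained.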

edited Jun 09 '13 at 11:23

answered Jun 09 '13 at 00:19

Andy Hayden


Clever. I hadn't realised there was a _fields attribute on namedtuples. Might be worthwhile opening a ticket to support constructing namedtuples in the same way as dictionaries work now. – Matti John Jun 09 '13 at 00:23
I've put something together, but tbh I think this method is probably good for most use cases... – Andy Hayden Jun 09 '13 at 02:09
I don't think there's any getting around the `set_index`, but you can do this inplace. – Andy Hayden Jun 09 '13 at 11:24
columns=Price._fields would be clearer. _fields is an attribute of the class, although Python allows accessing it through an instance as l[0]._fields or a._fields. – hwrd Feb 19 '19 at 15:31
Also, it can be done without using the ._fields attribute at all: `fields = ['ticker', 'date', 'price']`, `Price = namedtuple('Price', fields)`, ..., `df = pd.DataFrame(l, columns=fields)` – hwrd Feb 19 '19 at 15:43
@hwrd only if you know what the fields are! If you're handed a list of namedtuples you may not know (eg if that's returned from some other lib) – Andy Hayden Feb 19 '19 at 16:52
@AndyHayden True, but the OP had just created the namedtuple, using a single string of text instead of a list of fields. Although convenient I've always found the implicitly split string confusing. – hwrd Feb 20 '19 at 17:21
@hwrd true, in OPs case, but Google brings you here and others may not have it to hand. – Andy Hayden Feb 20 '19 at 17:27

score 0 · Answer 2 · answered Apr 07 '23 at 21:44

Calling the DataFrame constructor on the list of namedtuples produces a DataFrame whose columns are named after the namedtuple fields:

df = pd.DataFrame(l)


   ticker        date  price
0      GE  2010-01-01   30.0
1      GE  2010-01-02   31.0

Calling set_index() on the result produces the desired output. However, since OP doesn't want that, another way could be to convert each namedtuple into a dictionary and pop keys.

l_asdict = [x._asdict() for x in l]
# Pop the index fields out of each dict so that only 'price' remains as a column.
idx = pd.MultiIndex.from_arrays([[x.pop(k) for x in l_asdict] for k in ['ticker', 'date']], names=['ticker', 'date'])
df = pd.DataFrame(l_asdict, index=idx)


                    price
ticker  date    
    GE  2010-01-01   30.0
        2010-01-02   31.0
