Converting to long panel data format with pandas

Question

I have a DataFrame where rows represent time and columns represent individuals. I want to turn it into into long panel data format in pandas in an efficient manner, as the DataFames are rather large. I would like to avoid looping. Here is an example: The following DataFrame:

      id    1    2
date              
20150520  3.0  4.0
20150521  5.0  6.0

should be transformed into:

date        id        value
20150520    1         3.0
20150520    2         4.0
20150520    1         5.0
20150520    2         6.0

Speed is what's really important to me, due to the data size. I prefer it over elegance if there is a tradeoff. Although I suspect I mam missing a rather simple function, pandas should be able to handle that. Any suggestions?

score 3 · Accepted Answer · answered Nov 07 '16 at 14:38

I think you need stack with reset_index:

print (df)
            1    2
date              
20150520  3.0  4.0
20150521  5.0  6.0

df = df.stack().reset_index()
df.columns = ['date','id','value']
print (df)
       date id  value
0  20150520  1    3.0
1  20150520  2    4.0
2  20150521  1    5.0
3  20150521  2    6.0

print (df)
id          1    2
date              
20150520  3.0  4.0
20150521  5.0  6.0

df = df.stack().reset_index(name='value')
print (df)
       date id  value
0  20150520  1    3.0
1  20150520  2    4.0
2  20150521  1    5.0
3  20150521  2    6.0

This is both correct and faster than the other proposed soution — splinter, Nov 07 '16 at 17:49

piRSquared · Answer 2 · 2016-11-07T18:04:23.320

2

using melt

pd.melt(df.reset_index(),
        id_vars='date',
        value_vars=['1', '2'],
        var_name='Id')

EDIT:
Because OP wants fast ;-)

def pir(df):
    dv = df.values
    iv = df.index.values
    cv = df.columns.values
    rc, cc = df.shape
    return pd.DataFrame(
        dict(value=dv.flatten(),
             id=np.tile(cv, rc)),
        np.repeat(iv, cc))

edited Nov 07 '16 at 18:04

answered Nov 07 '16 at 15:00

piRSquared

285,575
57
475
624

Thanks@piRSquared, this indeed works but the accepted solution is 1.25 times faster. – splinter Nov 07 '16 at 17:48
@jezrael's answer is what I'd recommend. Sometimes I'll post an alternative solution because it may still be useful for others... or even you. I find other poster's solutions invaluable. pandas/numpy often has many solutions, some of which are faster in some circumstances and slower in others. – piRSquared Nov 07 '16 at 17:52
Makes perfect sense, much appreciated I will keep this alternative option in mind – splinter Nov 07 '16 at 17:55
@splinter I've update post with faster solution. Keep in mind, you will almost always get a different style of response if you state in your question that speed/efficiency is your goal. – piRSquared Nov 07 '16 at 18:05

score 1 · Answer 3 · answered Nov 07 '16 at 14:37

1

the function you are looking for is

df.reset_index()

you can then rename your columns using

df.columns = ['date', 'id', 'value']

answered Nov 07 '16 at 14:37

Jack Cooper

408
1
4
10

That wouldn't work. Apparently I also need the stack function as mentioned above. Thanks though. – splinter Nov 07 '16 at 14:41

Converting to long panel data format with pandas

3 Answers3