-2

I have this excel data:

        x1         x2         x3
id    a    b     a    b     a   b
foo   1    2     3    4     2   4

Column x1, x2, and x3 are made for both a and b and there are possibility where the number of x will keep increasing, so before going to the database I decided to transform the data into:

id      category   type    values
foo         x1      a        1
foo         x1      b        2
foo         x2      a        3
foo         x2      b        4
foo         x3      a        2
foo         x3      b        4

how can I transform my dataframe with pandas? thanks in advance

yangyang
  • 491
  • 4
  • 16

1 Answers1

2

If there is MultiIndex in columns use DataFrame.unstack:

print (df.columns)
MultiIndex([('x1', 'a'),
            ('x1', 'b'),
            ('x2', 'a'),
            ('x2', 'b'),
            ('x3', 'a'),
            ('x3', 'b')],
           )


df = (df.unstack()
        .reorder_levels([2,0,1])
        .rename_axis(['id','category','type'])
        .reset_index(name='values'))
print (df)
    id category type  values
0  foo       x1    a       1
1  foo       x1    b       2
2  foo       x2    a       3
3  foo       x2    b       4
4  foo       x3    a       2
5  foo       x3    b       4

Solution with DataFrame.stack by both levels:

df = df.stack([0,1]).rename_axis(['id','category','type']).reset_index(name='values')
print (df)
    id category type  values
0  foo       x1    a       1
1  foo       x1    b       2
2  foo       x2    a       3
3  foo       x2    b       4
4  foo       x3    a       2
5  foo       x3    b       4
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252