1

At some point in my workflow I end up with a regular pandas DataFrame with some columns and some rows. I want to export this DataFrame into a latex table,using df.to_latex(). This worked great, however, I know want to use multicolumn where some columns are part of a multi table. For instance a DataFrame with columns a,b,c,d,e I would want to leave column a as it is, but group up b and c, as well as d and e.

import numpy as np
import pandas as pd

# where I am
data = np.arange(15).reshape(3, 5)
df = pd.DataFrame(data=data, columns=['a', 'b', 'c', 'd', 'e'])

It looks like this:

In [161]: df
Out[161]: 
    a   b   c   d   e
0   0   1   2   3   4
1   5   6   7   8   9
2  10  11  12  13  14

I would like to group columns b and c, as well as d and e, but leave a alone. So my desired output should look like this.

# where I want to be: leave column 'a' alone, group b&c as well as d&e
multi_index = pd.MultiIndex.from_tuples([
    ('a', ''),
    ('bc', 'b'),
    ('bc', 'c'),
    ('de', 'd'),
    ('de', 'e'),
])
desired = pd.DataFrame(data, columns=multi_index)

It looks like this:

In [162]: desired
Out[162]: 
    a  bc      de    
        b   c   d   e
0   0   1   2   3   4
1   5   6   7   8   9
2  10  11  12  13  14

In order to get there, i tried a simple reindex. This give me the desired shape, but all columns only got NaN as value.

# how can use df and my multiindexreindex to multi column DataFrame
result = df.reindex(columns=multi_index)

The result looks like described, correct indices but all NaN

In [166]: result
Out[166]: 
    a  bc      de    
        b   c   e   e
0 NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN

How can I get my desired result?

Nras
  • 4,251
  • 3
  • 25
  • 37
  • 1
    `reindex` will not work. It takes the current frame, with the current indexes and arranges it such that the new frame will have indexes in order of the passed index. It does not magically align from a single index to a multiindex. You get all NaN because there is no data at location `[0, (a, None)]` or `[0, (bc, b)]` (etc.) in the original frame. I don't see what is wrong with `pd.DataFrame(data, columns=multi_index)`. If not, there is always the venerable `groupby`... – Kartik Sep 23 '16 at 14:47
  • @Kartik TY for the input. I cannot simpy do the ``pd.DataFrame(data, columns=multi_index)`` as at i only have the ``df``, not the ``data`` itself. It was just code to show the desired result, where i want to get to. – Nras Sep 23 '16 at 14:58

2 Answers2

1

You can assign the multiIndex to the columns attribute of the data frame directly:

df.columns = multi_index
df

enter image description here

Psidom
  • 209,562
  • 33
  • 339
  • 356
  • unbelievable... thank you, I was quite sure it had to be something simple as this. May I ask, how you created this representation of the DataFrame? – Nras Sep 23 '16 at 15:16
  • Do you mean the screen shot? It's usually better not to use image when asking questions because it makes it harder to generate the data frame for answerers. – Psidom Sep 23 '16 at 15:18
1
pd.concat([df.set_index('a')[['b', 'c']],
           df.set_index('a')[['d', 'e']]],
          axis=1, keys=['bc', 'de']).reset_index(col_level=1)

enter image description here

piRSquared
  • 285,575
  • 57
  • 475
  • 624
  • This is almost correct, it does put a to the sublevel. I could work with that too, though. – Nras Sep 23 '16 at 15:17
  • @Nras interesting, I could swear I saw that `a` in the sub-level, that's why I put it there with `col_level=1`. Remove that argument to get it where you want. – piRSquared Sep 23 '16 at 15:20