Pandas MultiIndex with to_html

Question

I am generating an email report using pandas –to_html function. I am looking for solutions and struck at a point(screenshot attached) where I want to combine cell values in the dataframe(just like excel merge and center) for certain columns. Could you please help me in this if you have come across this scenario earlier in your experience. I want to achieve this using only pandas and no external libraries.

What I have in the dataframe(columns in order – category, company, sales, 2019, 2020, LTM, Q1 20, Q1 21, YoY Change (%)):

What we need in output:

I have tried using MultiIndex(below is the code) however it does not exactly give me what I need. Any help would be appreciated.

email_report = (
        df_category.set_index(
            [
                "category",
                "company",
                "previous_year",
                "current_year",
                "ltm_share",
                "previous_value",
                "current_value",
                "change",
            ]
        )
        .to_html()
    )

Cimbali · Accepted Answer · 2021-06-15T12:42:28.827

I believe pandas does cell merging for multi-indexes out of the box, however not for data cells. You might get away with it using CSS tricks.

I’m using a 2-level index dataframe with random values and your column names + a no merge column:

     previous_year  no merge  current_year  ltm_share  previous_value  current_value    change
a 1       0.538438  0.197967      0.158720   0.031351        0.180214       0.888741  0.132500
  2       0.966025  0.363504      0.071190   0.503113        0.132445       0.883562  0.461739
  3       0.226929  0.913076      0.570731   0.521068        0.776050       0.996729  0.040835
b 1       0.327364  0.274166      0.789224   0.030502        0.508330       0.091049  0.497796
  2       0.041149  0.403038      0.924517   0.271489        0.692771       0.003774  0.391067
c 1       0.260083  0.873030      0.658576   0.983804        0.736934       0.970065  0.162908

These are the ideas on which the solution works:

use grid layout on the table
use display: contents on all intermediate tags (thead, tbody, tr). It should be well support by now
hide the cells below row spans with display: none
make cells span several rows with grid-row: span N
rowspan attributes are not recognized anymore, so index levels need to be treated as any other columns.

I’m using a 2-level index dataframe with random values and your column names + a no merge column:

     previous_year  no merge  current_year  ltm_share  previous_value  current_value    change
a 1       0.538438  0.197967      0.158720   0.031351        0.180214       0.888741  0.132500
  2       0.966025  0.363504      0.071190   0.503113        0.132445       0.883562  0.461739
  3       0.226929  0.913076      0.570731   0.521068        0.776050       0.996729  0.040835
b 1       0.327364  0.274166      0.789224   0.030502        0.508330       0.091049  0.497796
  2       0.041149  0.403038      0.924517   0.271489        0.692771       0.003774  0.391067
c 1       0.260083  0.873030      0.658576   0.983804        0.736934       0.970065  0.162908

Here’s the code:

hide_cells = df.index.get_level_values(0).duplicated(keep='first')
multiples = df.index.get_level_values(0).value_counts()

style = df.reset_index().style.hide_index().set_table_styles([
    {'selector': '', 'props': [
        ('display', 'grid'),
        ('grid-template-columns', f'repeat({len(df.columns) + df.index.nlevels}, auto)'),
    ]},
    {'selector': 'thead, tbody, tr', 'props': [
        ('display', 'contents'),
    ]},
])
style = style.set_properties(subset=pd.IndexSlice[hide_cells, ['level_0', *merge_cols]], display='none')

for height in multiples.unique():
    if height <= 1:
        continue
    mask = ~hide_cells & df.index.get_level_values(0).isin(multiples.index[multiples.eq(height)])
    style = style.set_properties(subset=pd.IndexSlice[mask, ['level_0', *merge_cols]], **{'grid-row':  f'span {height}'})

with open('out.html', 'w') as f:
    f.write(style.render())

And the result:

This is my initial answer, relies on older CSS capabilities, but only works if no columns are next to each other. It uses

fixed row heights in your table
make the first cell of a multi-index row bigger (with position:absolute to avoid changing the table row, and line-height to vertically align text)
hide the cells below it with visibility:hidden

hide_cells = df.index.get_level_values(0).duplicated(keep='first')
multiples = df.index.get_level_values(0).value_counts()

style = df.style.set_properties(height='1.5em')

style = style.set_properties(subset=pd.IndexSlice[hide_cells, merge_cols], visibility='hidden')
for height in multiples.unique():
    if height <= 1:
        continue
    mask = ~hide_cells & df.index.get_level_values(0).isin(multiples.index[multiples.eq(height)])
    style = style.set_properties(subset=pd.IndexSlice[mask, merge_cols], **{'position': 'absolute', 'line-height': f'{height * 1.5}em'})

with open('out.html', 'w') as f:
    f.write(style.render())

This works vertically but messes up the horizontal alignment when there are several merged columns next to each other:

The way around is to manually specify fixed positions for columns, I suppose.

thanks for the detailed answer. I will try out the options you suggested — Prakhar Jhudele, Jun 15 '21 at 09:46
@PrakharJhudele I’ve updated for a better solution with grid layouts. — Cimbali, Jun 15 '21 at 12:41

Pandas MultiIndex with to_html

1 Answers1