0

I am generating an email report using pandas –to_html function. I am looking for solutions and struck at a point(screenshot attached) where I want to combine cell values in the dataframe(just like excel merge and center) for certain columns. Could you please help me in this if you have come across this scenario earlier in your experience. I want to achieve this using only pandas and no external libraries.

What I have in the dataframe(columns in order – category, company, sales, 2019, 2020, LTM, Q1 20, Q1 21, YoY Change (%)):

enter image description here

What we need in output: enter image description here

I have tried using MultiIndex(below is the code) however it does not exactly give me what I need. Any help would be appreciated.

email_report = (
        df_category.set_index(
            [
                "category",
                "company",
                "previous_year",
                "current_year",
                "ltm_share",
                "previous_value",
                "current_value",
                "change",
            ]
        )
        .to_html()
    )
Prakhar Jhudele
  • 955
  • 1
  • 7
  • 14

1 Answers1

1

I believe pandas does cell merging for multi-indexes out of the box, however not for data cells. You might get away with it using CSS tricks.

I’m using a 2-level index dataframe with random values and your column names + a no merge column:

     previous_year  no merge  current_year  ltm_share  previous_value  current_value    change
a 1       0.538438  0.197967      0.158720   0.031351        0.180214       0.888741  0.132500
  2       0.966025  0.363504      0.071190   0.503113        0.132445       0.883562  0.461739
  3       0.226929  0.913076      0.570731   0.521068        0.776050       0.996729  0.040835
b 1       0.327364  0.274166      0.789224   0.030502        0.508330       0.091049  0.497796
  2       0.041149  0.403038      0.924517   0.271489        0.692771       0.003774  0.391067
c 1       0.260083  0.873030      0.658576   0.983804        0.736934       0.970065  0.162908

These are the ideas on which the solution works:

  • use grid layout on the table
  • use display: contents on all intermediate tags (thead, tbody, tr). It should be well support by now
  • hide the cells below row spans with display: none
  • make cells span several rows with grid-row: span N
  • rowspan attributes are not recognized anymore, so index levels need to be treated as any other columns.

I’m using a 2-level index dataframe with random values and your column names + a no merge column:

     previous_year  no merge  current_year  ltm_share  previous_value  current_value    change
a 1       0.538438  0.197967      0.158720   0.031351        0.180214       0.888741  0.132500
  2       0.966025  0.363504      0.071190   0.503113        0.132445       0.883562  0.461739
  3       0.226929  0.913076      0.570731   0.521068        0.776050       0.996729  0.040835
b 1       0.327364  0.274166      0.789224   0.030502        0.508330       0.091049  0.497796
  2       0.041149  0.403038      0.924517   0.271489        0.692771       0.003774  0.391067
c 1       0.260083  0.873030      0.658576   0.983804        0.736934       0.970065  0.162908

Here’s the code:

hide_cells = df.index.get_level_values(0).duplicated(keep='first')
multiples = df.index.get_level_values(0).value_counts()

style = df.reset_index().style.hide_index().set_table_styles([
    {'selector': '', 'props': [
        ('display', 'grid'),
        ('grid-template-columns', f'repeat({len(df.columns) + df.index.nlevels}, auto)'),
    ]},
    {'selector': 'thead, tbody, tr', 'props': [
        ('display', 'contents'),
    ]},
])
style = style.set_properties(subset=pd.IndexSlice[hide_cells, ['level_0', *merge_cols]], display='none')

for height in multiples.unique():
    if height <= 1:
        continue
    mask = ~hide_cells & df.index.get_level_values(0).isin(multiples.index[multiples.eq(height)])
    style = style.set_properties(subset=pd.IndexSlice[mask, ['level_0', *merge_cols]], **{'grid-row':  f'span {height}'})

with open('out.html', 'w') as f:
    f.write(style.render())

And the result: enter image description here


This is my initial answer, relies on older CSS capabilities, but only works if no columns are next to each other. It uses

  • fixed row heights in your table
  • make the first cell of a multi-index row bigger (with position:absolute to avoid changing the table row, and line-height to vertically align text)
  • hide the cells below it with visibility:hidden
hide_cells = df.index.get_level_values(0).duplicated(keep='first')
multiples = df.index.get_level_values(0).value_counts()

style = df.style.set_properties(height='1.5em')

style = style.set_properties(subset=pd.IndexSlice[hide_cells, merge_cols], visibility='hidden')
for height in multiples.unique():
    if height <= 1:
        continue
    mask = ~hide_cells & df.index.get_level_values(0).isin(multiples.index[multiples.eq(height)])
    style = style.set_properties(subset=pd.IndexSlice[mask, merge_cols], **{'position': 'absolute', 'line-height': f'{height * 1.5}em'})

with open('out.html', 'w') as f:
    f.write(style.render())

This works vertically but messes up the horizontal alignment when there are several merged columns next to each other:

enter image description here

The way around is to manually specify fixed positions for columns, I suppose.

Cimbali
  • 11,012
  • 1
  • 39
  • 68