1

Versions: Python 3.7.6, pandas 1.0.0

Input dataframe

df = pd.DataFrame(dict(
    recruit_dt=["1/1/2017"]*3+["1/1/2018"]*3+["1/1/2019"]*3,
    label = [1,3,4]*3,
    nmem = np.random.choice(list(range(10000,3000000)),9),
    pct_fem = np.random.sample(9),
    mean_age = 50 + 10*np.random.sample(9),
    sd_age = 8 + 2*np.random.sample(9)
))

Would like to present this after the following transformations

dfp = pd.pivot_table(df, values=["nmem","pct_fem","mean_age","sd_age"], index="recruit_dt", columns="label")
dfp = dfp.reindex(columns=['nmem', 'pct_fem', 'mean_age', 'sd_age'], level=0)

How do I write the styler so that all the nmem columns have thousand separators {:,}, 'pct_fem' are percentages to two decimal places, mean_age and sd_age are floating point numbers with two decimal places? Is there an approach which uses styler.format or styler.apply with IndexSlice?

== EDIT: this seems to work. Is there a more concise solution?

dfp.columns.names = ["metrics","label"]
dfp.style.format("{:,}", subset=pd.IndexSlice[:,'nmem']) \
         .format("{:.2%}", subset=pd.IndexSlice[:,'pct_fem']) \
         .format("{:.2f}", subset=pd.IndexSlice[:,['mean_age','sd_age']])
Henry Ecker
  • 34,399
  • 18
  • 41
  • 57
ironv
  • 978
  • 10
  • 25
  • 3
    Your edit is correct. Using `pd.IndexSlice` as you're using it seems like the best solution, as far as I can tell. Please post that as an answer. – filbranden Feb 05 '20 at 18:08
  • For me, this code does not alter the original format neither on Android with Pandas 1.0.1 nor on Windows 10 with Pandas 1.2.3 ! – Jean-Pierre Schnyder May 08 '21 at 22:32
  • What if I want to format all the `label=1` and `label=2` columns, regardless of the `metrics`? – data-monkey May 30 '23 at 11:15

2 Answers2

4

You can specify an argument to the subset parameter using a list comprehension to select the relevant columns.

>>> (dfp
     .style
     .format('{:.0f}', na_rep='-', subset=[col for col in dfp.columns if col[0] == 'nmen'])
     .format('{:.2%}', na_rep='-', subset=[col for col in dfp.columns if col[0] == 'pct_fem'])
     .format('{:,.2f}', na_rep='-', subset=[col for col in dfp.columns if col[0] in {'mean_age', 'sd_age'}])
)

enter image description here

A more general solution:

# Styles.
pct_two = '{:.2%}'
comma_float = '{:.0f}'
comma_float_2 = '{:.2f}'

# Styling to be applied to specified columns.
formats = {
    'nmean': comma_float,
    'pct_fem': pct_two,
    'mean_age': comma_float_2,
    'sd_age': comma_float_2,
}

# Create dictionary of multi-index columns with specified styling.
format_dict = {
    midx: formats[level_val]
    for level_val in formats
    for midx in [col for col in dfp if col[0] == level_val]
}

# Apply styling to dataframe.
dfp.style.format(format_dict)
Alexander
  • 105,104
  • 32
  • 201
  • 196
1

Let's try this:

idx = pd.IndexSlice
formatter_dict = {i:"{:,}" for i in dfp.loc[:, idx['nmem', :]].columns}
formatter_dict2 = {i:"{:.2%}" for i in dfp.loc[:, idx['pct_fem', :]].columns}
formatter_dict3 = {i:"{:.2f}" for i in dfp.loc[:, idx[['mean_age', 'sd_age'], :]].columns}
formatter_dict.update(formatter_dict2)
formatter_dict.update(formatter_dict3)
dfp.style.format(formatter_dict)

Output: enter image description here

Scott Boston
  • 147,308
  • 15
  • 139
  • 187