Print dataset from box in barplot

Question

I have created a stacked bar plot from a dataset. When I click on one box (segment) of a bar, I want to print only the rows of the dataset that this box represents, not the whole dataset. For example, with this code:

import pandas as pd
import matplotlib.pyplot as plt
from itertools import product

codes = ['A', 'B', 'C', 'D']
months = ['January', 'February', 'March']
years = [2022, 2023]
rows = list(product(codes, months, years))
df = pd.DataFrame(rows, columns=['problem_code', 'month', 'year']).sample(n=20, random_state=42)

month_counts = df.groupby(['month','problem_code']).size().unstack(fill_value=0)
year_counts = df.groupby(['year','problem_code']).size().unstack(fill_value=0)

fig, (ax1, ax2) = plt.subplots(1, 2)
colors = ['blue', 'red', 'green', 'yellow']

# Plot the data for the months
bars1 = month_counts.plot(kind='bar', ax=ax1, stacked=True, color=colors)
# Plot the data for the years
bars2 = year_counts.plot(kind='bar', ax=ax2, stacked=True, color=colors)
plt.show()
print(month_counts)

Clicking on the box for January and problem code 'A' should show all rows of the dataset that have January as the month and 'A' as the problem code, not the whole dataset. The same should work for the year plot. Is this possible?

I tried using mplcursors, but it doesn't seem to be able to distinguish which of the plots I am clicking on. For example, if I print the index of a box, I can get index 1 from both plots.

score 0 · Answer 1 · answered Mar 20 '23 at 11:51

With mplcursors, you can set a custom annotation function. That function gets the selection as its parameter. For a stacked bar plot, the selected artist (sel.artist) is the container holding the clicked-on bar. The bars are grouped into containers by color, that is, by column name in the dataframe. The column name is used as the label of the container.

The selection also has an index field, which can be used to find the bar within its container. The bar is represented as a rectangle whose height corresponds to the represented value. The index field can also be used to get the x tick label of the bar.

The subplot (ax) can be retrieved via the axes field of the first bar in the container. If needed, a test such as if ax == ax1: can be used to find out which subplot has been clicked on.
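The lookups described above can be tried without mplcursors or a display. The following is only a minimal sketch: the tiny dataframe and the helper name describe_bar are made up for illustration, and the Agg backend is chosen just so it runs headless. It relies on pandas labelling each bar container with the column name, as the answer describes.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption: no display is needed here)
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.container import BarContainer

# Small illustrative dataset, not the one from the question
df = pd.DataFrame({
    "problem_code": ["A", "A", "B", "B", "A"],
    "month": ["January", "March", "January", "March", "January"],
})
month_counts = df.groupby(["month", "problem_code"]).size().unstack(fill_value=0)

fig, ax1 = plt.subplots()
month_counts.plot(kind="bar", ax=ax1, stacked=True, rot=0)

def describe_bar(container, bar_index):
    """Hypothetical helper: recover subplot, x tick label, code and height."""
    bar = container[bar_index]              # the clicked rectangle
    ax = bar.axes                           # subplot the rectangle lives in
    x_name = ax.get_xticklabels()[bar_index].get_text()
    return ax, x_name, container.get_label(), bar.get_height()

# One container per dataframe column; the first one belongs to code 'A'
containers = [c for c in ax1.containers if isinstance(c, BarContainer)]
ax, x_name, code, height = describe_bar(containers[0], 0)
print(ax is ax1, x_name, code, height)
```

In the mplcursors callback, sel.artist plays the role of container and sel.index the role of bar_index.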

import matplotlib
import matplotlib.pyplot as plt
from itertools import product
import pandas as pd
import mplcursors

def annotation_func(sel):
    if isinstance(sel.artist, matplotlib.container.BarContainer):
        code = sel.artist.get_label()  # one of 'A', 'B', 'C' or 'D'
        bar_index = sel.index
        bar = sel.artist[bar_index]
        h = bar.get_height()
        x, y = bar.get_xy()
        ax = sel.artist[bar_index].axes  # find out the subplot
        x_name = ax.get_xticklabels()[bar_index].get_text()
        sel.annotation.set_text(f'{ax.get_xlabel()}: {x_name}\nCode: {code}\nValue: {h:g}')
        sel.annotation.xy[1] = y + h / 2  # point the annotation to the center of the bar

codes = ['A', 'B', 'C', 'D']
months = ['January', 'February', 'March']
years = [2022, 2023]
rows = list(product(codes, months, years))
df = pd.DataFrame(rows, columns=['problem_code', 'month', 'year']).sample(n=20, random_state=42)

month_counts = df.groupby(['month', 'problem_code']).size().unstack(fill_value=0)
year_counts = df.groupby(['year', 'problem_code']).size().unstack(fill_value=0)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))
colors = ['blue', 'red', 'green', 'yellow']

# Plot the data for the months
month_counts.plot(kind='bar', ax=ax1, stacked=True, color=colors, rot=0)
# Plot the data for the years
year_counts.plot(kind='bar', ax=ax2, stacked=True, color=colors, rot=0)

cursor = mplcursors.cursor(hover=True)
cursor.connect('add', annotation_func)

plt.tight_layout()
plt.show()
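The code above annotates the bar, while the question asks for the matching rows of the dataset. One possible way to get them, sketched under assumptions: rows_for_bar and its code_col parameter are hypothetical names, and the small dataframe is illustrative only. Tick labels are strings, so the grouping column is compared as a string (the years are stored as integers).

```python
import pandas as pd

def rows_for_bar(df, group_col, x_name, code, code_col="problem_code"):
    """Hypothetical helper: return the rows of df behind one stacked-bar box."""
    # x_name comes from a tick label, so compare the column as text
    in_group = df[group_col].astype(str) == str(x_name)
    return df[in_group & (df[code_col] == code)]

# Small illustrative dataset, not the one from the question
df = pd.DataFrame({
    "problem_code": ["A", "A", "B", "A"],
    "month": ["January", "March", "January", "January"],
    "year": [2022, 2022, 2023, 2023],
})

print(rows_for_bar(df, "month", "January", "A"))
print(rows_for_bar(df, "year", "2023", "A"))
```

Inside annotation_func, this could be called as print(rows_for_bar(df, ax.get_xlabel(), x_name, code)), assuming the x label still holds the index name ('month' or 'year') that pandas puts there. With hover=True the rows would be printed on every hover, so creating the cursor without hover may suit the click-to-print use case better.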
