
TLDR: I want to create an interactive visualization with Bokeh where I can toggle the appearance of individual bars in a bar plot based on the values of multiple categorical dataframe columns.

The data

I have a Pandas dataframe with 5 columns. One column contains sample ID numbers (x), one column contains quantitative output data (y), and the other three have categorical data used to classify each sample as big or small, A or B, and blue or red.

import pandas as pd

data = dict(size=['big', 'big', 'big', 'big', 'small', 'small', 'small', 'small'],
            design=['A', 'A', 'B', 'B', 'A', 'A', 'B', 'B'],
            color=['blue', 'red', 'blue', 'red', 'blue', 'red', 'blue', 'red'],
            x=['1', '2', '3', '4', '5', '6', '7', '8'],
            y=['10', '20', '10', '30', '10', '40', '10', '30'])
data = pd.DataFrame(data)
print(data)

Output:

    size design color  x   y
0    big      A  blue  1  10
1    big      A   red  2  20
2    big      B  blue  3  10
3    big      B   red  4  30
4  small      A  blue  5  10
5  small      A   red  6  40
6  small      B  blue  7  10
7  small      B   red  8  30

The problem

I want to plot the above data as a bar graph, with the x values plotted along the x axis, and the y values plotted along the y axis.

Data from the above dataframe plotted as a bar graph

I also want to toggle the appearance of the bars using something like Bokeh's CheckboxGroup, so that there is a total of 6 selectable checkboxes, one for each of the values in the three categorical columns (big, small, A, B, blue, and red). If all boxes are checked, all bars would be shown. If all but the A boxes are checked, then only half the data is shown (only the half with design value B). If all but the A and blue boxes are checked, none of the data with design value A or color value blue will be shown in the bar plot.
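The filtering rule described above can be sketched in plain Python, independent of Bokeh. This is only an illustration of the intended behavior: the helper name `visible_samples` and the `unchecked` set are assumptions made for the sketch, not part of pandas or Bokeh.

```python
# Same records as the dataframe above, kept as a plain dict of lists.
data = dict(size=['big', 'big', 'big', 'big', 'small', 'small', 'small', 'small'],
            design=['A', 'A', 'B', 'B', 'A', 'A', 'B', 'B'],
            color=['blue', 'red', 'blue', 'red', 'blue', 'red', 'blue', 'red'],
            x=['1', '2', '3', '4', '5', '6', '7', '8'],
            y=['10', '20', '10', '30', '10', '40', '10', '30'])

CATEGORY_COLUMNS = ['size', 'design', 'color']


def visible_samples(data, unchecked):
    """Return the x values of samples that should stay visible.

    `unchecked` is a hypothetical set holding the labels of all cleared
    checkboxes. A sample is hidden as soon as any one of its three
    category values is in that set.
    """
    n_rows = len(data['x'])
    return [data['x'][i] for i in range(n_rows)
            if not any(data[col][i] in unchecked for col in CATEGORY_COLUMNS)]


print(visible_samples(data, set()))          # every box checked: all 8 samples
print(visible_samples(data, {'A'}))          # ['3', '4', '7', '8']
print(visible_samples(data, {'A', 'blue'}))  # ['4', '8']
```

In other words, a bar is shown only if the boxes for all three of its category values are checked.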

The solution posted to this older Stack Overflow question is close to what I want to achieve. However, the dataframe in the linked post had only 3 columns (an X column, a Y column, and a single categorical column tied to the Bokeh CheckboxGroup). Mine has 5 columns, 3 of which are categorical columns that I want tied to selectable checkboxes.

I am not very familiar with JavaScript, so I'm not sure how I could achieve what I am describing with Bokeh.

Rory McGuire

1 Answer


The solution below is based on the simpler check boxes for lines example.

Explanation

Each renderer in Bokeh has a visible attribute, which is True by default. To show or hide each bar on its own, we need one renderer per bar. Therefore we loop over the rows of the DataFrame and create one bar per row.

Inside the JavaScript part we first treat all bars as visible, which is the correct state when all boxes are active. Then, for every checkbox that is not active, we remove the bars that belong to it. Which bars belong to which checkbox is determined by the row indexes taken from the DataFrame.

The last step is to set the visible attribute of each bar.
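The steps above can be mirrored in plain Python to check the logic without a browser. This is a sketch, not the callback itself: `visible_flags` is a hypothetical helper, and the `indexes` list below is written out by hand to match the example data (one list of row indexes per checkbox, in the order big, small, A, B, blue, red).

```python
def visible_flags(n_bars, indexes, active):
    """Return one boolean per bar: True if the bar should be visible.

    indexes[k] lists the rows that belong to checkbox k, and `active`
    lists the positions of the checked boxes, like CheckboxGroup.active.
    """
    hidden = set()
    # Visit every checkbox, not only the active ones, so that an
    # unchecked box at the end of the list is not skipped.
    for box, rows in enumerate(indexes):
        if box not in active:
            hidden.update(rows)
    return [i not in hidden for i in range(n_bars)]


# Row indexes per checkbox for the example data.
indexes = [
    [0, 1, 2, 3],  # big
    [4, 5, 6, 7],  # small
    [0, 1, 4, 5],  # A
    [2, 3, 6, 7],  # B
    [0, 2, 4, 6],  # blue
    [1, 3, 5, 7],  # red
]

# "A" (position 2) and "blue" (position 4) are unchecked.
print(visible_flags(8, indexes, active=[0, 1, 3, 5]))
# [False, False, False, True, False, False, False, True]
```

Only rows 3 and 7 stay visible here, which are the two samples with design B and color red.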

Example Code

import pandas as pd
from bokeh.plotting import show, figure, output_notebook
from bokeh.models import CheckboxGroup, CustomJS
from bokeh.layouts import row
output_notebook()

data = dict(size=['big', 'big', 'big', 'big', 'small', 'small', 'small', 'small'],
            design=['A', 'A', 'B', 'B', 'A', 'A', 'B', 'B'],
            color=['blue', 'red', 'blue', 'red', 'blue', 'red', 'blue', 'red'],
            x=['1', '2', '3', '4', '5', '6', '7', '8'],
            y=['10', '20', '10', '30', '10', '40', '10', '30'])

df = pd.DataFrame(data)
df['x'] = df['x'].astype(float)
df['y'] = df['y'].astype(float)

# map each categorical column to the list of its unique values
selections = {k: list(df[k].unique()) for k in ['size','design','color']}

# collect the checkbox labels (names) and, for each label, the matching row indexes
names = []
indexes = []
for col, items in selections.items():
    names += items
    for item in items:
        indexes.append(list(df[df[col]==item].index))

p = figure(width=300, height=300)
# one renderer per row, so each bar can be shown or hidden individually
bar_renderer = []
for i, item in df.iterrows():
    bar_renderer.append(
        p.vbar(x=item['x'], top=item['y'], width=0.7, color=item['color'])
    )

checkbox = CheckboxGroup(labels=names, active=list(range(len(names))), width=100)
callback = CustomJS(args=dict(bars=bar_renderer,checkbox=checkbox, indexes=indexes),
    code="""
    function removeItems(arr, values){
      for (let value of values){
        const index = arr.indexOf(value);
        if (index > -1) {
          arr.splice(index, 1);
        }
      }
      return arr;
    }
    // initialize all bars as active
    let active = [...Array(bars.length).keys()];

    // loop over all checkboxes (indexes has one entry per checkbox),
    // remove indexes from active if checkbox is inactive
    for(var i=0; i<indexes.length; i++){
        if (!checkbox.active.includes(i)){
            active = removeItems(active, indexes[i])
        }
    }
    // set bar to visible if value is in active, else invisible
    for(var i=0; i<bars.length; i++){
        bars[i].visible = active.includes(i);
    }
    """
)
checkbox.js_on_change('active', callback)
show(row(p,checkbox))

Output

bar plot with toggled visibility
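To see what the Python side hands to the callback, the label and index collection can be reproduced without pandas. The sketch below is an illustration under assumptions: `collect_labels` is a hypothetical helper, and the design values are replaced by the strings '1' and '2' to show that the row indexes do not depend on what the labels look like.

```python
def collect_labels(data, columns):
    """Build the checkbox labels and the row indexes belonging to each label.

    Unique values are kept in order of first appearance, column by column,
    which is the same order the labels get in the CheckboxGroup.
    """
    names, indexes = [], []
    for col in columns:
        unique_values = []
        for value in data[col]:
            if value not in unique_values:
                unique_values.append(value)
        for value in unique_values:
            names.append(value)
            indexes.append([i for i, v in enumerate(data[col]) if v == value])
    return names, indexes


data = dict(size=['big', 'big', 'big', 'big', 'small', 'small', 'small', 'small'],
            design=['1', '1', '2', '2', '1', '1', '2', '2'],
            color=['blue', 'red', 'blue', 'red', 'blue', 'red', 'blue', 'red'])

names, indexes = collect_labels(data, ['size', 'design', 'color'])
print(names)       # ['big', 'small', '1', '2', 'blue', 'red']
print(indexes[2])  # rows with design '1': [0, 1, 4, 5]
```

Because the callback only works with positions in `names` and `indexes`, the label text itself never enters the JavaScript logic.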

mosc9575
  • What if some of the values are strings of integers? If I try the above code, replacing all instances of `A` with `1`, and `B` with `2`, the plot is still generated but unchecking a box no longer results in the corresponding bar disappearing. – Rory McGuire Sep 08 '22 at 20:16
  • In general this could work. But as I mentioned, the logic of the disappearing bars is coded by hand based on the index in the DataFrame. The JS part is not generated automatically. Of course, this could be done, but it was way out of my scope. – mosc9575 Sep 08 '22 at 20:55
  • @RoryMcGuire I have updated my answer. The solution generates the same figure, but the approach now finds the indexes by itself. There is no need to code parts of the JS by hand. I hope this fits your modifications. – mosc9575 Sep 09 '22 at 15:57