
This question is about how to do conditional formatting in Plotly.

Instances where this might be needed:

  • Scatter plots where points need to be colored (e.g. in rainbow colors) as a function of two variables;
  • Interactive charts where the coloring depends on parameter values;
  • Histograms where parts of the distribution need to be colored differently.

Here I will ask specifically about histograms.

Take the following data:

import numpy as np
data = np.random.normal(size=1000)

I want a histogram where values greater than or equal to 0 are drawn in a different color.

A simple solution is to split the data into two traces:

import plotly.graph_objects as go
from plotly.offline import iplot

hist1 = go.Histogram(x=data[data<0],
                    opacity=0.75, 
                    histnorm='density',
                    showlegend=False,
                    )
hist2 = go.Histogram(x=data[data>=0], 
                    opacity=0.75, 
                    histnorm='density',
                    showlegend=False,
                    )
layout = go.Layout(barmode='overlay')
fig = go.Figure(data=[hist1, hist2], layout=layout)
iplot(fig, show_link=False)

[Image: resulting plot]

There are several problems with this solution:

  1. The default bin sizes differ between the two histograms, so the bars overlap around zero.
  2. If I use histnorm = 'probability density', each histogram is normalized separately, so the two halves look disproportionate relative to each other.
  3. Binning starts from the left for both histograms, so the last bin of the below-zero histogram may extend past zero.
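Problem 2 can be reproduced without Plotly. The sketch below assumes that Plotly's density normalization behaves like NumPy's `np.histogram(..., density=True)`, which scales bar heights so that the bars of one histogram have a total area of 1; the seed and bin counts are arbitrary choices for illustration. Normalizing each half on its own gives each half an area of 1 (about 2 in total), while normalizing the whole sample once gives a total of 1:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=1000)

def area(values, bins):
    """Total bar area (sum of height * width) of a density-normalized histogram."""
    heights, edges = np.histogram(values, bins=bins, density=True)
    return float(np.sum(heights * np.diff(edges)))

# Normalizing each half separately: each half integrates to 1 on its own.
area_neg_separate = area(data[data < 0], bins=15)
area_pos_separate = area(data[data >= 0], bins=15)

# Normalizing the whole sample once: both halves share a total area of 1.
area_whole = area(data, bins=30)

print(round(area_neg_separate + area_pos_separate, 6))  # 2.0
print(round(area_whole, 6))  # 1.0
```

So with separate normalization each half is drawn roughly twice as tall as it should be, and any imbalance between the number of negative and positive samples is hidden.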

Is there a better way to do this?


UPDATE

OK, I can solve (1) and (3) using xbins:

hist1 = go.Histogram(x=data[data>=0], 
                    opacity=0.75, 
                    xbins=dict(
                        start=0,
                        end=4,
                        size=0.12),
                    histnorm='density',
                    showlegend=False,
                    )
hist2 = go.Histogram(x=data[data<0], 
                    opacity=0.75, 
                    xbins=dict(
                        start=-0.12*33,
                        end=0,
                        size=0.12),
                    histnorm='density',
                    showlegend=False,
                    )
layout = go.Layout(barmode='overlay')
fig = go.Figure(data=[hist1, hist2], layout=layout)
iplot(fig, show_link=False)

[Image: resulting plot]
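The `start=-0.12*33` and `end=4` values above are tuned by hand to this particular sample. A minimal sketch of deriving both `xbins` dicts from the data instead, assuming a fixed bin size (0.12, as above); `zero_aligned_xbins` is a hypothetical helper, not a Plotly function:

```python
import math

import numpy as np

def zero_aligned_xbins(data, size):
    """Hypothetical helper: xbins dicts for the negative and positive traces.

    Both ranges use the same bin `size` and meet exactly at 0, so no bin
    straddles zero and the bins of the two traces line up.
    """
    # Number of whole bins needed on each side of zero (at least one).
    n_neg = max(1, math.ceil(-float(np.min(data)) / size))
    n_pos = max(1, math.ceil(float(np.max(data)) / size))
    neg = dict(start=-n_neg * size, end=0, size=size)
    pos = dict(start=0, end=n_pos * size, size=size)
    return neg, pos

rng = np.random.default_rng(1)
data = rng.normal(size=1000)
xbins_neg, xbins_pos = zero_aligned_xbins(data, size=0.12)
print(xbins_neg, xbins_pos)
```

The returned dicts would be passed as `xbins=xbins_neg` to the trace for `data[data<0]` and `xbins=xbins_pos` to the trace for `data[data>=0]`. This still leaves the second issue open, since each trace is normalized on its own.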

But how do I solve the second issue?

vestland
Sandu Ursu

1 Answer


For the...

If I want to have histnorm = 'probability density' the resulting plots "normalize" each of the separate histograms, so they will look disproportionate.

... part, it seems you will have to normalize the entire sample before you split it into two different histograms. Ideally, that means drawing a single area trace whose fill has multiple colors. Unfortunately, the suggested solution for that seems to be to assign different colors to two traces with...

df_pos = df.where(df > 0, 0)
df_neg = df.where(df < 0, 0)

... which of course brings you right back to where you are.
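The "normalize the entire sample first, then split" idea can be checked with NumPy alone. A minimal sketch, assuming bin edges placed at integer multiples of a bin size (0.12, borrowed from the question) so that 0 is always an edge; the seed is arbitrary. Because no bin straddles zero, the negative bars carry exactly the fraction of samples below zero, and the two halves together have an area of 1:

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(size=1000)

# Shared bin edges that contain 0 exactly: build them from integer multiples
# of the bin size, padded by one bin on each side of the data.
size = 0.12
k = np.arange(np.floor(data.min() / size) - 1, np.ceil(data.max() / size) + 2)
edges = k * size

# Normalize the WHOLE sample once ...
heights, edges = np.histogram(data, bins=edges, density=True)
widths = np.diff(edges)

# ... and only then split the bars by the sign of their left edge.
is_neg = edges[:-1] < 0
area_neg = float(np.sum(heights[is_neg] * widths[is_neg]))
area_pos = float(np.sum(heights[~is_neg] * widths[~is_neg]))

print(round(area_neg + area_pos, 6))  # 1.0
```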

So to get what you want, it seems you'll have to step outside go.Histogram: sort out the binning and normalization first, and then use a combination of area charts or a bar chart. As far as I can tell, this takes care of all three points. Here's a suggestion on how to do that:

Plot:

[Image: resulting plot]

Code:

# imports
import plotly.graph_objects as go
from plotly.offline import iplot
import pandas as pd
import numpy as np

# theme
import plotly.io as pio
#pio.templates
#pio.templates.default = "plotly_white"
pio.templates.default = "none"

# Some sample data
np.random.seed(123)
x = np.random.normal(0, 1, 1000)

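# numpy binning: normalize the whole sample at once
binned = np.histogram(x, bins=30, density=True)

# retain some info about the binning
yvals = binned[0]
x_last = binned[1][-1]
# use bin centers so that shape='hvh' places the steps on the bin edges
xvals = (binned[1][:-1] + binned[1][1:]) / 2

# organize binned data in a pandas dataframe, then split it by sign
df_bin = pd.DataFrame(dict(x=xvals, y=yvals))
df_bin_neg = df_bin.where(df_bin['x'] < 0)
df_bin_pos = df_bin.where(df_bin['x'] > 0)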

# set up plotly figure
fig=go.Figure()

# negative x
fig.add_trace(go.Scatter(
    x=df_bin_neg['x'],
    y=df_bin_neg['y'],
    name="negative X",
    hoverinfo='all',
    # fill down to y=0 so the area under the steps is colored
    fill='tozeroy',
    #fillcolor='#ff7f0e',
    fillcolor='rgba(255, 103, 0, 0.7)',

    line=dict(color = 'rgba(0, 0, 0, 0)', shape='hvh')
))

# positive x
fig.add_trace(go.Scatter(
    x=df_bin_pos['x'],
    y=df_bin_pos['y'],
    name="positive X",
    hoverinfo='all',
    fill='tozeroy',
    #opacity=0.2,
    #fillcolor='#ff7f0e',
    #fillcolor='#1f77b4',
    fillcolor='rgba(131, 149, 193, 0.9)',
    line=dict(color = 'rgba(0, 0, 0, 0)', shape='hvh')
))

# adjust layout to ensure max values are included
ymax = np.max([df_bin_neg['y'].max(), df_bin_pos['y'].max()])
fig.update_layout(yaxis=dict(range=[0,ymax+0.1]))

# adjust layout to match OP's original
fig.update_xaxes(showline=True, linewidth=1, linecolor='black', mirror=False, zeroline=False, showgrid=False)
fig.update_yaxes(showline=False)#, linewidth=2, linecolor='black', mirror=True)

fig.show()
vestland