I'm using holoviews with bokeh backend for interactive visualizations. I have a histogram with edges and frequency data. What is an elegant way of overlaying my histogram with the cumulative distribution (cdf) curve?

I tried using the cumsum option in hv.dim, but I don't think I'm doing it right. The help simply says:

Help on function cumsum in module holoviews.util.transform:
cumsum(self, **kwargs)

My code looks something like this:

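import pandas as pd
import holoviews as hv
hv.extension('bokeh')

# Pre-binned data: x positions and the frequency at each position
df_hist = pd.DataFrame(columns=['edges', 'freq'])
df_hist['edges'] = [-2, -1, 0, 1, 2]
df_hist['freq'] = [1, 3, 5, 3, 1]

hv.Histogram((df_hist.edges, df_hist.freq))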

The result is a histogram plot.

Is there something like a...

hv.Histogram((df_hist.edges, df_hist.freq), type='cdf') ... to show the cumulative distribution?

Sander van den Oord

1 Answer

One possible solution is to use histogram(cumulative=True), as follows:

from holoviews.operation import histogram

histogram(hv.Histogram((df_hist.edges, df_hist.freq)), cumulative=True)

More info on transforming elements here:
http://holoviews.org/user_guide/Transforming_Elements.html
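
For pre-binned data like the question's, it may help to see what a cumulative histogram contains. The sketch below assumes that a cumulative histogram is simply the running sum of the bin frequencies, and that dividing by the total gives an empirical CDF ending at 1.0. It uses only the standard library; the helper names cumulative_counts and empirical_cdf are made up for illustration and are not part of HoloViews.

```python
from itertools import accumulate

# Pre-binned data from the question.
edges = [-2, -1, 0, 1, 2]
freq = [1, 3, 5, 3, 1]


def cumulative_counts(freq):
    """Running total of the bin frequencies (hypothetical helper)."""
    return list(accumulate(freq))


def empirical_cdf(freq):
    """Running total scaled so that the last value is 1.0 (hypothetical helper)."""
    total = sum(freq)
    return [count / total for count in cumulative_counts(freq)]


counts = cumulative_counts(freq)
cdf = empirical_cdf(freq)
print(counts)  # [1, 4, 9, 12, 13]
```

Either list can be paired with the x positions and plotted as a separate element.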


Or, as a more general solution, turn the original data into an hv.Dataset():

import holoviews as hv
import seaborn as sns
from holoviews.operation import histogram
hv.extension('bokeh')

iris = sns.load_dataset('iris')

hv_data = hv.Dataset(iris['petal_width'])

histogram(hv_data, cumulative=True)


But I prefer the hvplot library, which is built on top of HoloViews:

import hvplot
import hvplot.pandas

iris['petal_width'].hvplot.hist(cumulative=True)

[Image: hvplot cumulative histogram]

Sander van den Oord
  • Thank you so much. I didn't know about `cumulative=True`. This certainly answers the question the way I asked it, and more. I was kind of hoping for a cumulative curve overlaying my primary histogram, though. Is there a straightforward way to do that? I suppose I could turn your solution into a scatter or curve and then overlay it on the histogram, unless there's a better way. – Nirjhor Chakraborty Sep 18 '19 at 16:01
  • @NirjhorChakraborty You could do something like: `iris['petal_width'].hvplot.hist() * iris['petal_width'].hvplot.hist(cumulative=True).to.scatter()` Can you accept my answer as the solution to your question? – Sander van den Oord Sep 18 '19 at 20:51
  • Yes, that would work. Thank you! Also, I had voted up your answer without realizing I could "accept" it. Sorry about that. I've accepted it correctly now. Thank you! – Nirjhor Chakraborty Sep 20 '19 at 21:38
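
The overlay discussed in the comments can also be sketched for the question's pre-binned data, without re-binning. This is a minimal sketch, not the answer author's method: it computes the running total by hand, builds an hv.Curve from it, and overlays it on the hv.Histogram with `*`. The function names cumulative_points and histogram_with_cumulative_curve are hypothetical, and HoloViews is imported inside the plotting function so the pure-Python helper can be checked on its own.

```python
from itertools import accumulate


def cumulative_points(edges, freq):
    """Pair each x position with the running total of the frequencies."""
    return list(zip(edges, accumulate(freq)))


def histogram_with_cumulative_curve(edges, freq):
    """Overlay a cumulative curve on the histogram (requires holoviews)."""
    # Imported lazily so cumulative_points() works without holoviews installed.
    import holoviews as hv

    hist = hv.Histogram((edges, freq))
    curve = hv.Curve(cumulative_points(edges, freq))
    # `*` combines the two elements into a single overlay.
    return hist * curve


# Data from the question.
points = cumulative_points([-2, -1, 0, 1, 2], [1, 3, 5, 3, 1])
```

In a notebook, after hv.extension('bokeh'), calling histogram_with_cumulative_curve(df_hist.edges, df_hist.freq) should display both elements in one plot. Swapping hv.Curve for hv.Scatter would give the scatter variant mentioned in the comments.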