
I can't get Julia to count values that sit on the outer edges of a histogram when I define a range for the bins. Here is a minimal example:

```julia
using Plots
x = [0,0.5,1]
plot(histogram(x, bins=range(0,1,length=3)))
```

Defining the edges explicitly (`bins=[0,0.3,0.7,1]`) doesn't help either. It seems that `histogram()` excludes the limits of the range. I can extend the range to make it work:

```julia
plot(histogram(x, bins=[0,0.3,0.7,1.01]))
```

But I don't think that's the right way to go. Surprisingly, fixing the number of bins (`nbins=3`) does work, but I need all bins to have the same width, and that width must stay constant across runs for comparison purposes.

I have tried Plots, PlotlyJS and StatsBase (with `fit()` and its `closed` keyword) to no avail. Maybe I'm missing something, so I wanted to ask: is it possible to do what I want?

capstain

2 Answers


Try:

```julia
plot(histogram(x, bins=range(0,nextfloat(1.0),length=3)))
```

Although this extends the range, it does so minimally: `nextfloat(1.0)` is the smallest `Float64` greater than `1.0`, so this is the smallest extension that effectively closes the right end of the histogram.
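To see why the top edge matters, here is a minimal sketch in Python rather than Julia, so it runs without any packages. It assumes left-closed bins `[a, b)` with out-of-range values dropped, which matches the behavior described in the question. `bin_counts` is a hypothetical helper, not a Plots or StatsBase function, and `math.nextafter` (Python 3.9+) plays the role of Julia's `nextfloat`. The sketch compares the original edges, edges where only the last edge is nudged, and edges obtained by splitting the widened interval evenly, as a length-3 range from 0 to `nextfloat(1.0)` does. In that last case the midpoint is also one ulp above 0.5, which is worth checking when data sit exactly on an interior edge.

```python
import bisect
import math

def bin_counts(values, edges):
    """Count values into left-closed bins [edges[i], edges[i+1]).

    Hypothetical helper: values outside [edges[0], edges[-1]) are dropped.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        # bisect_right returns how many edges are <= v; subtracting 1 gives
        # the index of the bin whose left edge is the last edge <= v.
        i = bisect.bisect_right(edges, v) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts

x = [0.0, 0.5, 1.0]
top = math.nextafter(1.0, math.inf)  # smallest double above 1.0, like nextfloat(1.0)

original = [0.0, 0.5, 1.0]
last_nudged = [0.0, 0.5, top]
# Splitting [0, top] evenly: the midpoint top / 2 is exactly one ulp above 0.5,
# so the interior edge moves as well.
evenly_split = [0.0, top / 2, top]

print(bin_counts(x, original))      # [1, 1]: 1.0 falls outside the last bin
print(bin_counts(x, last_nudged))   # [1, 2]: 1.0 is now counted
print(bin_counts(x, evenly_split))  # [2, 1]: 0.5 moves into the first bin
```

Nudging only the last edge leaves every other bin untouched, which may matter when, as in the question, the bins have to stay identical across runs.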

As for equal widths: with floating-point numbers, "equal width" can mean different things. It can mean equal width as real numbers (which are not always representable), or, for example, an equal number of representable values per bin, and that number differs between [0.0,1.0] and [1.0,2.0].
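The second meaning can be made concrete. For non-negative doubles, the IEEE 754 bit pattern read as a 64-bit integer increases with the value, so the number of doubles in `[a, b]` is the difference of those integers plus one. The Python sketch below uses hypothetical helpers `float_index` and `count_doubles` to compare two intervals that have the same width as real numbers.

```python
import struct

def float_index(x):
    """Bit pattern of a non-negative double as an integer (hypothetical helper).

    For x >= 0 this integer increases with x, and consecutive doubles differ by 1.
    """
    return struct.unpack("<q", struct.pack("<d", x))[0]

def count_doubles(a, b):
    """Number of doubles in the closed interval [a, b], assuming 0 <= a <= b."""
    return float_index(b) - float_index(a) + 1

print(count_doubles(0.0, 1.0))  # 4607182418800017409
print(count_doubles(1.0, 2.0))  # 4503599627370497, i.e. 2**52 + 1
```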

Hopefully, this scratches the OP's itch.

Dan Getz
  • Ideally there'd be some cool function or attribute for this feature, but this solution is neat enough for me :) – capstain Feb 19 '23 at 01:40

https://juliastats.org/StatsBase.jl/latest/empirical/#StatsBase.Histogram

Most importantly:

closed: A symbol with value :right or :left indicating on which side bins (half-open intervals or higher-dimensional analogues thereof) are closed. See below for an example.

This is common in many histogram implementations, for example NumPy:

```python
In [8]: np.histogram([0], bins=[0, 1, 2])
Out[8]: (array([1, 0]), array([0, 1, 2]))

In [9]: np.histogram([1], bins=[0, 1, 2])
Out[9]: (array([0, 1]), array([0, 1, 2]))
```

NumPy has the inconsistency that its last bin is closed on both sides, but it's perfectly normal for every bin to be closed on one side only.
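A minimal Python sketch of the three conventions, applied to the question's data: bins closed on the left, bins closed on the right, and NumPy-style left-closed bins whose last bin also includes its top edge. `histogram_counts` and its `closed` and `close_last` parameters are hypothetical names chosen for illustration; they mimic, but are not, the StatsBase `closed` keyword or NumPy's implementation. Values that land in no bin are dropped.

```python
import bisect

def histogram_counts(values, edges, closed="left", close_last=False):
    """Count values into half-open bins (hypothetical helper).

    closed="left":  bins are [e_i, e_{i+1})
    closed="right": bins are (e_i, e_{i+1}]
    close_last=True (only used with closed="left") also puts the top edge
    into the last bin, as NumPy does. Values in no bin are dropped.
    """
    if closed not in ("left", "right"):
        raise ValueError("closed must be 'left' or 'right'")
    counts = [0] * (len(edges) - 1)
    for v in values:
        if closed == "left":
            i = bisect.bisect_right(edges, v) - 1  # index of last edge <= v
            if close_last and v == edges[-1]:
                i = len(counts) - 1  # fold the top edge into the last bin
        else:
            i = bisect.bisect_left(edges, v) - 1   # index of last edge < v
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts

x = [0.0, 0.5, 1.0]
edges = [0.0, 0.5, 1.0]
print(histogram_counts(x, edges, closed="left"))                   # [1, 1]: 1.0 dropped
print(histogram_counts(x, edges, closed="right"))                  # [1, 1]: 0.0 dropped
print(histogram_counts(x, edges, closed="left", close_last=True))  # [1, 2]: NumPy-style
```

Only the NumPy-style variant counts both 0.0 and 1.0; the price is that its last bin has a different shape from all the others.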

jling
  • This doesn't seem to do the job completely. If `closed` is set to `:left`, then the lowest value of the histogram (in my example, `0`) is picked up, but the highest value (`1`) isn't. Conversely, if `:right` is set, the `1` is picked up but not the `0`. – capstain Feb 19 '23 at 00:53
  • What job? I'm not saying there's a fix; I'm saying this isn't a problem to begin with. Each bin has to be closed on one side and one side only; it's the only consistent behavior. – jling Feb 19 '23 at 01:09