Why does the hist() function not have area one?

Question

When using hist() in R with freq=FALSE, I expect to get densities. The values do differ from the plain counts, but they do not sum to one, so I still have to normalize them myself.

For example:

> h = hist(c(1,2,1,3,1,4,5,4,5,8,2,4,1,7,6,10,7,4,3,7,3,5), freq=FALSE)
> h$density
  [1] 0.13636364 0.15909091 0.09090909 0.09090909 0.02272727
> sum(h$density)
  [1] 0.5
> h$density/sum(h$density)
  [1] 0.27272727 0.31818182 0.18181818 0.18181818 0.04545455

score 7 · Accepted Answer · edited Jul 01 '21 at 07:24

If you examine the rest of the histogram output, you will notice that the bars have width 2:

$breaks
[1]  0  2  4  6  8 10

Hence you should multiply sum(h$density) by 2 to get an area equal to one. You can see this clearly if you look at the x-axis of the histogram.
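A minimal sketch of that check, assuming the same data as in the question. The names bin_widths and area are only illustrative, and plot = FALSE is used just to skip the drawing; hist() computes the density component either way.

```r
x <- c(1,2,1,3,1,4,5,4,5,8,2,4,1,7,6,10,7,4,3,7,3,5)

# Compute the histogram without drawing it
h <- hist(x, plot = FALSE)

# Every bin has the same width here (breaks are 0, 2, 4, 6, 8, 10)
bin_widths <- diff(h$breaks)

# With equal widths, a single factor turns the summed heights into the area
area <- sum(h$density) * 2
print(area)
```

This shortcut only works because all bins share one width; with unequal bins each height needs its own width.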

edited Jul 01 '21 at 07:24

MrFlick

answered Oct 18 '11 at 14:05

csgillespie

And to get the widths use `diff(h$breaks)` – James Oct 18 '11 at 14:41

MrFlick · Answer 2 · 2021-07-01T07:22:09.287

The density is not the same as the probability. The density for a histogram is the height of the bar. The probability is the area of the bar. You need to multiply the height by the width to get the area. Try

x <- c(1,2,1,3,1,4,5,4,5,8,2,4,1,7,6,10,7,4,3,7,3,5)
hh <- hist(x, probability = TRUE)
sum(diff(hh$breaks) * hh$density)
# [1] 1

This works because breaks contains the start and end points of the bins, so taking the difference between consecutive values gives the width of each bin. You can also use with() to grab both of those values more easily.

x <- c(1,2,1,3,1,4,5,4,5,8,2,4,1,7,6,10,7,4,3,7,3,5)
with(hist(x, probability = TRUE), sum(diff(breaks) * density))
# [1] 1
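The same idea carries over to bins of unequal width, where no single multiplier would work. Here is a rough sketch, assuming a hand-picked set of breaks (the breaks vector and the names my_breaks, hu and area are made up for illustration, not taken from the question):

```r
x <- c(1,2,1,3,1,4,5,4,5,8,2,4,1,7,6,10,7,4,3,7,3,5)

# Hypothetical unequal-width bins: widths 1, 1, 3 and 5
my_breaks <- c(0, 1, 2, 5, 10)
hu <- hist(x, breaks = my_breaks, plot = FALSE)

# The heights alone do not sum to one...
sum(hu$density)

# ...but height times width, summed over the bins, does
area <- sum(diff(hu$breaks) * hu$density)
area
# [1] 1
```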

score 1 · Answer 3 · answered Oct 18 '11 at 14:06

The area of the histogram is, in fact, 1.0. What you're not taking into account is that every bar is two units wide:

> h$breaks
[1]  0  2  4  6  8 10

answered Oct 18 '11 at 14:06

NPE

score 1 · Answer 4 · answered Oct 18 '11 at 14:06

sum(h$density*(h$breaks[-1] - h$breaks[-length(h$breaks)]))

[1] 1

answered Oct 18 '11 at 14:06

Backlin

Or just use `sum(h$density*diff(h$breaks))` – James Oct 18 '11 at 14:41
Thanks! You learn something every day. Great to know cause I do this quite often. – Backlin Oct 19 '11 at 09:47
