When using hist() in R with freq=FALSE, I should get densities. However, the values I get do not sum to 1. They differ from the plain counts, but I still need to normalize them myself.

For example:

> h = hist(c(1,2,1,3,1,4,5,4,5,8,2,4,1,7,6,10,7,4,3,7,3,5), freq=FALSE)
> h$density
  [1] 0.13636364 0.15909091 0.09090909 0.09090909 0.02272727
> sum(h$density)
  [1] 0.5
> h$density/sum(h$density)
  [1] 0.27272727 0.31818182 0.18181818 0.18181818 0.04545455
alain.janinm
eran

4 Answers

If you examine the rest of the histogram output, you will notice that the bars have width 2:

$breaks
[1]  0  2  4  6  8 10

Hence you should multiply sum(h$density) by 2 to get an area equal to one. You can see this clearly if you look at the histogram.

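As a quick check of the point above, here is a minimal sketch using the data from the question. It assumes the default breaks are the equally spaced ones shown above, and it uses plot = FALSE only to skip drawing; the density component is returned either way.

```r
x <- c(1,2,1,3,1,4,5,4,5,8,2,4,1,7,6,10,7,4,3,7,3,5)

# Compute the bins without drawing the plot
h <- hist(x, plot = FALSE)

# All bins have the same width here, so unique() collapses to one value
bin_width <- unique(diff(h$breaks))

# Height times width, summed over all bars, is the total area
total_area <- sum(h$density) * bin_width
print(total_area)
```

The shortcut of multiplying the summed densities by a single width only applies when every bin has the same width.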

MrFlick
csgillespie

The density is not the same as the probability. For a histogram, the density is the height of the bar and the probability is the area of the bar. You need to multiply the height by the width to get the area. Try

x <- c(1,2,1,3,1,4,5,4,5,8,2,4,1,7,6,10,7,4,3,7,3,5)
hh <- hist(x, probability = TRUE)
sum(diff(hh$breaks) * hh$density)
# [1] 1

This works because breaks contains the start and end points of each bin, so taking the difference between consecutive values gives the width of each bin. You can also use with() to grab both of those values more easily.

x <- c(1,2,1,3,1,4,5,4,5,8,2,4,1,7,6,10,7,4,3,7,3,5)
with(hist(x, probability = TRUE), sum(diff(breaks) * density))
# [1] 1
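The diff(breaks) approach also covers bins of unequal width, where no single multiplier works. The sketch below illustrates this; the break points c(0, 1, 3, 6, 10) are an arbitrary choice made for this example and do not come from the question.

```r
x <- c(1,2,1,3,1,4,5,4,5,8,2,4,1,7,6,10,7,4,3,7,3,5)

# Hypothetical unequal-width bins, chosen only for illustration
h2 <- hist(x, breaks = c(0, 1, 3, 6, 10), plot = FALSE)

widths <- diff(h2$breaks)          # 1, 2, 3, 4
area   <- sum(widths * h2$density) # each bar's height times its own width

print(sum(h2$density))  # the heights alone do not sum to 1
print(area)             # the areas do
```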
MrFlick

The area of the histogram is, in fact, 1.0. What you're not taking into account is that every bar is two units wide:

> h$breaks
[1]  0  2  4  6  8 10
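If what you want is the probability of each bin rather than the total, a small sketch along the same lines is to scale each density by its bin width. With the question's data this should agree both with the relative counts and, because the bins are equally wide, with the h$density/sum(h$density) normalization from the question.

```r
x <- c(1,2,1,3,1,4,5,4,5,8,2,4,1,7,6,10,7,4,3,7,3,5)
h <- hist(x, plot = FALSE)  # plot = FALSE only skips the drawing

# Per-bin probability = bar height (density) times bar width
p <- h$density * diff(h$breaks)

# Two equivalent ways to get the same numbers for this data
p_counts <- h$counts / sum(h$counts)
p_norm   <- h$density / sum(h$density)  # valid here because widths are equal

print(p)
print(isTRUE(all.equal(p, p_counts)))
print(isTRUE(all.equal(p, p_norm)))
```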
NPE
sum(h$density*(h$breaks[-1] - h$breaks[-length(h$breaks)]))

[1] 1
Backlin