
I have two problems with the histogram I'm making. I'm trying to generate a histogram in R, using the function hist(), for the frequency distribution of a numeric (interval) variable that ranges from 0 to 10. The plot I'm aiming for should have 11 bars. However:

Problem 1: My good friend R keeps plotting 10 bars.

Problem 2: I have no idea how to plot the x-axis tick values properly, i.e., with each category value (from 0 to 10) centred below its bar. I have tried using axis(), as you can see in my code below, but I have no idea how to set it up.

I'm pretty sure it is something quite basic, but I cannot find the solution to these two problems. For the first one, I thought setting breaks=11 would work, but it didn't. I have no idea how to solve the second one.

This is my data (128 participants in a study, each scored a value between 0 and 10):

dictator <- structure(list(ID_Esc_Def = c(915151L, 91494L, 1303553L, 1310117L, 
1305411L, 1312263L, 1310121L, 1312264L, 1306498L, 1305413L, 1306497L, 
1307521L, 1305409L, 1307523L, 1306491L, 1311366L, 1307532L, 915155L, 
917111L, 915152L, 917120L, 1312267L, 1310119L, 1310114L, 1305414L, 
1305412L, 1306494L, 1303559L, 1307526L, 917112L, 91496L, 1305410L, 
1307527L, 917113L, 1307529L, 91491L, 1306495L, 1311363L, 1306493L, 
1312260L, 1306496L, 91493L, 1303561L, 915153L, 1310118L, 1307530L, 
1303557L, 1310113L, 1303556L, 1312259L, 1312258L, 91498L, 1311361L, 
1311367L, 1311365L, 1306499L, 1312262L, 915149L, 915148L, 91492L, 
1310120L, 1308168L, 1312266L, 1308167L, 1306492L, 1307531L, 1305408L, 
1307514L, 1310116L, 1310112L, 1306490L, 1307525L, 1310115L, 1308174L, 
1311368L, 915157L, 1312261L, 1308169L, 1312265L, 91495L, 1311360L, 
1305417L, 91489L, 915154L, 1303555L, 91497L, 917118L, 131389L, 
521390L, 521389L, 915150L, 131386L, 1305415L, 1311362L, 521386L, 
131382L, 91490L, 521391L, 1305416L, 1303562L, 1311369L, 917116L, 
521388L, 917114L, 1303558L, 521396L, 521387L, 1308172L, 131388L, 
521395L, 131390L, 131384L, 917117L, 1311364L, 131387L, 1308173L, 
917119L, 131385L, 917115L, 915156L, 521393L, 1308171L, 1308170L, 
1303560L, 521392L, 131391L, 131383L, 1303554L), asig_dic = c(0L, 
10L, 2L, 4L, 5L, 6L, 5L, 1L, 5L, 10L, 4L, 3L, 4L, 5L, 7L, 6L, 
5L, 4L, 3L, 2L, 4L, 6L, 1L, 3L, 5L, 5L, 3L, 3L, 7L, 0L, 3L, 5L, 
2L, 2L, 4L, 3L, 2L, 5L, 5L, 7L, 5L, 6L, 0L, 5L, 10L, 6L, 3L, 
5L, 5L, 6L, 10L, 5L, 5L, 8L, 10L, 5L, 2L, 5L, 4L, 3L, 1L, 2L, 
5L, 4L, 5L, 7L, 5L, 3L, 5L, 5L, 7L, 7L, 1L, 1L, 2L, 5L, 0L, 3L, 
5L, 3L, 5L, 10L, 4L, 0L, 1L, 5L, 5L, 5L, 5L, 5L, 6L, 10L, 4L, 
6L, 1L, 9L, 4L, 4L, 5L, 5L, 6L, 5L, 1L, 5L, 1L, 2L, 2L, 0L, 5L, 
5L, 6L, 6L, 5L, 3L, 5L, 3L, 3L, 5L, 5L, 6L, 4L, 2L, 2L, 0L, 4L, 
6L, 5L, 5L)), class = "data.frame", row.names = c(NA, -128L))

This is the code I'm using to generate the histogram:

m<-mean(dictator$asig_dic)
std<-sqrt(var(dictator$asig_dic))

quartz()
par(mfrow=c(1,1))
par(las=1)

hist(dictator$asig_dic, breaks = 11, col = "grey", freq = F, ylim=c(0,0.4), xaxt="n",
     xlab="Generosity (Donated Tokens in DG)", ylab="Relative Frequency", main="")

curve(dnorm(x, mean=m, sd=std), 
      col="darkblue", lwd=2, add=TRUE, yaxt="n")

axis(1, at = seq(0, 10, by = 1), las=1)

As you can see, there are 10 bars. Any help?

All the best,

Mauricio.

Mauricio Salgado
    `hist` gives your value of `breaks` to `pretty`, which only takes it as a suggestion. It uses 10 instead of 11, because that's a nice round number and that's how `pretty` rolls. – Dan Jun 30 '18 at 22:50

2 Answers


One solution is to drink the ggplot2 Kool Aid. As you can see, there are 11 bins (as requested) and the labels are at the centre of the bars.

library(ggplot2)
library(ggthemes)  # provides theme_tufte()

# after_stat() and linewidth need ggplot2 >= 3.4.0;
# on older versions use ..count.. and size instead.
ggplot(dictator) + 
  geom_histogram(aes(x = asig_dic, y = after_stat(count / sum(count))), 
                 bins = 11, fill = "gray", colour = "black") +
  # Normal curve with the sample mean and standard deviation
  stat_function(fun = dnorm, 
                args = list(mean = mean(dictator$asig_dic), 
                            sd = sd(dictator$asig_dic)),
                colour = "salmon", linewidth = 1.25) +
  theme_tufte() +
  theme(axis.line = element_line(linewidth = 1),
        axis.text = element_text(size = 18),
        axis.title = element_text(size = 20)) +
  xlab("Generosity (Donated Tokens in DG)") +
  ylab("Relative Frequency") +
  # One tick label per score, 0 to 10
  scale_x_continuous(breaks = 0:10)


Dan

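As @Lyngbakr said, you don't get 11 bins because a single number passed to the breaks argument of hist() is only a suggestion: the actual breakpoints are pretty values, whose spacing is 1, 2, or 5 times a power of 10. For data ranging from 0 to 10 the breakpoints end up at 0, 1, ..., 10, so you get only 10 bins and not 11 (the first bin, [0, 1], holds both the 0s and the 1s). One way to "fight" it is to pass explicit breakpoints with more bins than necessary (e.g., breaks = seq(-1, 10, 1) or breaks = seq(0, 10, 0.5)). However, it might "ruin" your graph, because the bars no longer line up with the values 0 to 10 the way you might expect. You can look here for help on pretty values, and here for a more in-depth explanation.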
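To see the effect without drawing anything, here is a minimal sketch that calls pretty() directly on the range of the scores and then counts the bins hist() produces when it is given explicit breakpoints. The names score_range, edges and n_bins, and the five toy values passed to hist(), are made up for illustration and are not from the question.

```r
# Range of the scores in the question (0 to 10)
score_range <- c(0, 10)

# A single number for breaks is only a target that ends up in pretty()
edges <- pretty(score_range, n = 11)
print(edges)

# k breakpoints delimit k - 1 bins
n_bins <- length(edges) - 1
print(n_bins)

# An explicit vector of breakpoints is used as given (toy data, not the real scores)
h <- hist(c(0, 3, 5, 5, 10), breaks = seq(-1, 10, by = 1), plot = FALSE)
print(length(h$counts))
```

Given the 10 bars reported in the question, n_bins should come out one short of the 11 requested, while the explicit vector yields 11 counts.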

If you would still like to use base R (and not other packages like ggplot2), you can do this:

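# m and std are the mean and standard deviation computed in the question
m <- mean(dictator$asig_dic)
std <- sd(dictator$asig_dic)

par(mfrow = c(1, 1))
par(las = 1)
# Explicit breakpoints -1, 0, ..., 10 give 11 bins: [-1, 0], (0, 1], ..., (9, 10]
hist(dictator$asig_dic,
     breaks = seq(-1, 10, 1),
     col = "grey", freq = FALSE, ylim = c(0, 0.4), xaxt = "n",
     xlab = "Generosity (Donated Tokens in DG)", ylab = "Relative Frequency", main = "")
curve(dnorm(x, mean = m, sd = std),
      col = "darkblue", lwd = 2, add = TRUE, yaxt = "n")
# Ticks mark the bin edges
axis(1, at = seq(-1, 10, by = 1), las = 1)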
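The tick labels in the code above sit on the bin edges rather than under the bars. A minimal sketch of one way to centre them, assuming the scores are integers from 0 to 10: put the bin edges at the half-integers, so that each bar is centred on one score, and place the ticks at the bin midpoints. The vector scores is made-up toy data standing in for dictator$asig_dic, and edges is a hypothetical name.

```r
# Toy scores standing in for dictator$asig_dic (hypothetical data, integers 0 to 10)
scores <- c(0, 1, 1, 2, 3, 3, 4, 5, 5, 5, 5, 6, 6, 7, 8, 9, 10, 10)

# Edges at -0.5, 0.5, ..., 10.5 give 11 unit-wide bins centred on 0, 1, ..., 10
edges <- seq(-0.5, 10.5, by = 1)

# xaxt = "n" suppresses the default axis so it can be drawn by hand
h <- hist(scores, breaks = edges, freq = FALSE, col = "grey", xaxt = "n",
          xlab = "Score", ylab = "Relative Frequency", main = "")

# The bin midpoints are the scores themselves, so each tick lands under its bar
axis(1, at = h$mids)
```

Because every bin is one unit wide, the density drawn with freq = FALSE equals the relative frequency, so a dnorm() curve added with curve(..., add = TRUE) stays on the same scale.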
DJV