I am using both geom_histogram (ggplot2) and hist (base R) with the same breakpoints, but I get different graphs. I did a quick search but could not find an explanation. Does anyone know how the breaks are defined in each function and why there would be a difference?

These produce two different plots.

library(ggplot2)

set.seed(25)
data <- data.frame(Mos = rnorm(500, mean = 25, sd = 8))
data$Mos <- round(data$Mos)

pAge <- ggplot(data, aes(x = Mos))
pAge + geom_histogram(breaks = seq(0, 50, by = 2))

hist(data$Mos,breaks=seq(0, 50, by = 2))

Thanks

mpalanco
megv

1 Answer

To get the same histogram in ggplot2, specify binwidth inside geom_histogram and set the axis breaks inside scale_x_continuous.

Additionally, hist and histograms in ggplot2 use different defaults to create the intervals:

hist: right-closed (left open) intervals. Default: right = TRUE

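stat_bin (ggplot2): left-closed (right open) intervals. Default: right = FALSE. This was the default when this answer was written; later ggplot2 releases control the closed side through the closed argument of stat_bin, so check the documentation for your installed version.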

        **hist**    **ggplot2**
         freq1 Freq   freq2 Freq
    1    (0,2]    0   [0,2)    0
    2    (2,4]    2   [2,4)    2
    3    (4,6]    2   [4,6)    1
    4    (6,8]    1   [6,8)    2
    5   (8,10]    6  [8,10)    2
    6  (10,12]    9 [10,12)    7
    7  (12,14]   24 [12,14)   17
    8  (14,16]   27 [14,16)   26
    9  (16,18]   39 [16,18)   31
    10 (18,20]   48 [18,20)   46
    11 (20,22]   52 [20,22)   43
    12 (22,24]   38 [22,24)   57
    13 (24,26]   44 [24,26)   36
    14 (26,28]   46 [26,28)   52
    15 (28,30]   39 [28,30)   39
    16 (30,32]   31 [30,32)   33
    17 (32,34]   30 [32,34)   26
    18 (34,36]   24 [34,36)   29
    19 (36,38]   18 [36,38)   27
    20 (38,40]    9 [38,40)   12
    21 (40,42]    5 [40,42)    6
    22 (42,44]    4 [42,44)    0
    23 (44,46]    1 [44,46)    5
    24 (46,48]    1 [46,48)    0
    25 (48,50]    0 [48,50)    1
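The effect of the closed side is easiest to see on a few values that sit exactly on the break points, which is the situation the rounded Mos values are in. The following is a minimal sketch in base R only; the toy vector x and the variable names are made up for illustration, and hist() is called with plot = FALSE so that only the counts are returned.

```r
# Toy data (hypothetical): every value sits exactly on a break point.
x <- c(2, 4, 4, 6)
breaks <- seq(0, 8, by = 2)

# hist() default, right = TRUE: bins are closed on the right, e.g. (2,4].
# (With include.lowest = TRUE the first bin also includes its left edge.)
right_closed <- hist(x, breaks = breaks, plot = FALSE)$counts

# right = FALSE: bins are closed on the left, e.g. [2,4).
# (With include.lowest = TRUE the last bin also includes its right edge.)
left_closed <- hist(x, breaks = breaks, right = FALSE, plot = FALSE)$counts

print(right_closed)  # 1 2 1 0
print(left_closed)   # 0 1 2 1
```

Each boundary value moves one bin to the right when the intervals switch from right-closed to left-closed, which is the same shift visible between the two count columns in the table.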

In the hist call I included the argument right = FALSE so that its intervals are left-closed (right open), as they are in ggplot2. I added count labels to both plots so it is easier to check that the intervals are the same.

ggplot(data, aes(x = Mos))+
  geom_histogram(binwidth = 2, colour = "black", fill = "white")+
  scale_x_continuous(breaks = seq(0, 50, by = 2))+
  stat_bin(binwidth = 2, aes(label=..count..), vjust=-0.5, geom = "text")

hist(data$Mos,breaks=seq(0, 50, by = 2), labels =TRUE, right =FALSE)

To check the frequencies in each bin:

freq <- cut(data$Mos, breaks = seq(0, 50, by = 2), dig.lab = 4, right = FALSE)
as.data.frame(table(freq))
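
A side-by-side comparison like the table earlier in this answer can be put together with two cut() calls, one per convention. This is a sketch rather than the exact code used for that table: the names mos, right_closed, left_closed and comparison are assumptions, and cut() only approximates the two plotting functions (for example, it returns NA for a value lying on an outermost open edge, whereas hist() includes it by default).

```r
# Rebuild the data from the question.
set.seed(25)
mos <- round(rnorm(500, mean = 25, sd = 8))
breaks <- seq(0, 50, by = 2)

# Count the values under each convention.
right_closed <- table(cut(mos, breaks = breaks, right = TRUE))   # (a, b]
left_closed  <- table(cut(mos, breaks = breaks, right = FALSE))  # [a, b)

# One row per bin, with both labellings and both counts.
comparison <- data.frame(
  right_closed_bin = names(right_closed),
  right_closed_n   = as.vector(right_closed),
  left_closed_bin  = names(left_closed),
  left_closed_n    = as.vector(left_closed)
)
print(comparison)
```

Rows where the two count columns differ are the bins that gain or lose values lying exactly on a break.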
mpalanco