
I'm having trouble creating a histogram using ggplot.

I have a data structure as follows:

value_1
112.45
2457.44
333.24

The column continues for about 25,000 more observations.

I want a histogram whose bins count the values in 0-100, then 100-200, then 200-300, and so on up to the largest value.

In the example above, that would give one count in the 100-200 bin, one in the 300-400 bin, and one in the 2400-2500 bin.
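To sanity-check what those bins should contain, the counts can be computed directly in base R, outside ggplot2. This is only a sketch: `bin_start` is a hypothetical helper name, and because it uses `floor()`, a value of exactly 100 goes into the 100-200 bin (left-closed bins).

```r
# Illustrative values from the question
x <- c(112.45, 2457.44, 333.24)

# Lower edge of the 100-wide bin each value falls into (hypothetical helper)
bin_start <- floor(x / 100) * 100

# Number of values per bin, labelled by the bin's lower edge
counts <- table(bin_start)
print(counts)
```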

Could you point me in the right direction?

zx8754
Cheeseburgler
  • Related post: http://stackoverflow.com/questions/21031060/setting-breaks-in-ggplot2-histogram – zx8754 Mar 06 '17 at 13:33
  • @mt1022 then maybe this is right duplicate http://stackoverflow.com/questions/15231109/stacked-histogram-from-already-summarized-counts-using-ggplot2 – zx8754 Mar 06 '17 at 13:43
  • @zx8754 I am afraid not. For this question, what the OP wants seems to be setting either `center` or `boundary` together with `binwidth` in `geom_histogram`. – mt1022 Mar 06 '17 at 13:56
  • @mt1022 feel free to answer, my bad, just trying to find a good dupe. From the title it sounded the same. – zx8754 Mar 06 '17 at 13:59

1 Answer


You can get the bins you want by setting `binwidth` together with either `center` or `boundary`:

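library(ggplot2)  # ggplot2 2.2.1

df <- data.frame(x = c(112.45, 2457.44, 333.24))

# 100-wide bins with edges on multiples of 100: centre one bin on 150 ...
ggplot(df, aes(x)) + geom_histogram(binwidth = 100, center = 150)
# ... or put one bin edge at 100
ggplot(df, aes(x)) + geom_histogram(binwidth = 100, boundary = 100)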
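To confirm which bins ggplot2 actually builds, you can inspect the computed layer with `layer_data()`. This is a sketch using the three example values; `p` and `nonempty` are just illustrative names.

```r
library(ggplot2)

df <- data.frame(x = c(112.45, 2457.44, 333.24))
p <- ggplot(df, aes(x)) + geom_histogram(binwidth = 100, boundary = 100)

# layer_data() returns the data ggplot2 computed for the layer,
# one row per bin, with its edges (xmin, xmax) and count
bins <- layer_data(p)

# Keep only the bins that actually contain values
nonempty <- bins[bins$count > 0, c("xmin", "xmax", "count")]
print(nonempty)
```

By default `geom_histogram` closes bins on the right, so a value of exactly 200 is counted in 100-200; adding `closed = "left"` switches to bins of the form [100, 200).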

From the `geom_histogram` documentation:

center

The center of one of the bins. Note that if center is above or below the range of the data, things will be shifted by an appropriate number of widths. To center on integers, for example, use width = 1 and center = 0, even if 0 is outside the range of the data. At most one of center and boundary may be specified.

boundary

A boundary between two bins. As with center, things are shifted when boundary is outside the range of the data. For example, to center on integers, use width = 1 and boundary = 0.5, even if 0.5 is outside the range of the data. At most one of center and boundary may be specified.
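The "shifted by an appropriate number of widths" behaviour described above can be checked by comparing bin edges. The sketch below (the helper `edges` is hypothetical) shows that `center = 150`, `boundary = 100`, and even `center = 50`, which lies below the data, should all yield the same 100-wide bins for the example values.

```r
library(ggplot2)

df <- data.frame(x = c(112.45, 2457.44, 333.24))

# Hypothetical helper: edges of the non-empty bins for a given histogram spec
edges <- function(...) {
  p <- ggplot(df, aes(x)) + geom_histogram(binwidth = 100, ...)
  d <- layer_data(p)
  d[d$count > 0, c("xmin", "xmax")]
}

by_center   <- edges(center = 150)
by_boundary <- edges(boundary = 100)
by_shifted  <- edges(center = 50)  # outside the data range, shifted up by one width

print(by_center)
```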

If you know the range of the data, you can instead set the bin edges manually with the `breaks` argument of `geom_histogram` alone.
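A sketch of the `breaks` approach, assuming you want edges every 100 units from 0 up to the first multiple of 100 at or above the largest value (`upper` is an illustrative name):

```r
library(ggplot2)

df <- data.frame(x = c(112.45, 2457.44, 333.24))

# Round the maximum up to the next multiple of 100 so the last bin covers it
upper <- ceiling(max(df$x) / 100) * 100

# Explicit bin edges: 0, 100, 200, ..., upper
p <- ggplot(df, aes(x)) + geom_histogram(breaks = seq(0, upper, by = 100))

d <- layer_data(p)
nonempty <- d[d$count > 0, c("xmin", "xmax", "count")]
print(nonempty)
```

For 25,000 observations, this only requires computing `upper` once from the data; the bins then no longer depend on where ggplot2 places its default origin.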

mt1022