7

Suppose I need to plot a dataset like the one below:

set.seed(1)
dataset <- sample(1:7, 1000, replace=T)
hist(dataset)

As you can see in the plot below, the two leftmost bins do not have any space between them, unlike the rest of the bins.

[Image: histogram of dataset]

I tried changing xlim, but it didn't work. Basically, I would like each number (1 to 7) to be represented as its own bin, and I would also like any two adjacent bins to have space between them. Thanks!

Alex

2 Answers

10

The best way is to set the breaks argument manually. Using the data from your code,

hist(dataset,breaks=rep(1:7,each=2)+c(-.4,.4))

gives the following plot:

[Image: resulting histogram]

The first part, rep(1:7,each=2), is what numbers you want the bars centered around. The second part controls how wide the bars are; if you change it to c(-.49,.49) they'll almost touch, if you change it to c(-.3,.3) you get narrower bars. If you set it to c(-.5,.5) then R yells at you because you aren't allowed to have the same number in your breaks vector twice.

Why does this work?

If you split up the breaks vector, you get one part that looks like this:

> rep(1:7,each=2)
 [1] 1 1 2 2 3 3 4 4 5 5 6 6 7 7

and a second part that looks like this:

> c(-.4,.4)
 [1] -0.4  0.4

When you add them together, R loops through the second vector as many times as needed to make it as long as the first vector. So you end up with

  1-0.4  1+0.4  2-0.4  2+0.4  3-0.4  3+0.4 [etc.]
=   0.6    1.4    1.6    2.4    2.6    3.4 [etc.]

Thus, you have one bar from 0.6 to 1.4--centered around 1, with width 2*.4--another bar from 1.6 to 2.4, centered around 2 with width 2*.4, and so on. If you had data in between (e.g. 2.5), then the histogram would look kind of silly, because it would create a bar from 2.4 to 2.6, and the bar widths would not be even (since that bar would only be .2 wide, while all the others are .8). But with only integer values that's not a problem.
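The recycling described above can be checked directly. The following is only a minimal sketch: the variable names `centers`, `offsets`, `bars`, and `gaps` are made up for illustration, and `plot = FALSE` is used so that `hist` returns the bin counts without drawing anything.

```r
# Build the breaks vector step by step (same values as in the answer).
centers <- rep(1:7, each = 2)   # 1 1 2 2 ... 7 7
offsets <- c(-0.4, 0.4)         # recycled along `centers`
breaks  <- centers + offsets    # 0.6 1.4 1.6 2.4 ... 6.6 7.4

set.seed(1)
dataset <- sample(1:7, 1000, replace = TRUE)

# plot = FALSE returns the histogram object without drawing it.
h <- hist(dataset, breaks = breaks, plot = FALSE)

# 14 breaks give 13 bins: odd-numbered bins are the bars around 1..7,
# even-numbered bins are the 0.2-wide gaps between them.
bars <- h$counts[seq(1, length(h$counts), by = 2)]
gaps <- h$counts[seq(2, length(h$counts), by = 2)]

print(bars)  # one count per integer value 1..7
print(gaps)  # all zero, because the data are integers
```

Since the data contain only integers, every gap bin should be empty, which is why the bars appear separated in the plot.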

Jonathan Christensen
  • Could you please elaborate a bit more on `rep(1:7, each=2)`? What's the logic behind using essentially 1, 1, 2, 2, 3, 3, ...,7,7 to tell R that I want to center the bars around 1,2,..,7? Thanks! – Alex Jan 18 '13 at 05:18
  • Oh, well, I went and did it anyway. Maybe it will help someone else. – Jonathan Christensen Jan 18 '13 at 05:25
  • One more question: note that in the plot created by your code, the y-axis now shows Density instead of Frequency. Is there any way to retain Frequency? – Alex Jan 18 '13 at 05:35
  • You can: `hist(dataset,breaks=rep(1:7,each=2)+c(-.4,.4),freq=TRUE)`. It gives a warning about the "areas being wrong" because of the bars being different widths (like the hypothetical 2.4-2.6 bar that I mentioned), but the plot is correct. – Jonathan Christensen Jan 18 '13 at 05:48
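
As a rough sketch of the `freq=TRUE` suggestion in the comment above: the call below puts counts on the y-axis. Wrapping it in `suppressWarnings` is an optional addition of this sketch, not part of the original comment; it only silences the expected warning about unequal bin widths.

```r
set.seed(1)
dataset <- sample(1:7, 1000, replace = TRUE)
breaks <- rep(1:7, each = 2) + c(-0.4, 0.4)

# freq = TRUE puts counts (Frequency) on the y-axis. R warns because
# the bins (0.8-wide bars, 0.2-wide gaps) are not all the same width.
h <- suppressWarnings(hist(dataset, breaks = breaks, freq = TRUE))

# The bar heights are the raw counts stored in the returned object.
print(h$counts)
```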
-3

You need six bars NOT seven bars; that is what your histogram has space for. But then you end up generating seven bars. That is the bug.

Use sample(1:6, 1000, replace=T) instead of sample(1:7, 1000, replace=T).

If you do need seven bars, then seed with 0

Amit