I have a dataset that I'd like to plot with hist in R. Some rows in the dataset have values outside the range that I care about. Specifically, my R script is:

library(ggplot2)    
data = read.table("input.txt", sep=" ", strip.white=TRUE, header=TRUE)
pdf("out.pdf")
hist(data$actions,breaks=seq(0,130,by=1))
dev.off()

An example dataset for input.txt is:

name actions
foo 3
bar 129
baz 131

If I run the R script, I get an error:

Error in hist.default(data$actions, breaks = seq(0, 130, by = 1), :
some 'x' not counted; maybe 'breaks' do not span range of 'x'
Calls: hist -> hist.default
Execution halted

I know why this error occurs: there is one occurrence of a value greater than 130, namely baz with a value of 131.

What I'd like is to create a histogram of only the values in the specified range of 0 to 130, with all values outside that range silently ignored. How can I do this?

lmo
Rob Stewart
  • Drop those observations: `with(data, hist(actions[actions >= 0 & actions < 131], breaks=seq(0,130,by=1)))`. Also, hist is a base R graphic, so `library(ggplot2)` is unnecessary. – lmo Nov 23 '16 at 13:23
  • Wonderful, thank you! If you type this out as an answer, then I'll happily accept it as the chosen answer. – Rob Stewart Nov 23 '16 at 13:27

1 Answer

The simplest way to avoid this error is to subset the data before passing it to the base R function hist, so that every value falls within the range spanned by the breaks.

For example,

with(data, hist(actions[actions >= 0 & actions < 131], breaks=seq(0,130,by=1)))
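
To check the subsetting without reading input.txt, here is a minimal sketch that rebuilds the three sample rows from the question inline. The name `in_range` is made up for illustration, and the upper bound is written as `<= 130` so that it matches the last break exactly; for integer counts this is equivalent to `< 131`.

```r
# Sample data from the question, built inline instead of reading input.txt
data <- data.frame(name = c("foo", "bar", "baz"),
                   actions = c(3, 129, 131))

# Keep only the values covered by the breaks (0 to 130 inclusive)
in_range <- data$actions[data$actions >= 0 & data$actions <= 130]

# plot = FALSE returns the histogram object without drawing anything
h <- hist(in_range, breaks = seq(0, 130, by = 1), plot = FALSE)

# Number of values that were binned; baz (131) has been dropped
sum(h$counts)
```

With the sample data, two of the three values are binned. Dropping `plot = FALSE` draws the histogram as usual.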

A slightly more flexible approach is to define the desired set of values up front, which makes it easier to adjust if you change your mind later. Note that %in% matches exact values only, so this works for integer data such as counts.

myValues <- seq_len(131)-1
with(data, hist(actions[actions %in% myValues], breaks=myValues))
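
If the range is likely to change, the same idea can be wrapped in a small function. This is only a sketch: `hist_in_range` and its arguments `lo`, `hi` and `by` are hypothetical names, not part of base R, and it assumes that `hi - lo` is a whole multiple of `by` so that the last break lands exactly on `hi`.

```r
# Hypothetical helper: histogram of x restricted to [lo, hi].
# Values outside the range, and NAs, are silently dropped.
# Assumes (hi - lo) is a multiple of `by`, so the breaks span [lo, hi].
hist_in_range <- function(x, lo, hi, by = 1, ...) {
  keep <- !is.na(x) & x >= lo & x <= hi
  hist(x[keep], breaks = seq(lo, hi, by = by), ...)
}

# Usage with the values from the question; 131 is ignored
actions <- c(3, 129, 131)
h <- hist_in_range(actions, 0, 130, plot = FALSE)
sum(h$counts)
```

Unlike the %in% version, the comparison-based filter also handles non-integer values inside the range.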
lmo