7

In SQL, you can avoid chaining multiple OR conditions when matching a column against several values by using IN. For example:

SELECT * FROM colors WHERE color IN ('Red', 'Blue', 'Green');

How would I do that in R? I am currently having to do it like this:

shortlisted_colors <- subset(colors, color == 'Red' | color == 'Blue' | color == 'Green')

What is a better way?

dreftymac
user3422637

2 Answers

11
shortlisted_colors <- subset(colors, color %in% c('Red', 'Blue', 'Green'))
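To see the filter in action, here is a minimal sketch. The `colors` data frame below is made-up sample data (the original question does not show its contents), and `wanted` and `shortlisted_again` are names chosen only for illustration.

```r
# Hypothetical sample data standing in for the asker's `colors` data frame
colors <- data.frame(
  id = 1:5,
  color = c("Red", "Yellow", "Blue", "Green", "Purple"),
  stringsAsFactors = FALSE
)

# The values to keep, analogous to the list inside SQL's IN (...)
wanted <- c("Red", "Blue", "Green")

# subset() form, as in the answer above
shortlisted_colors <- subset(colors, color %in% wanted)

# The same filter written with logical indexing and `[`
shortlisted_again <- colors[colors$color %in% wanted, ]

print(shortlisted_colors)
```

Both forms keep only the rows whose `color` appears in `wanted`; which one to use is mostly a matter of taste, although `[` is often preferred inside functions.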
nrussell
  • When I run summary(shortlisted_colors), it still lists the other colors, but shows their counts as 0. – user3422637 Jul 17 '14 at 20:11
  • How do I remove the rows that have the other colors? – user3422637 Jul 17 '14 at 20:14
  • It sounds like `color` is a `factor` variable and not a `character` variable. If `colors` is your `data.frame`, you can do `colors$color <- as.character(colors$color)` and this should clear up the issue. – nrussell Jul 17 '14 at 20:18
  • Thanks. It worked. What is as.character doing here? Any good documentation on this will help too. – user3422637 Jul 17 '14 at 20:27
  • Sure, no problem. It just changed the class of your column `color` from `factor` to `character`. Type `?as.character` and `?as.factor` in your console and read the help files; these are two different object classes in R. Although they might look the same on the surface, they have different properties. – nrussell Jul 17 '14 at 20:32
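The zero counts discussed in the comments can be reproduced with a small sketch. The data below is hypothetical, and it assumes `color` is stored as a factor, as the comments suggest. Besides `as.character()`, base R's `droplevels()` is another option, which keeps the column as a factor but discards levels that no longer occur.

```r
# Hypothetical data with `color` stored as a factor
colors <- data.frame(
  color = factor(c("Red", "Yellow", "Blue", "Green", "Purple"))
)
shortlisted_colors <- subset(colors, color %in% c("Red", "Blue", "Green"))

# Subsetting removes rows but keeps all five factor levels,
# which is why summary() still lists the other colors with count 0
levels(shortlisted_colors$color)

# Option 1 (the approach from the comments): convert to character
as_char <- shortlisted_colors
as_char$color <- as.character(as_char$color)

# Option 2: stay with a factor, but drop the unused levels
dropped <- droplevels(shortlisted_colors)
levels(dropped$color)
```

Here the conversion is applied after subsetting rather than before, as in the comment; either order removes the empty categories from the summary.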
2

Searching for "in" is difficult, but the answer is "%in%". Part of the difficulty is that `in` is a reserved word in R, used in the iterator specification of for-loops:

subset(colors, color %in% c('Red', 'Blue', 'Green'))

See:

?match
?'%in%'   # since you need to quote names with special symbols in them
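Both help topics lead to the same page because `%in%` is defined in terms of `match()`. A short sketch of the relationship, using made-up vectors `x` and `lookup`:

```r
# Illustrative vectors; the names are arbitrary
x <- c("Red", "Yellow", "Blue")
lookup <- c("Red", "Blue", "Green")

# match() returns the position of each element of x in lookup (NA if absent)
positions <- match(x, lookup)

# %in% returns TRUE/FALSE for membership
is_member <- x %in% lookup

# %in% is equivalent to this match() call
via_match <- match(x, lookup, nomatch = 0L) > 0L
```

In this example `positions` is 1, NA, 2 and both `is_member` and `via_match` are TRUE, FALSE, TRUE.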

That help page also illustrates how "%"-signs enclose user-defined infix function names. Understanding this gives you a leg up on how @hadley has taken the approach to a much higher level in his dplyr package. If you have a solid background in SQL, looping back to see what dplyr offers should be very satisfying. As I understand it, dplyr functions act as a front-end to SQL operations in many instances.
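As a minimal sketch of a user-defined infix function, the following defines a negated membership test. The name `%notin%` is hypothetical and chosen only for illustration; it is not part of base R.

```r
# Any function whose name is wrapped in % signs can be called as an infix operator.
# `%notin%` is a made-up name for this example, not a base R operator.
`%notin%` <- function(x, table) !(x %in% table)

# TRUE where the left-hand value is absent from the right-hand vector
result <- c("Red", "Yellow") %notin% c("Red", "Blue", "Green")
```

Here `result` is FALSE, TRUE. The backticks are needed only when defining or referring to the function by name; when calling it as an operator they are omitted.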

IRTFM