
Here is an example that was taken from a fellow SO member.

# load dplyr for filter()
library(dplyr)
# data
f <- c("a","a","a","b","b","c")
s <- c("fall","spring","other", "fall", "other", "other")
v <- c(3,5,1,4,5,2)
(dat0 <- data.frame(f, s, v, stringsAsFactors = TRUE))  # strings are no longer converted to factors by default since R 4.0
#  f      s v
#1 a   fall 3
#2 a spring 5
#3 a  other 1
#4 b   fall 4
#5 b  other 5
#6 c  other 2
(sp.tmp <- filter(dat0, s == "spring"))
#  f      s v
#1 a spring 5
str(sp.tmp)
#'data.frame':  1 obs. of  3 variables:
# $ f: Factor w/ 3 levels "a","b","c": 1
# $ s: Factor w/ 3 levels "fall","other",..: 3
# $ v: num 5

The data frame returned by filter() has retained all the factor levels from the original data frame, even though only one row is left.

What would be the recommended way to drop the unused levels (e.g. "fall" and "other" in s) within the dplyr framework?

ils
  • I have been using spreadsheets quite a lot for data pre-processing, but since I discovered `dplyr` that seems to have changed ;-) However, when one applies filters in a spreadsheet, the "hidden" range seems to be nonexistent for copy/paste operations. That's why I was surprised to find the filtered-out content partially transferred to the new df after applying `filter()`. Therefore I asked how to get the same effect *within* the `dplyr` framework, expecting that there might be an argument for that. – ils Nov 09 '14 at 10:01
  • Would it be OK to entirely delete this question now? – ils Nov 09 '14 at 10:04
  • If it will declutter the environment I'll do so gladly. Hope that both helpers won't mind the downvote... – ils Nov 09 '14 at 10:10
  • I think they will... – David Arenburg Nov 09 '14 at 10:11
  • It seems that I can't downvote until the answers are edited :-/ – ils Nov 09 '14 at 10:13
  • Just leave it as is. The answers show some additional implementation on `dplyr` – David Arenburg Nov 09 '14 at 10:14
  • My understanding is that duplicate questions should be _closed_, not necessarily deleted because they might help others find the original question and answers in the future. – talat Nov 09 '14 at 10:33

2 Answers


You could do something like:

dat1 <- dat0 %>%
  filter(s == "spring") %>% 
  droplevels()

Then

str(dat1)
#'data.frame':  1 obs. of  3 variables:
# $ f: Factor w/ 1 level "a": 1
# $ s: Factor w/ 1 level "spring": 1
# $ v: num 5
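
If only one column should lose its unused levels, a minimal sketch is to call droplevels() on that factor inside mutate(). This assumes dplyr is installed; the name dat2 is hypothetical and only used for illustration, and the data are rebuilt with stringsAsFactors = TRUE so the block runs on its own.

```r
library(dplyr)

# Same data as in the question, with strings stored as factors
dat0 <- data.frame(
  f = c("a", "a", "a", "b", "b", "c"),
  s = c("fall", "spring", "other", "fall", "other", "other"),
  v = c(3, 5, 1, 4, 5, 2),
  stringsAsFactors = TRUE
)

# dat2 is a hypothetical name: drop unused levels of s only, leave f untouched
dat2 <- dat0 %>%
  filter(s == "spring") %>%
  mutate(s = droplevels(s))

levels(dat2$s)  # "spring"
levels(dat2$f)  # "a" "b" "c"
```

Calling droplevels() on the whole data frame, as above, is shorter when every factor column should be cleaned up; the per-column form is useful when some factors need to keep their full set of levels.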
talat

You could use `droplevels()` on the filtered data frame:

sp.tmp <- droplevels(sp.tmp)
str(sp.tmp)
#'data.frame':  1 obs. of  3 variables:
# $ f: Factor w/ 1 level "a": 1
# $ s: Factor w/ 1 level "spring": 1
# $ v: num 5
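
As a sketch of a related option in base R, the data frame method of droplevels() takes an `except` argument with the indices of columns whose levels should be left alone. The example below rebuilds the data without dplyr so it is self-contained; the name sp.keep is hypothetical.

```r
# Same data as in the question, with strings stored as factors
dat0 <- data.frame(
  f = c("a", "a", "a", "b", "b", "c"),
  s = c("fall", "spring", "other", "fall", "other", "other"),
  v = c(3, 5, 1, 4, 5, 2),
  stringsAsFactors = TRUE
)

# Base R subsetting also keeps every level of f and s
sp.tmp <- dat0[dat0$s == "spring", ]

# sp.keep is a hypothetical name: drop unused levels everywhere
# except in column 2 (s), which keeps all three season levels
sp.keep <- droplevels(sp.tmp, except = 2)

levels(sp.keep$f)  # "a"
levels(sp.keep$s)  # "fall" "other" "spring"
```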
akrun