0

I have a data frame like this:

id    info
1     0
1     0
2     0
2     10
3     20
3     20

I want to remove the rows for all "id"s that have no change in their "info", that is, remove all rows where the "info" is identical for a certain "id".

For the example above, I would end up with:

id    info
2     0
2     10
Sotos
Zac R.

3 Answers

3

A base R solution,

df[!with(df, ave(info, id, FUN = function(i)var(i) == 0)),]
#slightly different syntax (as per @lmo)
#df[ave(df$info, df$id, FUN=var) > 0,]

which gives,

  id info
3  2    0
4  2   10
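One caveat with the variance-based filter: `var()` returns `NA` for a group that has only one row, so an `id` that occurs once would show up as an `NA` row in the subset. Below is a minimal sketch that sidesteps this by counting distinct values instead. It assumes the column names `id` and `info` from the question; the names `keep` and `res` are only illustrative.

```r
df <- data.frame(id = c(1, 1, 2, 2, 3, 3), info = c(0, 0, 0, 10, 20, 20))

# For each id, flag whether info takes more than one distinct value.
# ave() writes the result back into a numeric vector, so the flags
# come out as 1 (keep) and 0 (drop), one entry per row of df.
keep <- ave(df$info, df$id, FUN = function(i) length(unique(i)) > 1)

# Keep only the rows whose id shows some variation in info.
res <- df[keep == 1, ]
res
```

With this version a single-row `id` is simply dropped, since it has only one distinct value.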
Sotos
  • 1
    You beat me. I had `dat[ave(dat$info, dat$id, FUN=var) > 0,]`. – lmo Mar 21 '18 at 14:10
  • 1
    @lmo I like that syntax better. I'm adding it :) – Sotos Mar 21 '18 at 14:12
  • For `df <- data.frame(id = c(1,1,2,2,2,3,3), info=c(0,0,0,10,10,20,20))` there are two rows `2 10` in the result; imho this is not the desired result. – jogo Mar 25 '18 at 11:47
2

Here is a solution with data.table:

library("data.table")
DT <- fread(
"id    info
1     0
1     0
2     0
2     10
3     20
3     20")
DT[, .N, .(id, info)][N==1, .(id, info)]
# > DT[, .N, .(id, info)][N==1, .(id, info)]
#    id info
# 1:  2    0
# 2:  2   10

a variant:

DT[, if (.N==1) TRUE, .(id, info)][, .(id, info)]

Here is a solution using an anti-join:

DT[!DT[duplicated(DT)], on=names(DT)]
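To make the anti-join easier to follow, here is the same idea split into two steps, as a sketch on the question's data. The intermediate name `dups` is only illustrative.

```r
library(data.table)

DT <- data.table(id = c(1, 1, 2, 2, 3, 3), info = c(0, 0, 0, 10, 20, 20))

# Step 1: the second and later copies of each repeated (id, info) pair.
dups <- DT[duplicated(DT)]

# Step 2: anti-join, i.e. drop every row of DT that matches a row in dups
# on all columns. This removes the first copy of a repeated pair as well.
res <- DT[!dups, on = names(DT)]
res
```

Like the `.N` variants, this drops every `(id, info)` pair that occurs more than once, which matches the desired output for the example in the question.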
jogo
-1

Another data.table solution, using the special symbol `.SD`:

library(data.table)
df <- data.table(id = c(1,1,2,2,3,3), info=c(0,0,0,10,20,20))

df[,.SD[uniqueN(.SD)>1],id]

    id info
1:  2    0
2:  2   10
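For comparison, here is a sketch of a group-level filter written with `if (...) .SD`, run on the extended data from jogo's comment under the first answer (where `id` 2 has the values 0, 10, 10). The names `DT2` and `res` are only illustrative.

```r
library(data.table)

# Extended example from the comment: id 2 has info 0, 10, 10.
DT2 <- data.table(id   = c(1, 1, 2, 2, 2, 3, 3),
                  info = c(0, 0, 0, 10, 10, 20, 20))

# Return the whole group only when info takes more than one distinct value;
# groups where the condition is FALSE return NULL and are dropped.
res <- DT2[, if (uniqueN(info) > 1L) .SD, by = id]
res
```

This keeps every row of an `id` whose `info` varies, repeated rows included. Whether the second `2 10` row should stay depends on how the question is read.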
YOLO