Remove rows where all values of a column are identical, based on another column

Question

I have a data frame like this:

id    info
1     0
1     0
2     0
2     10
3     20
3     20

I want to remove the rows for all "id"s that have no change in their "info", that is, remove all rows where the "info" is identical for a certain "id".

For the example above, I would end up with:

id    info
2     0
2     10

base R: `subset(df, ave(info, id, FUN = function(x) length(unique(x))) > 1)` — talat, Mar 21 '18 at 14:08
the `dplyr` solution would be `df %>% group_by(id) %>% filter(length(unique(info))>1)` — gfgm, Mar 21 '18 at 14:08

Sotos · Accepted Answer · 2018-03-21T14:13:02.470

3

A base R solution,

df[!with(df, ave(info, id, FUN = function(i)var(i) == 0)),]
#slightly different syntax (as per @lmo)
#df[ave(df$info, df$id, FUN=var) > 0,]

which gives,

  id info
3  2    0
4  2   10
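
One caveat with the variance-based filter: `var()` returns `NA` for a group with a single row, so such an id would show up as a row of `NA`s instead of being dropped. A minimal sketch that counts distinct values instead (the idea from talat's comment on the question), using the example data from the answers plus a hypothetical single-row id 4 that is not in the original:

```r
# Example data from the answers, plus a hypothetical id 4 with only one row
df <- data.frame(id   = c(1, 1, 2, 2, 3, 3, 4),
                 info = c(0, 0, 0, 10, 20, 20, 5))

# Number of distinct info values within each id, repeated for every row
n_distinct_info <- ave(df$info, df$id, FUN = function(x) length(unique(x)))

# Keep only rows whose id has more than one distinct info value
result <- df[n_distinct_info > 1, ]
print(result)
```

This should keep only the two rows of id 2; the single-row id 4 is dropped like any other id without a change.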

edited Mar 21 '18 at 14:13

answered Mar 21 '18 at 14:08

Sotos


You beat me. I had `dat[ave(dat$info, dat$id, FUN=var) > 0,]`. – lmo Mar 21 '18 at 14:10
@lmo I like that syntax better. I'm adding it :) – Sotos Mar 21 '18 at 14:12
For `df <- data.frame(id = c(1,1,2,2,2,3,3), info=c(0,0,0,10,10,20,20))` there are two rows `2 10` in the result; imho this is not the desired result. – jogo Mar 25 '18 at 11:47

jogo · Answer 2 · 2018-04-06T14:17:25.127

2

Here is a solution with data.table:

library("data.table")
DT <- fread(
"id    info
1     0
1     0
2     0
2     10
3     20
3     20")
DT[, .N, .(id, info)][N==1, .(id, info)]
# > DT[, .N, .(id, info)][N==1, .(id, info)]
#    id info
# 1:  2    0
# 2:  2   10

a variant:

DT[, if (.N==1) TRUE, .(id, info)][, .(id, info)]

Here is a solution using an anti-join:

DT[!DT[duplicated(DT)], on=names(DT)]
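
Note that the variants above keep only the (id, info) rows that occur exactly once, which is not the same as filtering per id once an id that changes also contains repeated rows. A rough base R sketch of both readings, run on the extended data from the comments (the variable names are only for illustration):

```r
# Extended example from the comments: id 2 changes, but the row (2, 10) is repeated
df <- data.frame(id   = c(1, 1, 2, 2, 2, 3, 3),
                 info = c(0, 0, 0, 10, 10, 20, 20))

# Flag every row that has an exact duplicate (first and later occurrences alike)
has_duplicate <- duplicated(df) | duplicated(df, fromLast = TRUE)

# Reading 1: rows that occur exactly once
unique_rows <- df[!has_duplicate, ]

# Reading 2: all rows of any id with more than one distinct info value
n_distinct_info <- ave(df$info, df$id, FUN = function(x) length(unique(x)))
changing_ids <- df[n_distinct_info > 1, ]

print(unique_rows)   # only the row (2, 0)
print(changing_ids)  # (2, 0), (2, 10), (2, 10)
```

Which of the two is "the desired result" depends on how the question is read; on the original six-row example both give the same two rows.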

edited Apr 06 '18 at 14:17

answered Mar 21 '18 at 14:07

jogo


YOLO · Answer 3 · 2018-03-21T14:22:43.620

-1

Another data.table solution, using the special symbol `.SD` (the subset of data for each group).

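library(data.table)

df <- data.table(id = c(1,1,2,2,3,3), info=c(0,0,0,10,20,20))

# within each id, keep the rows only if info has more than one distinct value
df[, .SD[uniqueN(.SD) > 1], by = id]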

    id info
1:  2    0
2:  2   10

edited Mar 21 '18 at 14:22

answered Mar 21 '18 at 14:08

YOLO


Sorry and thanks for bringing it up :) – YOLO Mar 21 '18 at 14:23
For `df <- data.table(id = c(1,1,2,2,2,3,3), info=c(0,0,0,10,10,20,20))` it does not produce the desired result. – jogo Mar 21 '18 at 14:30
Did you try running this one? It does give the output as expected. – YOLO Mar 21 '18 at 14:36
Yes I tried it. I got the row `2 10` twice. – jogo Mar 21 '18 at 14:40
