
I can't find a good title for this question, so please feel free to edit it.

I have this data.frame:

  section time to from
1       a    9  1    2
2       a    9  2    1
3       a   12  2    3
4       a   12  2    4
5       a   12  3    2
6       a   12  3    4
7       a   12  4    2
8       a   12  4    3

I want to remove duplicated rows that have the same to and from pair regardless of order, e.g. (1,2) and (2,1) count as duplicates, without explicitly computing permutations of the 2 columns.

So the final output would be:

  section time to from
1       a    9  1    2
3       a   12  2    3
4       a   12  2    4
6       a   12  3    4

I have a solution that constructs a new key column, e.g.

  key <- paste(pmin(to, from), pmax(to, from))

and then removes rows with a duplicated key using duplicated(), but I think this is a dirty solution.
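
For reference, a minimal sketch of this key-based approach (assuming the dput output below has been assigned to mydf, the name used in the second answer):

  key <- with(mydf, paste(pmin(to, from), pmax(to, from)))  # order-independent key per row
  mydf[!duplicated(key), ]
  #   section time to from
  # 1       a    9  1    2
  # 3       a   12  2    3
  # 4       a   12  2    4
  # 6       a   12  3    4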

Here is the dput of my data:

structure(list(section = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L), .Label = "a", class = "factor"), time = c(9L, 9L, 12L, 
12L, 12L, 12L, 12L, 12L), to = c(1L, 2L, 2L, 2L, 3L, 3L, 4L, 
4L), from = c(2L, 1L, 3L, 4L, 2L, 4L, 2L, 3L)), .Names = c("section", 
"time", "to", "from"), row.names = c(NA, -8L), class = "data.frame")

2 Answers

mn <- pmin(s$to, s$from)                # row-wise smaller of to/from
mx <- pmax(s$to, s$from)                # row-wise larger of to/from
int <- as.numeric(interaction(mn, mx))  # one id per unordered (to, from) pair
s[match(unique(int), int),]             # keep the first row for each id
  section time to from
1       a    9  1    2
3       a   12  2    3
4       a   12  2    4
6       a   12  3    4
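
To see why this works, look at the per-row (min, max) pairs on the example data:

cbind(mn, mx)
     mn mx
[1,]  1  2
[2,]  1  2
[3,]  2  3
[4,]  2  4
[5,]  2  3
[6,]  3  4
[7,]  2  4
[8,]  3  4

Rows 1/2, 3/5, 4/7 and 6/8 collapse to the same pair, so interaction(mn, mx) assigns them the same id and match(unique(int), int) keeps only the first row of each pair.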

Credit for the idea goes to this question: Remove consecutive duplicates from dataframe and specifically @MatthewPlourde's answer.

Matthew Lundberg

You can try using sort within the apply function to order the combinations.

mydf[!duplicated(t(apply(mydf[3:4], 1, sort))), ]
#   section time to from
# 1       a    9  1    2
# 3       a   12  2    3
# 4       a   12  2    4
# 6       a   12  3    4
A5C1D2H2I1M1N2O1R2T1
  • Thanks! That's the kind of solution I am looking for! Can you explain why you transpose, please? – agstudy Dec 29 '12 at 04:14
  • Mine is 2.5x faster on the (small) example (and not using variables for mn, mx). – Matthew Lundberg Dec 29 '12 at 04:19
  • @agstudy, try `t(apply(mydf[3:4], 1, sort))` and compare it to `apply(mydf[3:4], 1, sort)` to see why I transposed the output of `apply`. – A5C1D2H2I1M1N2O1R2T1 Dec 29 '12 at 04:22
  • @MatthewLundberg Thanks! You're right! I tested it and I think the sort is the time-consuming part here! – agstudy Dec 29 '12 at 04:22
  • @agstudy replace `sort` with `I`. Interesting performance results (but wrong output, of course). – Matthew Lundberg Dec 29 '12 at 04:37
  • @MatthewLundberg, you can get some performance gains with my approach using `range` instead of `sort`. For example `x <- apply(mydf[3:4], 1, range); mydf[!duplicated(x, MARGIN = 2), ]` comes *close*. Not sure about how it scales though ;) – A5C1D2H2I1M1N2O1R2T1 Dec 29 '12 at 04:58
  • @AnandaMahto Can you edit your answer with this, please? I have only 2 columns to compare, so I will give it a try. I like your solution, it's elegant; I hope it will be faster. – agstudy Dec 29 '12 at 05:04
  • @agstudy, it won't be faster than Matthew's approach. Chances are that Matthew's will scale *much* better because of the underlying factors that get created with `interaction`. – A5C1D2H2I1M1N2O1R2T1 Dec 29 '12 at 05:20
  • @AnandaMahto Thanks a lot for your interest! My paste approach is slower than Matthew's (but faster than sort), so I think you are right about the underlying factors! I'm still an R newbie and have many things to learn! – agstudy Dec 29 '12 at 05:25
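
Following up on the timing discussion in these comments, here is a rough sketch of how the three variants could be compared. The microbenchmark package is an assumption (it is not mentioned in the thread), and on a data frame this small the results say little about how the approaches scale:

library(microbenchmark)  # assumed to be installed; not part of the original thread

microbenchmark(
  interaction = {
    # Matthew Lundberg's approach: one factor level per unordered pair
    int <- as.numeric(interaction(pmin(mydf$to, mydf$from),
                                  pmax(mydf$to, mydf$from)))
    mydf[match(unique(int), int), ]
  },
  sort  = mydf[!duplicated(t(apply(mydf[3:4], 1, sort))), ],           # apply + sort
  range = mydf[!duplicated(apply(mydf[3:4], 1, range), MARGIN = 2), ]  # apply + range, from the comments
)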