Deduplication in R Studio

Question

This is my first R code. It is a very simple deduplication, but it runs unbelievably slowly. Is that normal, or is my code just bad? Here it is:

file1=c(read.delim("file.txt", header=TRUE))   

dedupes<-0
i<-1
n<-1
while (i<=100) {

  while (n<=100) {

    if (file1$email[i]==file1$email[n] && i!=n) {

      # Keep count of the duplicates
      dedupes=dedupes+1
      # Show the duplicate
      print(file1$email[i])
    }

    n<-n+1

  } 

  n<-1
  i<-i+1 

}

# Show the number of duplicates (each pair was counted twice)
cat("There are ", dedupes/2, " duplicates")

Many thanks in advance, Saitam

I think it's better to ask this kind of question at [code review](http://codereview.stackexchange.com/) – Kiril, Feb 24 '15 at 19:09
Wouldn't it be simpler just to do: `cat( sum( duplicated(file1$email) ) )`? – IRTFM, Feb 24 '15 at 19:31
Nice, thank you! I didn't know about the `duplicated()` function. Is it also possible to show the duplicated values themselves instead of a FALSE/TRUE value? – sunwarr10r, Feb 25 '15 at 12:59

score 0 · Answer 1 · answered Feb 24 '15 at 23:15


Nested loops are well known to be slow in R. You need to vectorize your computation or use existing optimized functions such as `duplicated()`, as in BondedDust's suggestion in the comments.
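As a minimal sketch of the vectorized approach, assuming a data frame with an `email` column as in the question (the sample addresses below are made up for illustration and stand in for `file.txt`):

```r
# Hypothetical sample data standing in for file.txt
file1 <- data.frame(
  email = c("a@example.com", "b@example.com", "a@example.com",
            "c@example.com", "b@example.com", "a@example.com"),
  stringsAsFactors = FALSE
)

# duplicated() flags every occurrence after the first as TRUE
is_dup <- duplicated(file1$email)

# Number of redundant rows
n_dupes <- sum(is_dup)

# The addresses that occur more than once, listed once each
dup_emails <- unique(file1$email[is_dup])

# The data with the repeated rows dropped
deduped <- file1[!is_dup, , drop = FALSE]

cat("There are", n_dupes, "duplicates\n")
print(dup_emails)
```

Indexing with `is_dup` is also what shows the duplicated values themselves rather than TRUE/FALSE. Note that this counts redundant rows, whereas the loop in the question counts matching pairs, so the two numbers differ whenever an address appears more than twice.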


cmbarbu


Thanks for answering. Is there also a way to deduplicate while ignoring upper and lower case? – sunwarr10r Feb 25 '15 at 17:25
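One possible way to do this, sketched here under the assumption that differences in letter case are the only thing to ignore, is to run `duplicated()` on a lower-cased copy of the column. The `emails` vector below is made-up sample data; in the question it would be `file1$email`.

```r
# Hypothetical sample vector; in the question this would be file1$email
emails <- c("Anna@Example.com", "anna@example.com", "bob@example.com",
            "BOB@example.com", "carl@example.com")

# Compare on a lower-cased copy so that case differences are ignored
is_dup <- duplicated(tolower(emails))

# Keep the first spelling of each address, in its original case
unique_emails <- emails[!is_dup]

# Number of case-insensitive duplicates
n_dupes <- sum(is_dup)

cat("There are", n_dupes, "duplicates\n")
print(unique_emails)
```

Because only the comparison uses `tolower()`, the values that are kept retain whatever case they had in the input.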
