
This is my first R code. It is a very simple deduplication, but it runs so slowly I can't believe it! My question is: is it normal for it to be this slow, or is my code just bad? Here it is:

file1 <- read.delim("file.txt", header = TRUE)

dedupes <- 0
i <- 1
n <- 1
while (i <= 100) {
  while (n <= 100) {
    if (file1$email[i] == file1$email[n] && i != n) {
      # Count the match (each matching pair is counted twice)
      dedupes <- dedupes + 1
      # Show the duplicated email
      print(file1$email[i])
    }
    n <- n + 1
  }
  n <- 1
  i <- i + 1
}

# Show the number of duplicates (halved because each pair was counted twice)
cat("There are ", dedupes/2, " duplicates")

Many thanks in advance, Saitam

sunwarr10r
  • 1
    I think it's better to ask questions like this at [code review](http://codereview.stackexchange.com/) – Kiril Feb 24 '15 at 19:09
  • 1
    Wouldn't it be simpler just to do: `cat( sum( duplicated(file1$email) ) )`? – IRTFM Feb 24 '15 at 19:31
  • Nice, thank you! I didn't know about the `duplicated()` function. Is there also a way to show the names of the duplicates instead of a TRUE/FALSE value? – sunwarr10r Feb 25 '15 at 12:59

1 Answer


Nested loops are well known to be slow in R. You need to vectorize your computation or use existing optimized functions, as BondedDust suggested.
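As a rough sketch of what the vectorized version could look like, assuming the data has an `email` column as in the question (the sample addresses below are made up and stand in for the contents of `file.txt`):

```r
# Hypothetical sample data standing in for read.delim("file.txt")
file1 <- data.frame(
  email = c("a@example.com", "b@example.com", "a@example.com",
            "c@example.com", "b@example.com", "a@example.com"),
  stringsAsFactors = FALSE
)

# duplicated() flags every occurrence of a value after its first one
is_dup <- duplicated(file1$email)

# Number of redundant rows (occurrences beyond the first)
n_dup <- sum(is_dup)

# The distinct values that occur more than once
dup_values <- unique(file1$email[is_dup])

# The data with the repeated rows dropped
deduped <- file1[!is_dup, , drop = FALSE]

cat("There are", n_dup, "duplicates\n")
print(dup_values)
```

Note that this counts repeated rows, whereas the loop in the question counts matching pairs, so the two numbers can differ when a value appears more than twice.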

cmbarbu
  • Thanks for answering. Is there also a way to deduplicate while ignoring upper and lower case? – sunwarr10r Feb 25 '15 at 17:25
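
One possible approach to the case question, sketched under the assumption that two addresses should count as the same when they differ only in letter case, is to compare a lower-cased copy while keeping the original spelling (the sample addresses are made up):

```r
# Hypothetical sample data; the addresses are made up for illustration
emails <- c("Alice@Example.com", "alice@example.com", "bob@example.com",
            "BOB@EXAMPLE.COM", "carol@example.com")

# Compare on a lower-cased key so that case differences are ignored
key <- tolower(emails)
is_dup <- duplicated(key)

# Keep the first spelling seen for each address
emails_unique <- emails[!is_dup]

# Number of rows that repeat an earlier address, ignoring case
n_dup <- sum(is_dup)
```

The same `is_dup` vector can be used to subset a data frame, for example `file1[!duplicated(tolower(file1$email)), ]`.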