11

Sadly, a project that I have been working on lately has a large amount of copy-and-paste code, even within single files. Are there any tools or techniques that can detect duplication or near-duplication within a single file? I have Beyond Compare 3 and it works well for comparing separate files, but I am at a loss when it comes to finding duplication inside a single file.

Thanks in advance.

Edit:

Thanks for all the great tools! I'll definitely check them out.

This project is an ASP.NET/C# project, but I work with a variety of languages including Java; I'm interested in what tools are best (for any language) to remove duplication.

Jon Onstott
  • The solutions are completely different for different languages. Tagging for ASP.NET/C#. – Tronic Feb 02 '10 at 19:32
  • Correct, but I would like to know the best tools for the most popular languages (because I work with several languages at a time). Thanks though. – Jon Onstott Feb 02 '10 at 20:31
  • The CloneDR solution is the *same* for many languages. CloneDR handles C#, Java, HTML and JavaScript, which I think covers the OP's "ASP.NET" situation pretty well. – Ira Baxter Feb 12 '10 at 01:03
  • Other keywords to look for: copy-paste detection, similarity recognition. You can give PMD a try – Martin Thoma Sep 08 '20 at 14:29

9 Answers

4

Check out Atomiq. It finds duplicated code that is a prime candidate for extraction into one location.

http://www.getatomiq.com/

Chris Missal
  • CopyPasteKiller has been rebranded as Atomiq and is now $30 (which seems reasonable). http://nimblepros.com/products/atomiq.aspx – Peter Bernier Jun 17 '10 at 18:16
2

If you're using Eclipse, you can use the copy paste detector (CPD) https://olex.openlogic.com/packages/cpd.

Jeff Storey
1

See SD CloneDR, a tool for detecting copy-paste-edit code within and across multiple files. It detects exact copies, copies that have been reformatted, and near-miss copies with different identifiers, literals, and even different sequences of statements.

CloneDR handles many languages, including Java (1.4, 1.5, 1.6) and C# (up to C# 4.0). You can see sample clone detection reports at the website, including one for C#.
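To make "reformatted" and "near-miss" copies concrete, here is a rough token-level sketch in Python. It is only an illustration of the idea and not how CloneDR works internally: the `fingerprint` function, the `KEYWORDS` set, and the regex tokenizer are all hypothetical and far cruder than a real parser. The sketch replaces identifiers, numbers, and string literals with placeholders so that two statements differing only in names, literal values, or whitespace compare equal.

```python
import re

# Hypothetical, deliberately tiny tokenizer: string literal, number, identifier, or any other symbol.
TOKEN = re.compile(r'"[^"\n]*"|\d+(?:\.\d+)?|[A-Za-z_]\w*|\S')

# Assumed keyword list for illustration only; a real tool would use the language's grammar.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "string", "var", "new"}


def fingerprint(code: str) -> tuple:
    """Return the token sequence with names and literals replaced by placeholders."""
    out = []
    for tok in TOKEN.findall(code):
        if tok.startswith('"'):
            out.append("STR")   # any string literal
        elif tok[0].isdigit():
            out.append("NUM")   # any numeric literal
        elif (tok[0].isalpha() or tok[0] == "_") and tok not in KEYWORDS:
            out.append("ID")    # any identifier that is not a keyword
        else:
            out.append(tok)     # keywords and punctuation are kept as-is
    return tuple(out)


if __name__ == "__main__":
    a = "int total = price * 2;"
    b = "int   sum=cost*10 ;"    # reformatted, different names and literal
    c = "int total = price + 2;"  # different operator
    print(fingerprint(a) == fingerprint(b))
    print(fingerprint(a) == fingerprint(c))
```

Comparing fingerprints instead of raw text is what lets renamed or reformatted copies match, while a changed operator still counts as a difference.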

Ira Baxter
1

You don't say what language you are using, which is going to affect what tools you can use.

For Python there is CloneDigger. It also supports Java, but I have not tried that. It can find code duplication both within a single file and between files, and gives you the result as a diff-like report in HTML.

Dave Kirby
0

ReSharper does this automagically: it suggests when it thinks code should be extracted into a method, and will do the extraction for you.

BlueRaja - Danny Pflughoeft
0

Check out PMD. Once you have configured it (which is fairly simple), you can run its copy-paste detector to find duplicate code.

Ravi Gupta
0

Anyone with some Office skills can do the following sequence in about a minute:

  • use an ordinary formatter to unify the code style, preferably without line wrapping
  • feed the code text into Microsoft Excel as a single column
  • search and replace all double spaces with single ones, and do any other replacements you need
  • sort the column

At this point duplicated lines end up next to each other and are already easy to spot. To go further:

  • add a comparison formula to the 2nd column and a counter to the 3rd
  • copy and paste the results as values, sort again, and look at the most repeated lines
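If you'd rather script it than use a spreadsheet, the same steps can be sketched in a few lines of Python. This is a minimal sketch, not a replacement for a real clone detector: the `repeated_lines` function is a hypothetical helper, and skipping lone braces is my own assumption about what counts as noise in C#/Java-style code.

```python
import re
from collections import Counter


def repeated_lines(source: str, min_count: int = 2) -> list:
    """Return (normalized line, count) pairs for lines seen at least min_count times."""
    normalized = []
    for line in source.splitlines():
        # Collapse runs of whitespace, like the search-and-replace step.
        line = re.sub(r"\s+", " ", line).strip()
        # Assumption: blank lines and lone braces repeat everywhere and carry no signal.
        if line and line not in {"{", "}"}:
            normalized.append(line)
    counts = Counter(normalized)
    # Most repeated first, then alphabetically, like sorting on the counter column.
    return sorted(
        ((text, n) for text, n in counts.items() if n >= min_count),
        key=lambda item: (-item[1], item[0]),
    )


if __name__ == "__main__":
    sample = """
    int total = a + b;
    Log(total);
    int  total = a + b;
    Save(total);
    Log(total);
    int total = a +   b;
    """
    for text, n in repeated_lines(sample):
        print(f"{n:3d}  {text}")
```

Like the Excel approach, this only finds repeated individual lines; it says nothing about whether those lines form a larger copied block.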
0

There is an analysis tool, called Simian, which I haven't yet tried. Supposedly it can be run on any kind of text and point out duplicated items. It can be used via a command line interface.
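To give a feel for what text-based duplicate detection can look like, here is a small Python sketch that slides a fixed-size window over the lines of a file and reports blocks that occur more than once. I don't know how Simian is implemented, so treat this purely as an illustration of the general idea; `duplicate_blocks` and its `window` parameter are hypothetical names.

```python
from collections import defaultdict


def duplicate_blocks(source: str, window: int = 3) -> dict:
    """Map each run of `window` consecutive non-blank lines that occurs
    more than once to the 1-based line numbers where it starts."""
    # Keep (line number, stripped text) for non-blank lines only.
    lines = [
        (number, text.strip())
        for number, text in enumerate(source.splitlines(), start=1)
        if text.strip()
    ]
    seen = defaultdict(list)
    for start in range(len(lines) - window + 1):
        chunk = tuple(text for _, text in lines[start:start + window])
        seen[chunk].append(lines[start][0])
    # Only blocks that appear at least twice are duplicates.
    return {chunk: starts for chunk, starts in seen.items() if len(starts) > 1}


if __name__ == "__main__":
    sample = "a = 1\nb = 2\nc = a + b\nprint(c)\na = 1\nb = 2\nc = a + b\n"
    for chunk, starts in duplicate_blocks(sample).items():
        print(starts, " | ".join(chunk))
```

Because it works on plain lines, this works within a single file and for any language, but it misses copies where identifiers or literals were changed.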

Grant Palin
0

Another option similar to those above, but with a different tool chain: https://www.npmjs.com/package/jscpd

bsb