Questions tagged [cpd]

CPD: Copy and Paste Detector: a tool for finding where source code has been duplicated/cloned.

There are 3 fundamental types of these tools:

  • Those that match text strings or lines exactly; they have essentially zero knowledge of the actual language being processed. These find only exact clones; changes in formatting or added comments prevent detection of larger matches. They can be fast and scalable, but they don't produce good answers if the cloned code has been edited, which is the common case. Summary: cheap, easy, weak detection ability.
  • Token-based detectors. These detectors know roughly how to break source code into its constituent atoms ("tokens") such as identifiers, numbers, keywords, operators, comments and whitespace. Knowing which tokens are whitespace and comments allows the detector to skip them and so match code that has been reformatted. Ignoring the content of identifiers and numbers allows such detectors to match code where names have been changed or different values have been used. But these detectors don't understand language structure, and tend to report sequences such as "} {" as clones even though such clones are uninteresting. As a consequence, token-based detectors have to match rather long sequences of tokens to avoid producing a lot of false positive matches. Summary: better, but requires very long matches to avoid a flood of false positives.
  • Structure-based detectors. These know the tokens and the language structure. Like token detectors, reformatting doesn't prevent matches. Unlike token detectors, these tools only identify clones that match language structures, such as expressions, statements, or blocks; they can never propose "} {". So they can find smaller clones reliably. They can also allow gaps between the identical parts that match structures, so they can recognize two identical statements separated by a third, differing statement as a clone with the third statement as a parameter. This allows detection of sophisticated clones. Summary: slower, but more accurate, and more interesting clones detected.

[Thanks to Semantic Designs for this background knowledge].

See http://en.wikipedia.org/wiki/Duplicate_code for more details.
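
To make the difference between the first two types concrete, here is a minimal sketch in Python. It is not how PMD's CPD or any other real tool is implemented; the token pattern, the keyword list, and the names normalized_tokens, original and edited are all assumptions made up for this illustration. It shows a reformatted and renamed copy that an exact text comparison misses but a token comparison still catches.

```python
import re

# Hypothetical, heavily simplified token pattern for a C-like language.
# Real detectors use a proper lexer for each supported language.
TOKEN_RE = re.compile(r"""
    (?P<comment>//[^\n]*|/\*.*?\*/)
  | (?P<ws>\s+)
  | (?P<number>\d+(?:\.\d+)?)
  | (?P<ident>[A-Za-z_]\w*)
  | (?P<string>"(?:\\.|[^"\\])*")
  | (?P<op>.)
""", re.VERBOSE | re.DOTALL)

# Illustrative keyword subset; keywords are kept as-is, not normalized.
KEYWORDS = {"int", "void", "return", "if", "else", "for", "while"}

def normalized_tokens(source):
    """Tokenize source, dropping whitespace and comments and replacing
    identifier names and numeric values with placeholders."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind in ("comment", "ws"):
            continue                      # formatting does not matter
        if kind == "number":
            tokens.append("NUM")          # ignore the actual value
        elif kind == "ident" and text not in KEYWORDS:
            tokens.append("ID")           # ignore the actual name
        else:
            tokens.append(text)
    return tokens

original = """
int total = 0;
for (int i = 0; i < 10; i++) { total += i; }
"""

# Same code after renaming, changing a constant, reformatting, commenting.
edited = """
int sum = 0;  // renamed
for (int k = 0; k < 20; k++) {
    sum += k;
}
"""

exact_match = original.splitlines() == edited.splitlines()
token_match = normalized_tokens(original) == normalized_tokens(edited)
print("exact text match:", exact_match)
print("token match:", token_match)
```

The same function turns "} {" into the two tokens } and {, which appear next to each other all over most codebases. That is why a token-based detector needs a fairly large minimum match length before its reports become useful.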

31 questions
1
vote
1 answer

How can I ignore Annotations when using Maven CPD?

I know there is an option to ignoreAnnotations in the CPD CLI reference guide but I can't seem to get this to work using maven pmd:cpd plugin. When I view the mvn pmd page it doesn't list 'ignoreAnnotations' as a usable parameter but seems like it…
1
vote
0 answers

Setup sonarQube with PMD CPD for scala project

Trying to set up a Scala project with SonarQube 5.6.6 for code duplication via PMD CPD through SBT. I did manage to run CPD via SBT to generate cpd.xml, but I'm just wondering how to propagate that info to Sonar. Does anybody have any insight into how it works? When…
jaksky
  • 3,305
  • 4
  • 35
  • 68
1
vote
0 answers

How to use phpcpd command to check one file with all others

I have below file structure ROOT_FOLDER | +-- FRAMEWORK_FOLDER | | | +-- FRAMEWORK_FILE1.php | +-- FRAMEWORK_FILE2.php | +-- FRAMEWORK_FILE3.php | \-- FRAMEWORK_FILE4.php | +-- MY_FOLDER | | | +-- MY_FILE.php Now I…
Ankur Bhadania
  • 4,123
  • 1
  • 23
  • 38
1
vote
0 answers

Usage of CPD(Copy Paste Detector)

test.c void fun(){ printf("int main char"); } int main() { printf("int main int"); } -> I'm giving the command like this: run.sh cpd --minimum-tokens 5 --files /opt/test.c --language c and the output is as follows: Found a 2 line (5 tokens)…
Anvesh
  • 11
  • 3
1
vote
1 answer

How to change PMD's Copy Paste Detector (CPD) report output

I'd like to modify CPD to only spit out the Found a X line (Y tokens) duplication in the following files: ... when generating a report, i.e. suppress the source lines of code. I have the /src/ files and attempted to modify SimpleRenderer.java in…
mjswartz
  • 715
  • 1
  • 6
  • 19
1
vote
1 answer

How to avoid Whitespaces and comments in code duplication CPD tool

We are using the CPD tool for code duplication detection. The CPD tool includes whitespace and comments. Could you please let us know how we can avoid whitespace and comments so that only correct cases of duplication are reported? Suppose we have 4 lines of…
1
vote
2 answers

CPDTask has a NoClassDefFoundError for FilenameUtils

After updating from PMD 5.0.3 to 5.0.5, I am getting a NoClassDefFoundError when trying to run CPD via ant. I see that CPD changed to use FilenameUtils, but that should not be a problem as I have commons-io.jar in the path for the task. Here is the…
1
vote
1 answer

CPD errors in a DAO class

I have a DAO class which has multiple methods. In each method I used the variable name "result" for the ResultSet and "statement" for the PreparedStatement, and a closeResources() method to close the PreparedStatement and the Connection. I have used a DataManager…
saiki4116
  • 323
  • 1
  • 4
  • 14
0
votes
0 answers

JavaScript/TypeScript Code Duplication (JSCPD) Ignore Imports

Some background: I am developing two main projects. The first project is an API written in TypeScript using NodeJS. The second project is also written in TypeScript, using Ionic Framework to develop a hybrid mobile application. I recently added…
Viraj Shah
  • 754
  • 5
  • 19
0
votes
1 answer

gradle cpd plugin cpdCheck warning in multi module gradle project how to avoid it

I have a multi-module Gradle library project with no application module. I replicated this scenario by creating a gradle init library project. The idea is to have a convention plugin handle cpd, pmd, spotBugs and other checks, and apply it to each module as…
user1986244
  • 259
  • 2
  • 12
0
votes
1 answer

PMD Failure: ILogin:73 Rule:ConstantsInInterface Priority:3 Avoid constants in interfaces

Can someone tell me how to exclude some interfaces from PMD analysis using maven. I am getting the below exception while making the maven build. PMD Failure: ILogin$RetrieveLoginInfo_:4 Rule:ConstantsInInterface Priority:3 Avoid constants in…
Sunil
  • 101
  • 1
  • 10
0
votes
1 answer

How can I exclude blocks in XML files using CPD?

In a large project multiple android resources are used. It now happens that there are resources copied. I want to detect these copies using CPD. Currently I'm using the following command: ./run.sh cpd --language xml --minimum-tokens 20 --files…
0
votes
1 answer

How to generate xml report using CPD (Copy Paste Detector)?

I am using the CPD tool to find the Duplicate codes in my project. I have tried the command line options as given in this link CPD Usage. I want to generate the report in xml format and need to store it in a particular location. But it is showing…
0
votes
2 answers

SonarQube CPD detection on custom plugin

I have developed a custom plugin for SonarQube (C#, PowerBuilder, etc.). The native CPD Sensor from SonarQube doesn't perform the "Cut and Paste Detection". Is there a special configuration for this? Thanks
D Cruette
  • 31
  • 5
0
votes
1 answer

c++: Undefined reference to ERROR

I want to add a new C++ library, cpd (https://github.com/gadomski/cpd), to a project in ROS. I have already successfully installed the cpd library in my Ubuntu OS. Now I want to use it under the ROS environment. In the CMakeLists.txt file, I already…
ZYJ
  • 57
  • 6