COBOL clone detection with ConQAT?

Question

ConQAT's doc claims it can do clone detection on COBOL code, but I can't find any appropriate block in the list of Included blocks.

The only one that could be considered is StatementCloneAnalysis but it would get confused by the line numbers that precede each line:

016300******************************************************************0058

score 2 · Accepted Answer · answered Feb 24 '12 at 17:20

Interesting tool. I took a quick look and it seems to me that a simple fix might be to pre-process COBOL source to overwrite columns 1 through 6 with spaces and trim everything after column 72.

After poking around for a while I came across the NextToken scanner definition file for COBOL. It looks like it will "happily" pick up tokens from the sequence number area as well as after column 72. The tokenizer looks like it only deals with COBOL source code after it has gone through the library processing phase of a compile (i.e. after compiler directives such as COPY/REPLACE have been processed). COPY/REPLACE were specified as keywords but I really don't see how this tokenizer would deal with them properly - particularly where pseudo text is involved.

If working with an IBM COBOL compiler, you can specifying the MDECK option on a compile to generate a suitable source file for analysis. I am not familiar with other vendors so cannot comment further on how to generate a post text-manipulation source deck.

The level of clone detection conquat provides for COBOL appears to be very limited relative to other languages (e.g. java). I suspect you will have to put in a lot of hours to get anything more than trivial clone detection out of it for COBOL programs. However this could be a very useful project given the heavy use of cut/paste coding in typical COBOL programs (COBOL programmers often make a joke out of it: Only one COBOL program has ever been written, the rest are just modified copies of it). I wish you well.

score 0 · Answer 2 · edited May 23 '17 at 12:27

Given that ConQat deals with COBOL badly, you might look at our CloneDR tool.

It has a version that works explicitly with IBM Enterprise COBOL, using a precise parser, and it handles all that sequence number nonsense correctly. (It will even read the COBOL code in its native ECBDIC, meaning a literal string containing an ASCII newline character doesn't break the parser). [If your COBOL isn't IBM COBOL, this won't help you, but otherwise you won't "have to put a lot of hours to get anything"].

We think the AST-based detection technique detects better clones more accurately than ConQat's token-based detection. The site explains why in detail, and shows sample COBOL clones detected by CloneDR.

Specific to the OP who appears to be working in Japan: as a bonus, CloneDR handles Japanese character sets because it is implemented on top of an underlying tool infrastructure that is Unicode and Shift-JIS enabled. We haven't had a lot of experience with Japanese COBOL so there might be a remaining glitch; see G literals with Japanese characters.

COBOL clone detection with ConQAT?

2 Answers2