
I'm in the process of understanding and building a static code analysis tool for a proprietary language from a big company. The reason: I have to review a rather large code base, a static code analysis tool would help a lot, and so far they do not have one for the language.

I would like to know how one goes about building a static code analysis tool such as Lint or Splint for C.

Any books, articles, blogs, sites, etc. would help.

Thanks.

Jeff Atwood
codeanalyser

4 Answers


I know this is an old post, but the existing answers don't seem very satisfactory. This article is a pretty good introduction to the technology behind static analysis tools, and it links to several examples.

A good book is "Secure Programming with Static Analysis" by Brian Chess and Jacob West.

Community
Tony Richards

You need good infrastructure: a parser, a tree builder, tree analyzers, symbol table builders, and flow analyzers. Then, to get on with your specific task, you need to code checks for the specific problems that interest you, using all of that infrastructure machinery.
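To make those layers concrete, here is a minimal sketch in Python. It assumes a hypothetical toy language (`let NAME = expr;` and `print expr;`, with `+` as the only operator) standing in for the proprietary one; `tokenize`, `parse`, and `check` are illustrative names, not part of any real tool. A real front end would also need error recovery, nested scopes, types, and flow analysis, which is exactly why the infrastructure is the expensive part.

```python
import re
from dataclasses import dataclass

# Hypothetical toy grammar, standing in for the real (unknown) language:
#   stmt := "let" NAME "=" expr ";" | "print" expr ";"
#   expr := operand ("+" operand)*      operand := NAME | NUMBER

def tokenize(src):
    """Lexer layer: split source text into (kind, text, line) tokens."""
    tokens = []
    for lineno, line in enumerate(src.splitlines(), start=1):
        for text in re.findall(r"\d+|[A-Za-z_]\w*|\S", line):
            if text.isdigit():
                kind = "NUM"
            elif text in ("let", "print"):
                kind = text.upper()  # keywords
            elif text[0].isalpha() or text[0] == "_":
                kind = "NAME"
            else:
                kind = "OP"
            tokens.append((kind, text, lineno))
    return tokens

@dataclass
class Let:
    name: str
    operands: list  # NAME/NUM tokens on the right-hand side
    line: int

@dataclass
class Print:
    operands: list
    line: int

def parse(tokens):
    """Parser layer: build a list of statement nodes (a flat AST)."""
    toks = tokens + [("EOF", "", -1)]  # sentinel so we never index past the end
    pos = 0

    def advance():
        nonlocal pos
        tok = toks[pos]
        pos += 1
        return tok

    def expect(kind, text=None):
        tok = advance()
        if tok[0] != kind or (text is not None and tok[1] != text):
            raise SyntaxError(f"line {tok[2]}: unexpected {tok[1]!r}")
        return tok

    def expr():
        operands = []
        while True:
            tok = advance()
            if tok[0] not in ("NAME", "NUM"):
                raise SyntaxError(f"line {tok[2]}: expected operand, got {tok[1]!r}")
            operands.append(tok)
            if toks[pos][1] != "+":
                return operands
            advance()  # consume "+"

    stmts = []
    while toks[pos][0] != "EOF":
        head = advance()
        if head[0] == "LET":
            name = expect("NAME")
            expect("OP", "=")
            stmts.append(Let(name[1], expr(), head[2]))
        elif head[0] == "PRINT":
            stmts.append(Print(expr(), head[2]))
        else:
            raise SyntaxError(f"line {head[2]}: unexpected {head[1]!r}")
        expect("OP", ";")
    return stmts

def check(stmts):
    """Symbol-table layer plus two checks built on top of it."""
    symbols = {}  # name -> line of declaration
    used = set()
    findings = []
    for stmt in stmts:
        for kind, text, line in stmt.operands:
            if kind == "NAME":
                if text not in symbols:
                    findings.append(f"line {line}: '{text}' used before declaration")
                used.add(text)
        if isinstance(stmt, Let):  # declare after the RHS, so `let x = x;` is flagged
            symbols[stmt.name] = stmt.line
    for name, line in symbols.items():
        if name not in used:
            findings.append(f"line {line}: '{name}' declared but never used")
    return findings
```

For example, `check(parse(tokenize("let x = 1;\nlet y = x + z;\nlet unused = 2;\nprint y;")))` returns `["line 2: 'z' used before declaration", "line 3: 'unused' declared but never used"]`. Note how little of the code is the checks themselves; most of it is the lexer and parser.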

Building all that foundation machinery is actually pretty hard, and it doesn't help you do your specific task. People don't write the operating system for every application they code; why should you build all the infrastructure? As with an OS, it is better to simply acquire good infrastructure.

People will tell you to use lex and yacc. That's kind of like suggesting you use the real-time kernel part of the OS: useful, but far from all the infrastructure you really need.

Our DMS Software Reengineering Toolkit provides all the necessary infrastructure. It has been used to define many language front ends, as well as many tools for those languages.

Such infrastructure would allow you to define your specific nonstandard language relatively quickly, and then get on with your task of coding your special checks.

Ira Baxter
  • DMSToolkit reminds me of another language manipulation framework called Soot (http://sable.github.io/soot/). Like DMSToolkit, Soot exposes call graphs, control flow graphs, etc. The most interesting thing about it is that it exposes its intermediate representation. – sumitb.mdi Nov 23 '15 at 22:09
  • Soot is specific to Java (and IIRC, focused on class files rather than source code). As such, one can define an "intermediate representation" specific to the task. DMS is much more general; it operates on source code (our Java domain also reads class files). Instead of one fixed intermediate representation, DMS defines AST and data flow schemas, which are instantiated on a per-language basis. – Ira Baxter Nov 23 '15 at 22:34
  • @Baxter: Yes, I agree with you on that; Soot is specifically for Java. It's just that the tool closely resembles DMS, so I thought it was worth mentioning. – sumitb.mdi Nov 24 '15 at 08:58
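For readers unfamiliar with the term, an intermediate representation typically flattens nested source expressions into simple, uniform instructions that are easier to analyze; Soot's Jimple, for instance, is a three-address form. The sketch below is a hypothetical Python illustration of lowering an expression tree into three-address code. The tuple encoding `(op, left, right)` and the names `lower` and `to_three_address` are assumptions for this example; it is not Soot's or DMS's API.

```python
import itertools

def lower(expr, code, temps):
    """Lower a nested expression tree to three-address instructions.

    expr is either a leaf (a variable name or literal, as a string) or a
    tuple (op, left, right). Instructions are appended to `code`; the
    returned string names the operand that holds expr's value.
    """
    if isinstance(expr, str):
        return expr
    op, left, right = expr
    lhs = lower(left, code, temps)
    rhs = lower(right, code, temps)
    temp = f"t{next(temps)}"  # fresh temporary for this subexpression
    code.append(f"{temp} = {lhs} {op} {rhs}")
    return temp

def to_three_address(target, expr):
    """Return the instruction list computing `target = expr`."""
    code = []
    result = lower(expr, code, itertools.count(1))
    code.append(f"{target} = {result}")
    return code
```

For example, `to_three_address("x", ("+", "a", ("*", "b", "c")))` yields `["t1 = b * c", "t2 = a + t1", "x = t2"]`. Because every instruction has at most one operator, a data-flow pass only has to understand a handful of instruction shapes instead of arbitrary nesting.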

There is a blog post by DeepSource that covers what you need to know to understand static code analysis, and that equips you with the basic theory and the right tools to write analyzers on your own.

Here’s the link: https://deepsource.io/blog/introduction-static-code-analysis/

camelcaseguy
  1. Obviously, you need a parser for the language. A good high-level AST is useful.
  2. You need to enumerate a set of "mistakes" in the language. Without knowing more about the language in question, we can't help here. Examples: unallocated pointers in C, etc.
  3. Walk the AST looking for the mistakes enumerated in #2.
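Since the proprietary language's grammar isn't known, here is a minimal sketch of the three steps above that uses Python's own `ast` module as a stand-in parser (step 1). The "mistake" chosen for step 2, comparing to `None` with `==` or `!=`, is only an illustrative rule, and `NoneComparisonChecker` and `lint` are hypothetical names. Step 3 is the visitor walking the tree.

```python
import ast

class NoneComparisonChecker(ast.NodeVisitor):
    """Step 3: walk the AST and record every node matching a known mistake."""

    def __init__(self):
        self.findings = []

    def visit_Compare(self, node):
        # A Compare node holds `left`, a list of `ops`, and a list of `comparators`;
        # `a == b == c` has two ops, so pair each op with its two operands.
        operands = [node.left, *node.comparators]
        for op, lhs, rhs in zip(node.ops, operands, operands[1:]):
            if isinstance(op, (ast.Eq, ast.NotEq)) and any(
                isinstance(side, ast.Constant) and side.value is None
                for side in (lhs, rhs)
            ):
                self.findings.append(
                    (node.lineno, "comparison to None should use 'is' / 'is not'")
                )
        self.generic_visit(node)  # keep walking into nested expressions

def lint(source):
    tree = ast.parse(source)  # step 1: parser -> AST
    checker = NoneComparisonChecker()
    checker.visit(tree)
    return checker.findings
```

Running `lint("x = compute()\nif x == None:\n    pass\n")` reports `[(2, "comparison to None should use 'is' / 'is not'")]`. Rules like this are purely syntactic; as the comment below points out, anything more interesting needs more than the AST.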
Yann Ramin
  • How does one go about finding mistakes using the AST? – codeanalyser Dec 17 '10 at 06:46
  • @codeanalyser: Pretty much, you don't (or you only chase trivial ones). You need a lot more than just the AST for any kind of interesting analysis. See www.semanticdesigns.com/Products/DMS/LifeAfterParsing.html. – Ira Baxter Jul 03 '12 at 07:29
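As an example of what "more than just the AST" means, detecting a variable that may be used before it is assigned requires knowing which paths reach each use, i.e. a control flow graph plus a data-flow analysis. Below is a minimal sketch in Python of a forward "definitely assigned" analysis solved with a worklist. The CFG is built by hand and its dict encoding (`"ops"` and `"succ"` keys) is an assumption for illustration; in a real tool, the infrastructure would construct it from the parsed code.

```python
from collections import deque

def definitely_assigned(cfg, entry):
    """Forward 'must' data-flow analysis over a control flow graph.

    cfg maps block name -> {"ops": [("def" | "use", var), ...], "succ": [names]}.
    Returns warnings for uses not assigned on every path from `entry`.
    """
    all_vars = {var for block in cfg.values() for _, var in block["ops"]}
    preds = {name: [] for name in cfg}
    for name, block in cfg.items():
        for succ in block["succ"]:
            preds[succ].append(name)

    # Optimistic start (everything assigned); iteration shrinks it to a fixed point.
    out_sets = {name: set(all_vars) for name in cfg}
    in_sets = {name: set() for name in cfg}
    worklist = deque(cfg)
    while worklist:
        name = worklist.popleft()
        if name == entry or not preds[name]:
            new_in = set()
        else:
            # Assigned on entry only if assigned at the end of EVERY predecessor.
            new_in = set.intersection(*(out_sets[p] for p in preds[name]))
        in_sets[name] = new_in
        new_out = new_in | {var for kind, var in cfg[name]["ops"] if kind == "def"}
        if new_out != out_sets[name]:
            out_sets[name] = new_out
            worklist.extend(cfg[name]["succ"])  # successors must be revisited

    warnings = []
    for name, block in cfg.items():
        assigned = set(in_sets[name])
        for kind, var in block["ops"]:
            if kind == "use" and var not in assigned:
                warnings.append(f"{name}: '{var}' may be used before assignment")
            elif kind == "def":
                assigned.add(var)
    return warnings

# Hypothetical CFG for:  read(flag); if (flag) { x = 1; }  print(x);
cfg = {
    "entry": {"ops": [("def", "flag"), ("use", "flag")], "succ": ["then", "else"]},
    "then":  {"ops": [("def", "x")], "succ": ["join"]},
    "else":  {"ops": [], "succ": ["join"]},
    "join":  {"ops": [("use", "x")], "succ": []},
}
print(definitely_assigned(cfg, "entry"))  # ["join: 'x' may be used before assignment"]
```

An AST walk sees `x = 1` textually before `print(x)` and would miss this; the intersection at the `join` block is what exposes the `else` path where `x` is never assigned.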