7

Say there are two methods in my library:

void com.somepackage.SomeClass.someSink(String s)

and

int com.someotherpackage.SomeOtherClass.someSource(int i)

The first method is used as a data sink and the second as a data source in my code. The parameter types int and String are only examples and may differ in practice.

I want to detect usages of these methods that satisfy the following pattern:

  1. some data (say x) is generated by the source
  2. some data (say y) is generated using a series of transformations f1(f2(... fn(x)))
  3. y is given to the sink.
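
To make the pattern concrete, here is a minimal sketch of code that the tool should flag. The method bodies and the two transformations (addOffset and render) are invented for illustration; only the names someSource and someSink come from the question.

```java
// Hypothetical stand-ins for the two library methods; the bodies are assumptions.
class SomeOtherClass {
    static int someSource(int i) { return i * 2; }
}

class SomeClass {
    // Records what the sink received so the flow can be observed.
    static final StringBuilder received = new StringBuilder();
    static void someSink(String s) { received.append(s); }
}

class PatternExample {
    // Plays the role of f2: takes the source value plus an unrelated parameter.
    static int addOffset(int x, int offset) { return x + offset; }

    // Plays the role of f1: produces the type the sink expects.
    static String render(int v) { return "value=" + v; }

    static void run() {
        int x = SomeOtherClass.someSource(20);   // step 1: x is generated by the source
        String y = render(addOffset(x, 2));      // step 2: y = f1(f2(x))
        SomeClass.someSink(y);                   // step 3: y is given to the sink
    }
}
```

Here y depends on x through two calls, so the three lines in run() are what I would expect in the report.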

The transformations can be arbitrary functions, as long as there is a sequence of calls from the function that generates the data for the sink to a function that takes in data from the source. The functions may take any other parameters as well and are to be treated as black boxes.

The scanning can be at the source or bytecode level. What are the tools available out there for this type of analysis?

Prefer non-IDE based tools with Java APIs.

[EDIT:] To clarify further, someSink and someSource are arbitrary method names in classes SomeClass and SomeOtherClass respectively. They may or may not be static and may take an arbitrary number of parameters (which I should be able to define). The types of the parameters are also arbitrary. The only requirement is that the tool should scan the code and output the line numbers where the pattern occurs. So the tool might work this way:

  • Obtain sink and source names (fully qualified name of class and method name) from user.
  • Statically scan the code and find all places where the given sink and source are used
  • Check if a path exists where some data output by source is given to sink either directly or indirectly via a series of operations (operators, methods).
  • Ignore those sources/sinks where no such path exists and output the remaining ones (if any).
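
The path check in the third step can be sketched as a plain graph search, assuming some front end has already produced a data-dependency graph whose nodes are statements. The class FlowPathFinder and its methods are hypothetical names for illustration, not part of any existing tool; extracting the graph from real code is the hard part and is not shown.

```java
import java.util.*;

class FlowPathFinder {
    // Edges of a hypothetical data-dependency graph: data at the key flows into each listed node.
    private final Map<String, List<String>> flowsTo = new HashMap<>();

    void addFlow(String from, String to) {
        flowsTo.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    // Breadth-first search. Returns the nodes on one source-to-sink path,
    // or an empty list if the sink does not depend on the source.
    List<String> findPath(String source, String sink) {
        Map<String, String> parent = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        parent.put(source, null);   // the source has no predecessor
        queue.add(source);
        while (!queue.isEmpty()) {
            String cur = queue.poll();
            if (cur.equals(sink)) {
                LinkedList<String> path = new LinkedList<>();
                for (String n = cur; n != null; n = parent.get(n)) {
                    path.addFirst(n);   // walk back to the source
                }
                return path;
            }
            for (String next : flowsTo.getOrDefault(cur, Collections.emptyList())) {
                if (!parent.containsKey(next)) {   // visit each node once
                    parent.put(next, cur);
                    queue.add(next);
                }
            }
        }
        return Collections.emptyList();
    }
}
```

With nodes labelled by file and line, the returned path maps directly onto the kind of output shown in the example.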

Example output:

MyClass1.java:12: value1 = com.someotherpackage.SomeOtherClass.someSource(...)
MyClass2.java:23: value2 = foo(value1, ...)
MyClass3.java:3: value3 = bar(value2)
MyClass4.java:22: com.somepackage.SomeClass.someSink(value3, ...)

Note: A function that takes no parameters but has side effects on the data also needs to be considered. (Example: a = source(); void foo(){ c = a+b }; foo(); sink(c) is a pattern that needs to be caught.)
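
Written out as compilable Java, the side-effect case from the note looks roughly like this. The class name, the field initial values, and the bodies of source() and sink() are assumptions made only so the snippet runs.

```java
class SideEffectExample {
    static int a;        // written from the source
    static int b = 1;    // unrelated input
    static int c;        // written by foo(), read by the sink call
    static int sunk;     // records what the sink received

    static int source() { return 41; }        // hypothetical source
    static void sink(int v) { sunk = v; }     // hypothetical sink

    // No parameters and no return value, yet it moves data from a to c through fields.
    static void foo() { c = a + b; }

    static void run() {
        a = source();
        foo();
        sink(c);   // c depends on a, so a source-to-sink flow exists
    }
}
```

An analysis that only follows arguments and return values would miss this flow; it has to track reads and writes of shared state as well.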

Jus12
  • Are you describing what a UML sequence diagram is? If yes, then there are plenty of tools (mostly commercial) for doing this. – mazaneicha May 06 '12 at 21:24
  • It is a subset of sequence diagram that satisfies the `data-dependency` criteria. – Jus12 May 07 '12 at 06:01
  • So all you really want is that the 2nd class has some indirect data dependency on the first? – Ira Baxter May 07 '12 at 06:58
  • @IraBaxter that is correct. Rather than write this from scratch, I am hoping that existing static analysis tools are extensible enough for me to specify this in a query or something similar. – Jus12 May 07 '12 at 07:51

2 Answers

4

After doing some research, I found that Soot is best suited for this kind of task. Soot is more mature than other open-source alternatives such as PQL.

Jus12
2

So the role of the source and sink methods is simply that x originates in the source method (somewhere) and is consumed (somewhere) in the target method? How do you characterize "x", or do you simply want all x that have this property?

Assuming you have identified a specific x in the source method, do you a) insist that x be passed to the target method only by method calls [which would make the target method the last call in your chain of calls], or can one of the intermediate values be copied? b) insist that each function call has exactly one argument?

We have done something like this for large C systems. The problem was to trace an assigned variable into a use in other functions wherever they might be, including values not identical in representation but identical in intent ("abstract copy"; the string "1.0" is abstractly equivalent to the integer 1 if I use the string eventually as a number; "int_to_string" is an "abstract copy" function that converts a value in one representation to an equivalent value in another).

What we needed for this is a reaching definitions analysis for each function ("where does the value from a specific assignment go?"), and an "abstract copy" reaching analysis that determines where a reaching value is consumed by special functions tagged as "abstract copies", and where the result of that abstract copy function reaches to. Then a transitive closure over "x reaches z" and "x reaches f(x) reaches z" computes where x can go.

We did this using our DMS Software Reengineering Toolkit, which provides generic parsing and flow analysis machinery, and DMS's C Front End, which implements the specific reaching and abstract-copy-reaching computations for C. DMS has a Java Front End which computes reaching definitions; one would have to add the abstract-copy-reaching logic and reimplement the transitive closure code.
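
The closure step can be sketched in Java as a worklist over two precomputed relations. This is only an illustration under assumed inputs, not how DMS implements it: the class AbstractCopyClosure, the map shapes, and the string labels are all hypothetical.

```java
import java.util.*;

class AbstractCopyClosure {
    // reaches: a definition mapped to the uses it reaches (assumed to come from a reaching-definitions analysis).
    // copies:  a use that feeds an abstract-copy function mapped to the definition holding that function's result.
    static Set<String> closure(String start,
                               Map<String, Set<String>> reaches,
                               Map<String, String> copies) {
        Set<String> seen = new LinkedHashSet<>();   // every use the original value can reach
        Deque<String> work = new ArrayDeque<>();    // definitions still to be expanded
        work.push(start);
        while (!work.isEmpty()) {
            String def = work.pop();
            for (String use : reaches.getOrDefault(def, Collections.emptySet())) {
                if (!seen.add(use)) {
                    continue;                       // already handled, which also guards against cycles
                }
                String result = copies.get(use);
                if (result != null) {
                    work.push(result);              // continue from the abstract copy's result
                }
            }
        }
        return seen;
    }
}
```

For example, if x reaches a call int_to_string(x) whose result s reaches sink(s), the closure of x contains both the call and sink(s).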

Ira Baxter
  • Each function called may have any number of parameters. The return type can also be anything. I have updated my question. Please let me know if this clarifies it. – Jus12 May 07 '12 at 06:16
  • Is there an alternative to DMS toolkit? Preferably something open-source. – Jus12 Jun 19 '12 at 05:30
  • Well, there's the Java compiler API, and there's WALA. These may have some flow analysis capabilities. But I think you want flow analysis across methods in classes, and I'm not sure how much support they provide for that. I offer DMS because it is what I know, and because we have seen this kind of problem before and it's what we intend for a tool like DMS to support. – Ira Baxter Jun 19 '12 at 06:32
  • I would like to try out the DMS toolkit if there is a trial version available that lets me do this. – Jus12 Jun 19 '12 at 14:39