0

I have this simple Test class, where I'm trying to mimic a path problem. with Message1 as source and Console.WriteLine(Message1) as sink.

 class Test
    {
        private const string Message1 = "Test Message 1";
        private readonly string Message2 = "Test Message 2";

        public void Run()
        {
            Console.WriteLine(Message1);
            Console.WriteLine(Message2);
        }
    }
import csharp
import DataFlow::PathGraph

class Source extends DataFlow::Node {
  Source() { this.asExpr() instanceof StringLiteral }
}

class WriteLineMethod extends Method {
  WriteLineMethod() { this.hasQualifiedName("System.Console.WriteLine") }
}

class Sink extends DataFlow::Node {
  Sink() {
    exists(MethodCall m |
      m.getTarget() instanceof WriteLineMethod and
      this.asExpr() = m.getArgument(0)
    )
  }
}

class SimpleConfiguration extends TaintTracking::Configuration {
  SimpleConfiguration() { this = "Simple configuration" }

  override predicate isSource(DataFlow::Node source) { source instanceof Source }

  override predicate isSink(DataFlow::Node sink) { sink instanceof Sink }
  
}

from DataFlow::PathNode source, DataFlow::PathNode sink, SimpleConfiguration cfg
where cfg.hasFlowPath(source, sink)
select sink.getNode(), source, sink, "$@ is used in WriteLine method.", source.getNode(),
  "String"

Here I did simple taint tracking analysis on above Test class using above query, for const field private const string Message1 = "Test Message 1"; as source and Console.WriteLine(Message1); as sink i'm getting correct result, But for non const field private readonly string Message2 = "Test Message 2"; it doesn't seem to work.

Did i miss something here? why does taint tracking works for const field or static field in static class but not for instance field?

test.ql result

Database zip file: codeql-test-database

Edit:

I have created an issue for same at github/codeql below is an link for reference.

github/codeql/issues/9569

Manu Nair
  • 70
  • 2
  • 4
  • 1
    Might be good to mention that you have created a [GitHub issue](https://github.com/github/codeql/issues/9569) for this as well. And in case you get an answer there, would be good to add an answer here as well, referring to that answer on GitHub (or vice versa). – Marcono1234 Jun 16 '22 at 21:11

1 Answers1

0

Quoting this detailed analysis by tamasvajk and here is the link to github discussion answer.

This is a great question, I had to dig into CFG and DF implementation details for the answer, and I'm not sure I have the full picture yet, but hopefully the below answers your question.

The CodeQL C# library handles instance fields differently than static or const fields. Initializers of instance fields are moved into instance constructors when the CFG is constructed. So the following sample

class C 
{
  int x = 42;

  C() { }
}

is modelled as the below in the CFG

class C
{
  int x;

  C() 
  {
    this.x = 42;
  }
}

I'm not sure about the reasons why it was implemented like this, but I assume this lets us do precise dataflow analysis in case of inheritance and virtual dispatch. Also, it reduces noise: in your sample, there's actually no code path that would print Test Message 2, because there's no instance of Test on which Run() is invoked.

Based on the above CFG model, in order to find the missing flow, we need to add some code that includes the initialization and a call to Run(). So for example adding the following constructor

Test()
{
  Run();
}

works, because it is transformed to

Test()
{
  this.Message2 = "Test Message 2";
  this.Run();
}

and there's field based dataflow from the string literal to WriteLine on the this instance.

Another way to introduce the missing flow would be adding the following:

static void M()
{
  var t = new Test();
  t.Run();
}

public Test() { }

In this case CodeQL finds the flow for instance t. (Note that adding the empty constructor should not be necessary, but due to a bug in the CFG construction, the flow is not found without it. I submitted an internal issue to fix this.)

This is a great question, I had to dig into CFG and DF implementation details for the answer, and I'm not sure I have the full picture yet, but hopefully the below answers your question.

The CodeQL C# library handles instance fields differently than static or const fields. Initializers of instance fields are moved into instance constructors when the CFG is constructed. So the following sample

class C 
{
  int x = 42;

  C() { }
}

is modelled as the below in the CFG

class C
{
  int x;

  C() 
  {
    this.x = 42;
  }
}

I'm not sure about the reasons why it was implemented like this, but I assume this lets us do precise dataflow analysis in case of inheritance and virtual dispatch. Also, it reduces noise: in your sample, there's actually no code path that would print Test Message 2, because there's no instance of Test on which Run() is invoked.

Based on the above CFG model, in order to find the missing flow, we need to add some code that includes the initialization and a call to Run(). So for example adding the following constructor

Test()
{
  Run();
}

works, because it is transformed to

Test()
{
  this.Message2 = "Test Message 2";
  this.Run();
}

and there's field based dataflow from the string literal to WriteLine on the this instance.

Another way to introduce the missing flow would be adding the following:

static void M()
{
  var t = new Test();
  t.Run();
}

public Test() { }

In this case CodeQL finds the flow for instance t. (Note that adding the empty constructor should not be necessary, but due to a bug in the CFG construction, the flow is not found without it. I submitted an internal issue to fix this.)

Manu Nair
  • 70
  • 2
  • 4