
I am using CodeQL TaintTracking, and I noticed that by default it does not follow data through functions it doesn't know.

For example, for this code:

import pandas as pd

a = src + anything
df = pd.DataFrame(a)

If src is the source, then a is reported as tainted (as expected), but df isn't.

I want to reach every "contaminated" variable, including df. Any ideas how to do that? I saw the documentation for overriding isAdditionalTaintStep in TaintTracking::Configuration, which seems like a good direction, but I only found examples that cross one specific function, not ones that propagate taint through the value returned by any function call (which I believe would be useful in many cases).

Atlantis

1 Answer

Did it :-)

Implementation for anyone interested:

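import python
import semmle.python.dataflow.new.DataFlow
import semmle.python.dataflow.new.TaintTracking

class TaintFromParameters extends TaintTracking::Configuration {
    TaintFromParameters() { this = "TaintFromParameters" }

    override predicate isSource(DataFlow::Node source) {
        // define your sources here (none() is only a placeholder so the body is valid QL)
        none()
    }

    override predicate isSink(DataFlow::Node sink) {
        // define your sinks here (none() is only a placeholder so the body is valid QL)
        none()
    }

    // Treat every positional argument of every call as flowing into the call's result.
    override predicate isAdditionalTaintStep(DataFlow::Node node1, DataFlow::Node node2) {
        exists(DataFlow::CallCfgNode call |
            node2 = call and
            node1 = call.getArg(_)
        )
    }
}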

Works amazingly well.
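One gap worth noting: `call.getArg(_)` only matches positional arguments, so a call such as `pd.DataFrame(data=a)` or a method call on a tainted object (`a.strip()`) may not be covered by the step above. The following is a minimal sketch of a broader step, assuming the new Python data flow library (`semmle.python.dataflow.new`). The predicate name `anyCallStep` is made up for illustration, and whether the built-in taint steps already cover some of these cases depends on the CodeQL version in use.

```ql
import python
import semmle.python.dataflow.new.DataFlow

/**
 * Hypothetical helper: holds if `node1` is an input of some call and
 * `node2` is the result of that same call.
 */
predicate anyCallStep(DataFlow::Node node1, DataFlow::Node node2) {
  exists(DataFlow::CallCfgNode call | node2 = call |
    // positional argument: f(x)
    node1 = call.getArg(_)
    or
    // keyword argument: f(name=x)
    node1 = call.getArgByName(_)
    or
    // receiver of a method call: x.method(...)
    node1 = call.getFunction().(DataFlow::AttrRead).getObject()
  )
}
```

If this fits your case, the body of `isAdditionalTaintStep` could then be reduced to `anyCallStep(node1, node2)`.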

Atlantis
  • You might want to refine `isAdditionalTaintStep` a bit though; currently it seems to add taint from an argument to the return value of a function call for _any_ function. Depending on your use case that could probably lead to a lot of false positives for functions where the return value is unrelated to one (or all) of the arguments. – Marcono1234 Jul 17 '23 at 19:28
  • @Marcono1234 That was my case though, as I don't want to list the functions which are eligible, but rather warn for every path that data "might" flow through and go through the warnings by hand – Atlantis Jul 19 '23 at 13:18
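
A possible middle ground between the two comments, sketched under assumptions: keep the "any call" step, but skip a short deny-list of builtins whose result usually says little about the argument's content. The predicate names `isUnrelatedResultCall` and `filteredCallStep` and the list of builtins are illustrative choices only, not something taken from the thread. The sketch assumes `API::builtin` from `semmle.python.ApiGraphs` is available in your CodeQL version.

```ql
import python
import semmle.python.dataflow.new.DataFlow
import semmle.python.ApiGraphs

/**
 * Hypothetical deny-list: calls to builtins whose return value is
 * assumed not to carry the argument's data.
 */
predicate isUnrelatedResultCall(DataFlow::CallCfgNode call) {
  call = API::builtin(["len", "isinstance", "type", "id", "hash"]).getACall()
}

/**
 * Same argument-to-result step as in the answer, minus the
 * deny-listed calls above.
 */
predicate filteredCallStep(DataFlow::Node node1, DataFlow::Node node2) {
  exists(DataFlow::CallCfgNode call |
    not isUnrelatedResultCall(call) and
    node2 = call and
    node1 = call.getArg(_)
  )
}
```

This still over-approximates on purpose, so the results remain "might flow" warnings to review by hand; the deny-list only trims the most obvious noise.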