
In taint analysis, a taint source is a program location or statement that may produce an untrusted or external input.

My goal: identify all external inputs to the program, such as command-line arguments, file reads, environment variables, and network input, preferably using dynamic analysis, and propagate the taint.

I read this tutorial, http://shell-storm.org/blog/Taint-analysis-and-pattern-matching-with-Pin/, which intercepts read syscalls using Intel Pin and propagates the taint. I want to extend the same approach to cover the external inputs mentioned above (to start with, for C: scanf, gets, fopen, etc.).

Is there a dynamic analysis tool that would help me identify generic external inputs? Other approaches with more specific goals are also appreciated. Thanks.

G Ashwin

1 Answer


I'm assuming you only target Linux.

In general, a program receives outside input by interacting with the operating system through system calls. The libc functions you mention sit at a higher level of abstraction and are ultimately built on those calls, so I would recommend identifying inputs at the system call level.
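To make the system-call view concrete, here is a minimal sketch in plain C (not Pin code). It assumes a hypothetical wrapper, `read_and_taint`, standing in for a syscall-exit hook, and a small fixed-size range table (`taint_add`, `is_tainted`) standing in for real shadow memory. All of these names are made up for illustration. The point it tries to show is that only the bytes the kernel actually wrote, that is, the return value of read, should be marked as tainted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_RANGES 64

/* One tainted byte range [start, start + len). Hypothetical helper type. */
struct taint_range {
    uintptr_t start;
    size_t len;
};

static struct taint_range ranges[MAX_RANGES];
static size_t range_count;

/* Record a buffer as tainted. Returns 0 on success, -1 if the table is full. */
static int taint_add(const void *buf, size_t len)
{
    assert(buf != NULL);
    if (len == 0)
        return 0;
    if (range_count == MAX_RANGES)
        return -1;
    ranges[range_count].start = (uintptr_t)buf;
    ranges[range_count].len = len;
    range_count++;
    return 0;
}

/* Return 1 if addr lies inside any recorded range, else 0. */
static int is_tainted(const void *addr)
{
    uintptr_t a = (uintptr_t)addr;
    for (size_t i = 0; i < range_count; i++) {
        if (a >= ranges[i].start && a - ranges[i].start < ranges[i].len)
            return 1;
    }
    return 0;
}

/* Stand-in for a syscall-exit hook: perform the read, then taint only
 * the bytes that were actually filled in (the return value), not the
 * whole buffer that was passed. */
static ssize_t read_and_taint(int fd, void *buf, size_t count)
{
    ssize_t n = read(fd, buf, count);
    if (n > 0)
        taint_add(buf, (size_t)n);
    return n;
}
```

In an instrumentation tool the same logic would live in the handler that runs after the system call returns. Other input system calls (for example recv or pread) would need the same treatment, each with its own notion of which buffer was written and how many bytes.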

Additionally, environment variables and command-line arguments are inputs too; they are found on the stack when the program starts.
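As a rough illustration of treating these as taint sources, the sketch below walks a NULL-terminated string vector, which is the layout shared by argv and the environment array, and records the byte range of each string. The names `byte_range` and `collect_sources` are hypothetical, and how a real tool gets hold of the vectors (from the initial stack, or from main's parameters) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bytes of one argument or environment string. Hypothetical helper type. */
struct byte_range {
    const char *start;
    size_t len; /* excludes the terminating NUL */
};

/* Walk a NULL-terminated string vector (argv or environment layout) and
 * record each string's bytes as a taint source. Returns the number of
 * ranges written, which is at most max. */
static size_t collect_sources(char *const vec[], struct byte_range *out,
                              size_t max)
{
    assert(vec != NULL && out != NULL);
    size_t n = 0;
    for (size_t i = 0; vec[i] != NULL && n < max; i++) {
        out[n].start = vec[i];
        out[n].len = strlen(vec[i]);
        n++;
    }
    return n;
}
```

A program could call this once on argv and once on environ at startup and feed the resulting ranges into whatever taint map it uses. Note that argv[0] is also chosen by whoever launched the process, so it may be worth treating as untrusted as well.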

One more thing to consider is shared memory in all its shapes and forms.

nitzanms
  • The problem with the system call approach is that I am not sure how to extend it to functions like scanf, since the way scanf uses the read system call is quite complicated. With that same Pin code, I get bizarre results if the program uses scanf instead of calling read() directly. So, to start with, I thought I would mark all libc functions, as well as a few domain-specific functions that return untrusted input: for a = f1(), whenever f1 is called, mark a as tainted. Do you think a static analysis tool such as clang is better for this case? – G Ashwin Mar 28 '16 at 09:00
  • I think you might be having issues with propagating the taint correctly but I can't tell for sure. – nitzanms Mar 29 '16 at 03:36
  • So, for the same use case, can instrumentation tools like Pin be used to good effect to select certain libc functions with routine-level instrumentation? And in the other case, a = f1(), how does one get the address of 'a' in Pin instrumentation? – G Ashwin Mar 29 '16 at 03:43
  • Pin doesn't work at the source code level, so your question doesn't make sense. At the level at which Pin works, this is answered in the article you linked to (this is the taint propagation). If you're asking how to link the instruction where this happened to a source line, look at PIN_GetSourceLocation – nitzanms Apr 03 '16 at 01:40