I need to identify whether a Python program performs some kind of encryption during its execution.
I have tried some approaches (sorted by difficulty level):
The source code of a Python program may contain keywords that hint at the use of encryption; this is the idea behind crypto-detector (https://github.com/Wind-River/crypto-detector). The problem with this approach is that even slight obfuscation of the code can defeat the identification.
In the code below, the variable string holds, base64-encoded, all the code responsible for the encryption, so this kind of analysis cannot tell that encryption is happening:
def handle(arg):
    from base64 import b64decode as printf
    string = 'ZnJvbSBDcnlwdG8uQ2lwaGVyIGltcG9ydCBBRVM7IGZyb20gQ3J5cHRvIGltcG9ydCBSYW5kb207IGtleSA9IGInU2l4dGVlbiBieXRlIGtleSc7IGl2ID0gUmFuZG9tLm5ldygpLnJlYWQoQUVTLmJsb2NrX3NpemUpOyBjaXBoZXIgPSBBRVMubmV3KGtleSwgQUVTLk1PREVfQ0ZCLCBpdik7IG1zZyA9IGl2ICsgY2lwaGVyLmVuY3J5cHQoYidBdHRhY2sgYXQgZGF3bicpOyBwcmludCBtc2c='
    eval(compile(printf(string), '<string>', 'exec'))
    return None
Another code example: https://gist.github.com/robertonscjr/e3f658cce0c0253e2e076e0457635d86
This code contains no words that can be associated with the use of encryption, so the method is not very effective. It is also prone to false positives, which can occur in code that clearly does no encryption:
def handle(arg):
    crypto = 2
    return crypto
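Both failure modes of keyword matching can be reproduced with a minimal sketch. The keyword list and the looks_like_crypto helper are hypothetical names made up for illustration; they are not crypto-detector's actual rules, which are much richer. The obfuscated sample is a shortened stand-in for the base64 trick described above.

```python
import base64
import re

# Hypothetical keyword list, for illustration only (not crypto-detector's rules).
KEYWORDS = re.compile(r"aes|crypto|cipher|encrypt", re.IGNORECASE)


def looks_like_crypto(source: str) -> bool:
    """Return True if any crypto-related keyword appears in the source text."""
    return KEYWORDS.search(source) is not None


# Code that really pulls in a cipher, hidden behind base64.
payload = "from Crypto.Cipher import AES"
encoded = base64.b64encode(payload.encode()).decode()
obfuscated = (
    "from base64 import b64decode as printf\n"
    f"eval(compile(printf('{encoded}'), '<string>', 'exec'))\n"
)

# Harmless code that merely uses a suspicious variable name.
harmless = "def handle(arg):\n    crypto = 2\n    return crypto\n"

print(looks_like_crypto(payload))     # plain source: keyword found
print(looks_like_crypto(obfuscated))  # encoded source: keyword missed (false negative)
print(looks_like_crypto(harmless))    # innocent name: keyword found (false positive)
```

The scanner only ever sees text, so anything decoded at run time is invisible to it, while any identifier that happens to contain a keyword triggers it.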
At the assembly level, a program's instructions may indicate the use of an encryption algorithm, since these algorithms usually follow a recognizable instruction pattern. The article on grap, a low-level instruction pattern matching tool (https://eprint.iacr.org/2017/1119.pdf), shows such a pattern for the execution steps of AES.
In this context the approach has a problem: grap matches patterns against the x86/x86-64 disassembly of a program, and obtaining such a disassembly for Python means first generating a binary from the Python program (which is natively interpreted).
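To make "natively interpreted" concrete, here is a small sketch using the standard dis module. The mix function is a made-up toy XOR step, not real AES. It shows that CPython turns the function into bytecode for its own virtual machine rather than into x86 instructions. As far as I understand, PyInstaller bundles the interpreter together with this kind of compiled bytecode instead of translating it to native code, but I have not verified that this is what defeats the grap patterns.

```python
import dis


def mix(byte, key):
    # Toy XOR step standing in for one cipher operation; not real AES.
    return (byte ^ key) & 0xFF


# Collect the names of the CPython bytecode instructions for mix().
# The exact names differ between Python versions, so no output is listed here.
opnames = [ins.opname for ins in dis.get_instructions(mix)]
print(opnames)
```

The printed names are CPython opcodes (for example a BINARY_* operation followed by RETURN_VALUE), which an x86 pattern matcher has no way to interpret.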
I used PyInstaller (https://www.pyinstaller.org) to generate a binary from a slightly obfuscated Python program (https://gist.github.com/robertonscjr/e3f658cce0c0253e2e076e0457635d86):
pyinstaller --onefile ~/fake_aes.py
To run grap, I used the existing AES patterns located in the grap directory. The results were not promising: the command below produced no pattern matches:
grap -q ~/grap/patterns/crypto/ ~/fake_aes.bin
One suspicion is that grap fails because the transformation into a binary changes what the instructions look like. Question: is the problem related to how PyInstaller generated the binary, and is the behavior of the Python code preserved in the generated binary?
If PyInstaller generates a correct binary and grap still finds nothing, there is a second issue: do we need to find other patterns for grap? The problem with that approach is that finding other patterns can be time-consuming and not very effective against serious obfuscation (https://youtu.be/3hSpmcoQ578?t=1999).