I have a project where I try to disassemble a bunch of executables with the help of angr, but I have a memory leak. This is the main function, which contains a while loop like this:
def main():
    mypath = Path("/home/baroj/Thesis/smart_obfuscation_generator/MalwareDir")
    binaries = [join(mypath, f) for f in listdir(mypath) if
                isfile(join(mypath, f)) and '_patched' not in f.__str__()]
    while binaries:
        binary = binaries.pop(0)
        print(
            f"TIME: {time.asctime(time.localtime(time.time()))} - Starting with binary: {os.path.basename(binary)}\n")
        try:
            gmm = GeneticMalwareModifier(binary.__str__())
        except Exception as e:
            print(traceback.format_exc())
            print(f"An error occurred while reading this binary... (see log {binary}))\n")
            continue
        etc...
The GeneticMalwareModifier __init__ looks like this:
class GeneticMalwareModifier:
    def __init__(self, input_file_path, population_size=40, crossover_probability=0.8, mutation_probability=0.15,
                 ngen=7, min_actions=1, max_actions=15):
        self.input_file_path = input_file_path
        self.max_actions = max_actions
        self.min_actions = min_actions
        self.cfg = binary_analyzer.make_cfg(self.input_file_path)
        self.code = binary_analyzer.make_code_dict(self.input_file_path, self.cfg)
        self.functions = binary_analyzer.build_function_objects(self.cfg, self.code)
        not_imported_functions_list = [f for f in self.functions.values() if f.address_to_instruction_dictionary]
        self.not_imported_functions = {f.address: f for f in not_imported_functions_list}
        self.levels = function.classify_functions(self.not_imported_functions)
        function.analyze_functions(self.not_imported_functions, self.levels)
        self.randomizable_functions = [f for f in self.not_imported_functions.values() if "_SEH_"
        etc...
binary_analyzer.make_cfg:
def make_cfg(file_path):
    angr_project = angr.Project(file_path)
    cfg = angr_project.analyses.CFGEmulated()
    return cfg
Usually I catch errors coming from self.cfg = binary_analyzer.make_cfg(self.input_file_path), but that's not a problem: execution resumes with the next file. The problem is that angr seems to keep some references, which causes memory leaks.
I want to add that currently this program doesn't get any further than self.cfg = binary_analyzer.make_cfg(self.input_file_path).
I used memory_profiler, tracemalloc and heapy/guppy, and the problem seems to be angr.
Whenever it starts reading an executable, and possibly hits an error, it causes a huge memory leak.
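For reference, here is a minimal sketch of the kind of tracemalloc measurement described above, using only the standard library. The names measure_growth, leaky_work and _retained are hypothetical and exist only for this illustration; leaky_work is a stand-in that keeps data in a module-level list and then raises, it is not angr and does not reproduce angr's behaviour.

```python
import tracemalloc


def measure_growth(work, iterations=3, top=5):
    """Call work(i) repeatedly and collect, after each call, the largest
    allocation differences relative to a baseline snapshot."""
    tracemalloc.start()
    baseline = tracemalloc.take_snapshot()
    reports = []
    try:
        for i in range(iterations):
            try:
                work(i)
            except Exception:
                pass  # same policy as the main loop: skip inputs that fail
            snapshot = tracemalloc.take_snapshot()
            # compare_to returns StatisticDiff objects, biggest change first
            reports.append(snapshot.compare_to(baseline, "lineno")[:top])
    finally:
        tracemalloc.stop()
    return reports


_retained = []  # hypothetical module-level cache that is never cleared


def leaky_work(i):
    """Stand-in for a failing analysis that leaves data behind."""
    _retained.append(bytearray(200_000))
    raise ValueError("simulated analysis failure")


if __name__ == "__main__":
    for i, diffs in enumerate(measure_growth(leaky_work)):
        for d in diffs[:1]:
            print(f"iteration {i}: {d.traceback} grew by {d.size_diff} bytes")
```

If memory were released after each failed call, the size difference would stay roughly flat across iterations; here it keeps increasing because the stand-in holds on to every buffer.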
This is the heapy/guppy output before/after gmm = GeneticMalwareModifier(binary.__str__()):
Heap Status After Creating Few Objects :
Heap Size : 109499491 bytes
Partition of a set of 1039344 objects. Total size = 109499491 bytes.
 Index  Count   %     Size   % Cumulative   % Kind (class / dict of class)
     0 120515  12 27959480  26   27959480  26 dict of angr.sim_type.SimTypePointer
     1 105025  10 19799760  18   47759240  44 dict (no owner)
     2 124235  12 12920440  12   60679680  55 dict of angr.sim_type.SimTypeInt
     3  49188   5  7083072   6   67762752  62 dict of angr.sim_type.SimStruct
     4 124235  12  5963280   5   73726032  67 angr.sim_type.SimTypeInt
     5 120515  12  5784720   5   79510752  73 angr.sim_type.SimTypePointer
     6  49904   5  5190272   5   84701024  77 dict of angr.sim_type.SimTypeChar
     7  37824   4  3238976   3   87940000  80 list
     8  29947   3  3114488   3   91054488  83 dict of angr.sim_type.SimTypeBottom
     9  28830   3  2557378   2   93611866  85 str
Once it reads one executable, this stuff remains on the heap forever and keeps growing. This is the output some iterations later:
Partition of a set of 1220070 objects. Total size = 131637306 bytes.
 Index  Count   %     Size   % Cumulative   % Kind (class / dict of class)
     0 120578  10 27982160  21   27982160  21 dict of angr.sim_type.SimTypePointer
     1 105137   9 19961560  15   47943720  36 dict (no owner)
     2 124417  10 12939368  10   60883088  46 dict of angr.sim_type.SimTypeInt
     3  42652   3  9212832   7   70095920  53 frozenset
     4  49250   4  7092000   5   77187920  59 dict of angr.sim_type.SimStruct
     5 124417  10  5972016   5   83159936  63 angr.sim_type.SimTypeInt
     6 120578  10  5787744   4   88947680  68 angr.sim_type.SimTypePointer
     7  49929   4  5192872   4   94140552  72 dict of angr.sim_type.SimTypeChar
     8  38065   3  3256696   2   97397248  74 list
     9  14598   1  3185936   2  100583184  76 set