I have a project where I try to disassemble a bunch of executables with the help of angr, but I have a memory leak. This is the main function, which contains a while loop like this:

import os
import time
import traceback
from os import listdir
from os.path import isfile, join
from pathlib import Path


def main():
    mypath = Path("/home/baroj/Thesis/smart_obfuscation_generator/MalwareDir")
    # Collect every regular file that has not already been patched.
    binaries = [join(mypath, f) for f in listdir(mypath)
                if isfile(join(mypath, f)) and '_patched' not in f]

    while binaries:
        binary = binaries.pop(0)

        print(f"TIME: {time.asctime()} - Starting with binary: {os.path.basename(binary)}\n")
        try:
            # join() already returns a str, so no extra conversion is needed.
            gmm = GeneticMalwareModifier(binary)
        except Exception:
            print(traceback.format_exc())
            print(f"An error occurred while reading this binary... (see log {binary})\n")
            continue
    # etc...

The GeneticMalwareModifier init is like this:

class GeneticMalwareModifier:
    def __init__(self, input_file_path, population_size=40, crossover_probability=0.8, mutation_probability=0.15,
                 ngen=7, min_actions=1, max_actions=15):
        self.input_file_path = input_file_path
        self.max_actions = max_actions
        self.min_actions = min_actions
        self.cfg = binary_analyzer.make_cfg(self.input_file_path)
        self.code = binary_analyzer.make_code_dict(self.input_file_path, self.cfg)
        self.functions = binary_analyzer.build_function_objects(self.cfg, self.code)
        not_imported_functions_list = [f for f in self.functions.values() if f.address_to_instruction_dictionary]
        self.not_imported_functions = {f.address: f for f in not_imported_functions_list}
        self.levels = function.classify_functions(self.not_imported_functions)
        function.analyze_functions(self.not_imported_functions, self.levels)
        self.randomizable_functions = [f for f in self.not_imported_functions.values() if "_SEH_" 

        etc...

binary_analyzer.make_cfg:

import angr


def make_cfg(file_path):
    # Load the binary, then build its control-flow graph by emulation.
    angr_project = angr.Project(file_path)
    cfg = angr_project.analyses.CFGEmulated()
    return cfg

Usually I catch errors coming from self.cfg = binary_analyzer.make_cfg(self.input_file_path), but that's not a problem: execution resumes with the next file. The problem is that angr seems to keep some references alive, which causes the memory leak. I should add that the program currently doesn't get any further than self.cfg = binary_analyzer.make_cfg(self.input_file_path). I used memory_profiler, tracemalloc and heapy/guppy, and the problem seems to be angr: whenever it starts reading an executable, and possibly raises an error, it causes a huge memory leak. This is the heapy/guppy output before/after the GeneticMalwareModifier constructor call:

Heap Status After Creating Few Objects : 
Heap Size :  109499491  bytes

Partition of a set of 1039344 objects. Total size = 109499491 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0 120515  12 27959480  26  27959480  26 dict of angr.sim_type.SimTypePointer
     1 105025  10 19799760  18  47759240  44 dict (no owner)
     2 124235  12 12920440  12  60679680  55 dict of angr.sim_type.SimTypeInt
     3  49188   5  7083072   6  67762752  62 dict of angr.sim_type.SimStruct
     4 124235  12  5963280   5  73726032  67 angr.sim_type.SimTypeInt
     5 120515  12  5784720   5  79510752  73 angr.sim_type.SimTypePointer
     6  49904   5  5190272   5  84701024  77 dict of angr.sim_type.SimTypeChar
     7  37824   4  3238976   3  87940000  80 list
     8  29947   3  3114488   3  91054488  83 dict of angr.sim_type.SimTypeBottom
     9  28830   3  2557378   2  93611866  85 str

Once it has read one executable, these angr.sim_type objects stay on the heap forever, and the heap keeps growing. This is the output some iterations later:


Partition of a set of 1220070 objects. Total size = 131637306 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0 120578  10 27982160  21  27982160  21 dict of angr.sim_type.SimTypePointer
     1 105137   9 19961560  15  47943720  36 dict (no owner)
     2 124417  10 12939368  10  60883088  46 dict of angr.sim_type.SimTypeInt
     3  42652   3  9212832   7  70095920  53 frozenset
     4  49250   4  7092000   5  77187920  59 dict of angr.sim_type.SimStruct
     5 124417  10  5972016   5  83159936  63 angr.sim_type.SimTypeInt
     6 120578  10  5787744   4  88947680  68 angr.sim_type.SimTypePointer
     7  49929   4  5192872   4  94140552  72 dict of angr.sim_type.SimTypeChar
     8  38065   3  3256696   2  97397248  74 list
     9  14598   1  3185936   2 100583184  76 set
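A possible workaround, separate from finding where the references are held, would be to analyse each binary in its own short-lived process, so that whatever the analysis allocates is given back to the operating system when that process exits. The sketch below only shows the process-per-file pattern: analyze_in_child and the demo worker command are hypothetical, and a real worker would be a small script that builds the CFG for the path it receives.

```python
import subprocess
import sys


def analyze_in_child(worker_cmd, binary_path, timeout=None):
    """Run worker_cmd with binary_path appended, in a separate process.

    Returns (returncode, stdout, stderr). subprocess.TimeoutExpired is
    raised if the child runs longer than `timeout` seconds.
    """
    completed = subprocess.run(
        worker_cmd + [str(binary_path)],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return completed.returncode, completed.stdout, completed.stderr


if __name__ == "__main__":
    # Stand-in worker (assumption): it only echoes the path it was given.
    demo_worker = [sys.executable, "-c",
                   "import sys; print('analyzed', sys.argv[1])"]
    for path in ["a.exe", "b.exe"]:
        code, out, err = analyze_in_child(demo_worker, path)
        print(code, out.strip())
```

With this layout the parent only keeps the small result of each run, and a crash or timeout in one binary cannot affect the next one.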
Baroj