
I have a small project (please keep in mind that I'm just a Python beginner). It consists of a few smaller .py files. First there is main.py, which looks like this:

import os

from Controller import Controller
import config as cfg

if __name__ == "__main__":
    for path in cfg.paths.values():
        if not os.path.exists(path):
            os.system(f"mkdir {path} -p")
    Con = Controller()
    Con.start()
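As an aside, the directory setup can be done without shelling out to `mkdir`; a minimal sketch using only the standard library (the paths below are placeholders for `cfg.paths.values()`):

```python
import os

# Safer equivalent of `mkdir -p` that avoids spawning a shell;
# exist_ok=True makes repeated runs a no-op.
for path in ("/tmp/demo_ipath", "/tmp/demo_opath"):
    os.makedirs(path, exist_ok=True)
```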

so this program just creates some directories, creates a Controller object, and runs its method. Controller.py looks like this:

import logging
import multiprocessing
import time
import watchdog
import watchdog.events
import watchdog.observers
import watchdog.utils.dirsnapshot
import concurrent.futures
from AM import AM
import config as cfg

m = multiprocessing.Manager()
q = m.Queue()

class Handler(watchdog.events.PatternMatchingEventHandler):
    def __init__(self):
        # Set the patterns for PatternMatchingEventHandler
        watchdog.events.PatternMatchingEventHandler.__init__(self, patterns=['TEST*'],
                                                            ignore_directories=True, case_sensitive=False)

    def on_created(self, event):
        logging.info("AM Watchdog received created event - %s." % event.src_path)
        q.put(event.src_path)

    def on_moved(self, event):
        logging.info("AM Watchdog received moved event - %s." % event.src_path)
        q.put(event.src_path)

class Controller:
    def __init__(self):
        pass

    def _start_func(self, newFname):
        try:
            res = AM(newFname).start()
            return res
        except Exception:
            return 1

    def start(self):
        event_handler = Handler()
        observer = watchdog.observers.Observer()
        observer.schedule(event_handler, path=cfg.paths["ipath"], recursive=True)
        observer.start()
        
        try:
            while True:
                time.sleep(1)
                with concurrent.futures.ThreadPoolExecutor(max_workers=cfg.workers) as executor:
                    futures = {}
                    while not q.empty():
                        newFname = q.get()
                        futures_to_work = executor.submit(self._start_func, newFname)
                        futures[futures_to_work] = newFname

                    for future in concurrent.futures.as_completed(futures):
                        name = futures.pop(future)
                        print(f"{name} completed")
                    
        except KeyboardInterrupt:
            observer.stop()
        observer.join()

This program is more complex than the last one (and there are probably a few things wrong with it). Its purpose is to observe a directory (cfg.paths["ipath"]) and wait for a TEST* file to appear. When it does, its name is added to a queue. When the queue is not empty, a new future from concurrent.futures.ThreadPoolExecutor is created and the name is passed to the _start_func method. This method creates a new object from AM.py and runs it. The thought process behind this was that I wanted a program that waits for a TEST* file to appear and then does some operations on it, while being able to work on multiple files at once, handling them in the order they appear. AM.py looks like this:

import subprocess

class AM():
    def __init__(self, fname):
        self.fname = fname

    def test_func(self, fname):
        c = f"some_faulty_unix_program {fname}".split(" ")
        p = subprocess.run(c, capture_output=True, text=True)

        out, err = p.stdout, p.stderr
        if out:
            print(out)
        if err:
            print(err)
            return 1
        return 0

    def start(self):
        res = self.test_func(self.fname)
        return res

This program runs some unix program (on the file detected in Controller.py) in a new process. This program will produce errors more often than not (due to TEST* files not always being valid). I don't think it matters what this program is, but just in case: it is solve-field from astrometry.net and the TEST* files are images of the night sky.
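Since the external program fails often, it may also help to guard each call with a timeout so that a hung process cannot accumulate. A minimal sketch of that pattern (using `echo` as a stand-in for the real program):

```python
import subprocess

def run_tool(fname, timeout=60):
    # `echo` is a stand-in for the real unix program (e.g. solve-field).
    cmd = ["echo", fname]
    try:
        # check=False (the default): a non-zero exit shows up in
        # returncode rather than raising an exception.
        p = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return 1  # treat a hang like any other failure
    return 0 if p.returncode == 0 else 1
```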

This whole project is run as a service, like this:

[Unit]
Description = test astrometry service
After = network.target

[Service]
Type = simple
ExecStart = /bin/bash -c "/home/project/main.py"
Restart = always
RestartSec = 2
TimeoutStartSec = infinity
User = root
Group = users
PrivateTmp = true
Environment = "PATH=/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/home/project"
[Install]
WantedBy = multi-user.target

When I enable this service and check it with systemctl status my_project.service, it takes about 76.0M of memory. I tend to leave it working for a whole night (I have a system that takes a night-sky photo every minute; this project calculates astrometry for those photos). The next morning, when I check systemctl status again, memory is around ~200-300M if there were no errors and ~3.5G if something went wrong (for example, if I moved a config file used by this unix program so that it fails right at the start). Why does memory increase like that? Is there something wrong in my code that causes it, or is something wrong with this unix program?

jlipinski

1 Answer

It's not clear to me where the memory leak is occurring. If it's in "some_faulty_unix_program" being run in AM.test_func, then you would need to find or create a replacement for it. But I believe there are some simplifications/optimizations that could be made to the code to reduce the likelihood of a memory leak if it is occurring elsewhere.
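One way to narrow down where the growth happens (this helper is not part of the original code) is to log the Python process's own peak RSS periodically; if that number stays flat while systemctl status keeps climbing, the leak is outside the interpreter:

```python
import resource

def peak_rss_kb():
    # On Linux, ru_maxrss is the peak resident set size in kilobytes
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
```

Calling something like `logging.info("peak RSS: %s kB", peak_rss_kb())` once per processed file would be enough to tell the two cases apart.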

First, I don't think you need to re-create the multithreading pool over and over. It also appears that watchdog uses multithreading, so you could use the more efficient queue.Queue instead of a managed queue. But ultimately, with some refactoring of the Controller.py code so that your handler submits tasks to the multithreading pool, I believe you could eliminate an explicit queue altogether. Would the following work?

import concurrent.futures
import logging
from threading import Event

import watchdog
import watchdog.events
import watchdog.observers
import watchdog.utils.dirsnapshot
from AM import AM
import config as cfg

class Handler(watchdog.events.PatternMatchingEventHandler):
    def __init__(self):
        # Create multithreading pool just once:
        self._executor = concurrent.futures.ThreadPoolExecutor(max_workers=cfg.workers)
        # Set the patterns for PatternMatchingEventHandler
        watchdog.events.PatternMatchingEventHandler.__init__(self, patterns=['TEST*'],
                                                            ignore_directories=True, case_sensitive=False)
    def on_created(self, event):
        logging.info("AM Watchdog received created event - %s.", event.src_path)
        self._run_start_func(event.src_path)

    def on_moved(self, event):
        logging.info("AM Watchdog received moved event - %s.", event.src_path)
        self._run_start_func(event.src_path)

    def _run_start_func(self, newFname):
        future = self._executor.submit(self._start_func, newFname)
        # Report completion without blocking the observer thread,
        # so multiple files can be processed concurrently:
        future.add_done_callback(lambda f, name=newFname: print(f"{name} completed"))
        
    def _start_func(self, newFname):
        try:
            res = AM(newFname).start()
            return res
        except Exception:
            return 1

class Controller:
    def __init__(self):
        pass

    def start(self):
        event_handler = Handler()
        observer = watchdog.observers.Observer()
        observer.schedule(event_handler, path=cfg.paths["ipath"], recursive=True)
        observer.start()
        event = Event()
        try:
            # Block until keyboard interrupt:
            event.wait()
        except KeyboardInterrupt:
            observer.stop()
            observer.join()
Booboo
  • Thank you very much. After some extensive testing I'm pretty certain that this unix program is not clearing memory correctly during exceptions. I will work on finding a solution, but my workaround for now is to set ```MemoryMax``` in my service, so that the service restarts when there is a memory leak. But your code looks much cleaner than mine, thank you very much for that! – jlipinski Aug 08 '23 at 08:07
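For reference, the MemoryMax workaround mentioned in the comment is a one-line addition to the [Service] section of the unit file; together with Restart = always, systemd will kill and restart the service once the cap is exceeded (3G here is only an example value):

```ini
[Service]
MemoryMax = 3G
```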