I am writing a server system that runs in the background. In very simplified terms it has its own scripting language, which means that a process can be written in that language to run on its own, or it can call another process, etc. I am converting this system from a trivial PHP cron-job in which only one instance is permitted at a time to a set of long-running processes managed by Supervisor.
With that in mind, I am aware that these processes can be killed at any time, either by myself in development, or perhaps by Supervisord in the normal course of stopping or restarting a worker. I would like to add some proper signal handling to ensure that workers tidy up after themselves and, where appropriate, log that a task was left in an interrupted state.
I have worked out how to enable signal handling using ticks and `pcntl_signal()`, and my handling currently seems to work OK. However, I would like to test this to make sure it is reliable. I have written some early integration tests, but they don't feel all that solid, mainly because during development there were all sorts of weird race-condition issues that were tricksy to pin down.
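For reference, this is a minimal sketch of the ticks + `pcntl_signal()` pattern I mean, not my actual worker code (it requires the `pcntl` and `posix` extensions; here the script signals itself so the example terminates on its own):

```php
<?php
// Minimal sketch of the ticks + pcntl_signal() pattern -- not my real
// worker, just the shape of it. Requires the pcntl and posix extensions.

declare(ticks=1); // let the engine check for pending signals between statements

$shutdown = false;

pcntl_signal(SIGTERM, function (int $signo) use (&$shutdown) {
    // Keep the handler tiny: just record that shutdown was requested,
    // and do the real tidy-up in the main loop.
    $shutdown = true;
});

$iterations = 0;
while (!$shutdown) {
    // ... one unit of work would go here ...
    $iterations++;
    if ($iterations === 3) {
        // Simulate an external `kill <pid>` by signalling ourselves.
        posix_kill(posix_getpid(), SIGTERM);
    }
}

echo "tidied up after $iterations iterations\n";
```

The important part for tidy-up is that the handler only sets a flag, so the worker can finish its current unit of work and log its state before exiting.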
I'd like some advice or direction on how to send kill signals in PHPUnit tests, with a view to improving confidence that my sig handling is robust. My present strategy:
- Uses PHPUnit
- As the core system runs it creates log files of various kinds, which can be used to monitor when to kill off the task
- The core system is launched in the background using a separate PHP script, via a `system()` call in the PHPUnit test. My command is similar to `php script.php > $logFile 2>&1 &`, i.e. redirect all output to a log file and push the process to the background, so the test method can monitor it
- The background script writes its PID to a file, which will be the PID to kill
- This is picked up reliably by the test by scanning repeatedly for it, `usleep`ing between scans
- The test then waits for a specific state by scanning the database, `usleep`ing between scans, and issues a `kill <pid>` when it is ready
- It then waits for the signal handler to kick in and write a new database state, `usleep`ing again to avoid hammering the database
- Finally, after a maximum delay time, it determines whether or not the database is in a correct state, which passes/fails the test
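All of the wait-check steps above boil down to the same loop, which I have pulled out as a helper (the helper name, timings, and the `taskStateIs()` check are mine, not from PHPUnit or any library):

```php
<?php
// Generic wait-check loop that each polling step in the test boils down to.
// Name and default timings are illustrative, not from any library.

function waitFor(callable $condition, int $timeoutMs = 5000, int $pollMs = 50): bool
{
    $deadline = microtime(true) + $timeoutMs / 1000;
    while (microtime(true) < $deadline) {
        if ($condition()) {
            return true;
        }
        usleep($pollMs * 1000); // avoid hammering the filesystem/database
    }
    return false; // timed out
}

// Inside a test, the kill sequence then reads something like this
// ($pidFile and taskStateIs() stand in for my real checks):
//
//   $this->assertTrue(waitFor(fn () => file_exists($pidFile)));
//   posix_kill((int) trim(file_get_contents($pidFile)), SIGTERM);
//   $this->assertTrue(waitFor(fn () => taskStateIs('interrupted')));
```

At least this way the timeout and poll interval live in one place, so if I do need to stretch the timeouts I only change them once.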
Of course, with all this waiting/checking, it feels a bit ropey, and quite ripe for race conditions of all sorts. My gut feeling is that the tests will fail around 2% of the time, but I've not been able to get the test to fail for a day or so. I plan to do some soak testing, and if I get any failures from that I'll post them here.
I wonder if I can simplify it by asking the system under test to kill itself, which would remove two levels of wait-checking (one to wait for the PID, and another to wait for the database to enter the correct state before the kill command)†. That would still leave the wait-check loop after the kill is issued, but I may yet find that having that one check is not a problem in practice.
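Concretely, by "kill itself" I mean something along these lines, behind a test-only switch (the env var and the mini state machine below are made up for illustration):

```php
<?php
// Sketch of the self-kill idea: the worker signals its own PID as soon as
// it reaches the state the test wants to interrupt at, so the test no
// longer has to poll and decide when to send the kill. The state machine
// and env var are made up for illustration.

declare(ticks=1);

$interrupted = false;
pcntl_signal(SIGTERM, function () use (&$interrupted) {
    $interrupted = true;
});

$target = getenv('TEST_SELF_KILL_AT_STATE') ?: 'working';

foreach (['starting', 'working', 'finishing'] as $state) {
    if ($state === $target) {
        // Same code path as an external `kill <pid>`, just self-inflicted.
        posix_kill(posix_getpid(), SIGTERM);
        pcntl_signal_dispatch(); // make delivery deterministic in this sketch
    }
    if ($interrupted) {
        break; // tidy up / log the interrupted state here
    }
    // ... real work for this state ...
}

echo $interrupted ? "interrupted at $state\n" : "completed\n";
```

The appeal is that the signal still travels through the ordinary handler, so the tidy-up path under test is unchanged; only the test-side orchestration gets simpler.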
That said, I am conscious that my whole approach may be ham-fisted, and that there may be a better way to do this sort of thing. Any ideas? At present my thinking is just to increase my wait timeouts, in case PHPUnit is introducing any strange delays. I'll also see if I can capture a failure case so I can examine the logs.
† Ah, sadly it won't simplify things much. I just tried this on a simple signal integration test I regard as reliable, and since the backgrounded `system()` call returns immediately, the test still has to loop-wait to identify the right log record, and then for the right post-kill result. However, it no longer has to wait for a PID to be written to a temp file, so that is at least one loop eliminated.