
I have a situation where one process locks a particular file with Python's fcntl module by calling lockf(fd, LOCK_SH). Another process then SIGKILLs it, waits for it to die, and more or less immediately calls lockf(fd, LOCK_EX | LOCK_NB) on the same file.

Something like this:

import os
import fcntl
import time
import sys
import signal

pid = os.fork()
if pid:
    # Parent: give the child time to take its shared lock, kill it,
    # reap it, then immediately try a non-blocking exclusive lock.
    time.sleep(1)
    os.kill(pid, signal.SIGKILL)
    os.waitpid(pid, 0)
    fcntl.lockf(os.open(sys.argv[0], os.O_WRONLY), fcntl.LOCK_EX | fcntl.LOCK_NB)
else:
    # Child: take a shared lock on this script and sleep until killed.
    fcntl.lockf(os.open(sys.argv[0], os.O_RDONLY), fcntl.LOCK_SH)
    time.sleep(1000)

The file is on a normal ext4 local filesystem, and all the processes are on the same machine, so bad lock implementations on weird filesystems are not a concern here.

I want to know if it is possible in theory for the second lock to fail, given sufficiently pathological scheduling.

  1. Does POSIX guarantee that a process is truly dead and buried when you wait on it and get its exit code? Or can some cleanup for the process still be pending?
  2. Does POSIX guarantee any sort of happens-before ordering between the death of a process and the resolution of locks it was holding at death?
  3. What is the actual behavior of the current Linux kernel with regard to ordering process death and file unlocking when a process is killed?
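One way to probe the question empirically is a stress loop that repeats the kill-then-relock sequence many times and counts failures. This is a sketch, not part of the original question; the temp file, iteration count, and 10 ms settle delay are arbitrary choices:

```python
import fcntl
import os
import signal
import tempfile
import time

# Repeat the SIGKILL-then-relock sequence and count how often the
# non-blocking exclusive lock fails after the child has been reaped.
path = tempfile.mkstemp()[1]
failures = 0
for _ in range(50):
    pid = os.fork()
    if pid == 0:
        # Child: take a shared POSIX record lock, sleep until killed.
        child_fd = os.open(path, os.O_RDONLY)
        fcntl.lockf(child_fd, fcntl.LOCK_SH)
        time.sleep(1000)
        os._exit(0)
    time.sleep(0.01)          # give the child time to acquire its lock
    os.kill(pid, signal.SIGKILL)
    os.waitpid(pid, 0)        # reap the child
    fd = os.open(path, os.O_WRONLY)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        failures += 1
    finally:
        os.close(fd)          # closing the fd also drops the parent's lock
os.unlink(path)
print(failures)
```

On a current Linux kernel this prints 0 in practice, which matches the observation that the failure cannot be reproduced locally; it does not, of course, settle what the spec permits.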
interfect
  • To me it is not clear what problem you want to solve. Your question looks like an XY problem. Showing the code of a [mre] with the expected and actual result might help to understand your problem. Why don't you simply use `kill(pid, 0)` or `waitpid` to check whether a specific process exists? – Bodo Apr 14 '21 at 20:08
  • I'm not polling for the child processes by PID because the test code doesn't know the PIDs that were used, and because that would create a PID re-use risk that I would rather avoid if I can. There are a lot of moving parts in an example that might reproduce the behavior I am worried might be possible under the spec, and I'm not reliably able to reproduce it on my local machine even with my whole program. So I'm interested in what is possible in theory. – interfect Apr 14 '21 at 20:21
  • Please [edit] your question to add clarification or requested information instead of answering in comments. Please describe your use case. Do you have processes that are assigned to different sets? Do you have other child processes you don't want to check? Without knowing what you want to achieve it is difficult to suggest a solution. Maybe you could use process groups and `waitpid(- groupID, ...)` to check if you have any unwaited-for children from a specific group. See also https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem – Bodo Apr 14 '21 at 20:32
  • Are you using the wrong function? `LOCK_SH`, `LOCK_EX`, and `LOCK_NB` are for `flock()`, but you're using `lockf()`, which uses `F_LOCK` and `F_TLOCK` and doesn't have a concept of a shared lock. – Joseph Sible-Reinstate Monica Apr 15 '21 at 00:05
  • I wrote a test that kept doing what you described in a loop, but even after quite a while couldn't reproduce what you're seeing. Can you post your actual test code so we can see if there's a mistake in it? – Joseph Sible-Reinstate Monica Apr 15 '21 at 01:28
  • Sorry @JosephSible-ReinstateMonica; I was speaking in Python, and it looks like Python's names for the constants are different from what they are in C. I've edited the question to make that clearer. I've also worked out that I need some additional process-group-related machinery and waits for unrelated reasons that will probably solve this problem for me in practice. But I still want to know when locks are cleaned up relative to the visibility of the death of a process. – interfect Apr 15 '21 at 17:22
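To illustrate the naming wrinkle raised in the comments above: C's lockf() takes F_LOCK/F_TLOCK, but CPython's `fcntl.lockf()` wraps fcntl() record locking while accepting the flock-style `LOCK_*` names, and `fcntl.flock()` wraps BSD flock(), a separate locking mechanism with separate semantics. A minimal sketch (temp file is arbitrary):

```python
import fcntl
import tempfile

# Both lock styles accept the same LOCK_* constants in Python, even
# though they map to different system calls underneath.
with tempfile.NamedTemporaryFile() as f:
    # POSIX record lock via fcntl() -- what the question's code takes:
    fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    fcntl.lockf(f, fcntl.LOCK_UN)
    # BSD flock() lock on the same file -- a distinct lock type:
    fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    fcntl.flock(f, fcntl.LOCK_UN)
```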

0 Answers