
I have a C++ program that checks whether any action has to be done and, if so, starts the right processor (another C++ program). Since it runs every x minutes, it also uses lock files to check whether the processor is still running.

I use the following function to acquire the lock:

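#include <fcntl.h>     // open, O_RDWR, O_CREAT
#include <sys/file.h>  // flock, LOCK_EX, LOCK_NB
#include <unistd.h>    // close
#include <cerrno>
#include <cstring>     // strerror
#include <iostream>
#include <string>
using namespace std;

// Returns the locked file descriptor, or -1 if the lock could not be taken.
int LockFile(string FileNameToLock) {
    FileNameToLock += ".lock";
    int fd = open(FileNameToLock.c_str(), O_RDWR | O_CREAT, 0666);
    if (fd == -1 || flock(fd, LOCK_EX | LOCK_NB) == -1) {
        cout << errno << endl;
        cout << strerror(errno) << endl;
        if (fd != -1) close(fd); // do not leak the descriptor when locking fails
        return -1;
        }
    return fd;
    }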

The code that is being executed:

[...]
if (LockFile(ExecuteFileName, Extra) == -1) {
    cout << "Already running!" << endl; //MAIN IS ALREADY RUNNING
    //RETURNS ME Resource temporarily unavailable when processor is running from an earlier run
    exit(EXIT_SUCCESS);
    }
if (StartProcessor) { //PSEUDO
    int LockFileProcessor = LockFile("Processor");
    if (LockFileProcessor != -1) {
        string Command = "nohup setsid ./Processor"; //NOHUP CREATES ZOMBIE?
        Command += IntToString(result->getInt("Klantnummer"));
        Command += " > /dev/null 2>&1 &"; //DISCARD OUTPUT
        system(Command.c_str());
        //STARTS PROCESSOR (AS ZOMBIE?)
        }
    }

The first run works well. However, when the main script runs again, LockFile returns -1, which means an error occurred (only when the processor is still running). The errno is 11, which results in the error message: Resource temporarily unavailable. Note that this only happens when the (zombie) processor is still running. (However, the main script has already terminated, which should close the file handle?)

For some reason, the zombie keeps the file handle to the lock file of the main script open???

I have no idea where to look for this problem.

SOLVED: see my answer

TVA van Hesteren
  • You could check whether your processes are still running or are zombie processes and stop them. I have no experience how this is done by a C++ program in an elegant way as I mostly did this on command line. – stefaanv Nov 03 '17 at 08:34
  • I don't want to kill the zombies. They have to continue. However, the main might need to start processor B while processor A is still running, which doesn't work because I can't obtain a lock for the main. I guess because processor A keeps the file handle to the lock file of the main open during execution because when I terminate processor A I can obtain a new lock on the main – TVA van Hesteren Nov 03 '17 at 08:37
  • I think you are confusing zombie process and orphan process, see https://en.wikipedia.org/wiki/Zombie_process and https://en.wikipedia.org/wiki/Orphan_process – stefaanv Nov 03 '17 at 08:50
  • Great comment, that is indeed a better explanation of the process I want to create! – TVA van Hesteren Nov 03 '17 at 08:55
  • By the way, https://linux.die.net/man/2/flock has an intriguing line: "Locks created by flock() are preserved across an execve(2).". To be honest, I don't know what it exactly means and whether this applies here. – stefaanv Nov 03 '17 at 09:12
  • It seems it does...! Do you have a way around it? – TVA van Hesteren Nov 03 '17 at 09:17
  • It depends what you actually are trying to achieve. If my understanding is correct, a lockfile-name based on the command and the process-id of the "script" might work. (https://linux.die.net/man/3/getpid) – stefaanv Nov 03 '17 at 09:22
  • The PID is unique and may be recycled right.. So if I start 2x the same process quickly behind each other they will both be running. I want to be sure that only one of the two is running and the other gets terminated right away because of the lock file – TVA van Hesteren Nov 03 '17 at 09:26
  • So my understanding was wrong, I thought that you have a "C++ script" that launches "process C++ script", so I assumed that the "C++ script" process id didn't change that much and the lock is only valid within the "C++ script". Of course, when the C++ script is restarted and uses the same id, then it will use the same lock files. – stefaanv Nov 03 '17 at 09:33
  • Right, however that means that I can't lock the file, since I can't be sure that the PID is the same? – TVA van Hesteren Nov 03 '17 at 09:35
  • Yes, you need to find the locking scheme that fits your application. I just gave a proposal that isn't guaranteed to work. – stefaanv Nov 03 '17 at 09:58
  • I think I already have a solution. I save all the processor start commands. Unlock the main and start all the processors and main exits right after. This leads to a very small gap of running while not locked. Seems to work great for now – TVA van Hesteren Nov 03 '17 at 10:00

2 Answers


No, 11 is EAGAIN/EWOULDBLOCK, which simply means that you cannot acquire the lock because the resource is already locked (see the documentation). You received that error (instead of blocking behaviour) because of the LOCK_NB flag.
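
The error can be reproduced in a single process. The following is a minimal sketch, assuming Linux and a filesystem that supports flock(); the helper name SecondLockErrno is made up for illustration. It opens the same file twice, locks the first descriptor, and reports the errno from a non-blocking lock attempt on the second.

```cpp
#include <fcntl.h>     // open
#include <sys/file.h>  // flock
#include <unistd.h>    // close
#include <cerrno>
#include <string>

// Hypothetical helper: returns the errno of a non-blocking lock attempt on a
// second descriptor while a first descriptor holds the lock. Returns 0 if the
// second attempt unexpectedly succeeds and -1 if the setup itself fails.
int SecondLockErrno(const std::string& path) {
    int first = open(path.c_str(), O_RDWR | O_CREAT, 0666);
    int second = open(path.c_str(), O_RDWR | O_CREAT, 0666);
    int result = -1;
    if (first != -1 && second != -1 &&
        flock(first, LOCK_EX | LOCK_NB) == 0) {
        // Separate open() calls create separate open file descriptions,
        // so the second one conflicts with the lock held by the first.
        result = (flock(second, LOCK_EX | LOCK_NB) == -1) ? errno : 0;
    }
    if (first != -1) close(first);
    if (second != -1) close(second);
    return result;
}
```

Without LOCK_NB the second flock() call would block until the first lock is released instead of failing.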

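EDIT: After some discussion, it seems that the problem is that flock() locks are preserved in child processes: the lock belongs to the open file description, which a process started via fork/execve (and therefore system()) inherits. To avoid this issue, I recommend not holding flock() for the lifetime of the process and using a touch-and-delete-at-exit strategy instead: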

  1. If file.lock exists then exit
  2. Otherwise create file.lock and start processing
  3. Delete file.lock at exit.

Of course there's a race condition here. In order to make it safe you would need another file with flock:

  1. flock(common.flock)
  2. If file.lock exists then exit
  3. Otherwise create file.lock
  4. Unlock flock(common.flock)
  5. Start processing
  6. Delete file.lock at exit

But this only matters if you expect simultaneous calls to main. If you don't (you said that a cron starts the process every 10min, no race here) then stick to the first version.

Side note: here's a simple implementation of such (non-synchronized) file lock:

#include <string>
#include <fstream>
#include <stdexcept>
#include <cstdio>

// for sleep only
#include <chrono>
#include <thread>

class FileLock {
    public:
        FileLock(const std::string& path) : path { path } {
            if (std::ifstream{ path }) {
                // You may want to use a custom exception here
                throw std::runtime_error("Already locked.");
            }
            std::ofstream file{ path };
        };

        ~FileLock() {
            std::remove(path.c_str());
        };

    private:
        std::string path;
};

int main() {
    // This will throw std::runtime_error if test.xxx exists
    FileLock fl { "./test.xxx" };
    std::this_thread::sleep_for(std::chrono::seconds { 5 });
    // RAII: no need to delete anything here
    return 0;
};

Requires C++11. Note that this implementation is not race-condition-safe: in general, the check-and-create in the constructor needs to be guarded by flock(), but in this situation it will probably be fine (i.e. when you don't start main in parallel).
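
The synchronized variant from the second numbered list could look roughly like the sketch below. It assumes Linux; the class name GuardedFileLock and both file paths are placeholders chosen by the caller, and error handling is kept minimal.

```cpp
#include <fcntl.h>     // open
#include <sys/file.h>  // flock
#include <unistd.h>    // close, access
#include <cstdio>      // std::remove
#include <stdexcept>
#include <string>

// Sketch: marker-file lock whose check-and-create step is guarded by flock()
// on a separate guard file.
class GuardedFileLock {
    public:
        GuardedFileLock(const std::string& guardPath, const std::string& markerPath)
            : marker { markerPath } {
            int guard = open(guardPath.c_str(), O_RDWR | O_CREAT, 0666);
            if (guard == -1) throw std::runtime_error("Cannot open guard file.");
            // Blocking lock, held only for the short check-and-create step.
            if (flock(guard, LOCK_EX) == -1) {
                close(guard);
                throw std::runtime_error("Cannot lock guard file.");
            }
            bool exists = access(marker.c_str(), F_OK) == 0;
            bool created = false;
            if (!exists) {
                int fd = open(marker.c_str(), O_WRONLY | O_CREAT, 0666);
                created = fd != -1;
                if (created) close(fd);
            }
            close(guard);  // closing the only descriptor releases the guard lock
            if (exists) throw std::runtime_error("Already locked.");
            if (!created) throw std::runtime_error("Cannot create marker file.");
        }

        // Only runs for fully constructed objects, so a failed attempt
        // never deletes a marker that belongs to another process.
        ~GuardedFileLock() { std::remove(marker.c_str()); }

        GuardedFileLock(const GuardedFileLock&) = delete;
        GuardedFileLock& operator=(const GuardedFileLock&) = delete;

    private:
        std::string marker;
};
```

Because the guard descriptor is closed before any processing starts, a child launched later cannot inherit the guard lock. The marker file is still left behind if the process is killed before the destructor runs.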

freakish
  • So you suggest me to remove LOCK_NB? Since it should lock the file again because it is not running. Why is the zombie keeping the file handle to the lock file open? – TVA van Hesteren Nov 03 '17 at 08:35
  • @TVAvanHesteren No, I suggest that **everything** is fine. There are no zombies. There's another process keeping the lock (the first one that fired). Or are you saying that there is no other process running yet you can't lock it? – freakish Nov 03 '17 at 08:36
  • Right, there is indeed one process running which was started by the main. However, on the terminal the main exits which makes me think that the started process from within the main is a zombie? However, this started process keeps the file handle to the main lock file open? How can I overcome this issue? – TVA van Hesteren Nov 03 '17 at 08:38
  • @TVAvanHesteren I think I need more explanation. Process A locks a file and goes into some processing (the lock is kept for whole A's lifetime). Process B tries to lock a file, it fails (cause A has the lock) and it exits. That's how your code should (and it seems it does) work. What behaviour do you expect? – freakish Nov 03 '17 at 08:41
  • No, actually I have the main which may start processor A and B or just A or just B. E.g. when the main starts A it terminates right after but A keeps running. When i run main again, it checks or it can obtain a lock on main.lock which should be available since processor A obtains a lock on its own, being processorA.lock. So when the main starts again after 1 minute, it can't lock main.lock because processor A is still running... You understand what I mean? If i shut down Processor A, the lock on main.lock is working perfectly again... which is unexpected behavior for the situation I want – TVA van Hesteren Nov 03 '17 at 08:43
  • As mentioned by stefaanv above, I want to create a orphan process with C++ in ubuntu but I can't get it to work it seems – TVA van Hesteren Nov 03 '17 at 08:56
  • @TVAvanHesteren I think I understand now. File descriptors (and thus `flock` calls) are preserved across `execve` and `fork` (probably `system` syscall as well). Try manually unlocking `flock` before `system` syscall. – freakish Nov 03 '17 at 09:15
  • Right, however that is not the behavior I need. e.g. when the main is running for more than 1 minute another instance of the main is started by the cronjob and therefore the lock on main would become useless? – TVA van Hesteren Nov 03 '17 at 09:17
  • @TVAvanHesteren Fair enough, so another idea is not to use `flock` but touch-and-delete-at-exit (and check if exists) a file and only use `flock` for touch-if-doesnt-exists synchronization. You may want to store pid in that file (to know for what process to look for in case of fuckup). I think this is the usual strategy. Of course you need to wrap this lock in RAII. – freakish Nov 03 '17 at 09:21
  • Wauw, impressive. No idea how to achieve this though, do you have a sample code? – TVA van Hesteren Nov 03 '17 at 09:27
  • Great solution, however when the process is terminated by hand or because of any other reason and it won't close gracefully, it stays locked right? – TVA van Hesteren Nov 03 '17 at 09:58
  • I think I already have a solution. I save all the processor start commands. Unlock the main and start all the processors and main exits right after. This leads to a very small gap of running while not locked. Seems to work great for now – TVA van Hesteren Nov 03 '17 at 10:00
  • @TVAvanHesteren You can handle some abnormal terminations via signal handlers. However you won't ever handle `kill -9` case. But if your solution works for you then that's great. – freakish Nov 03 '17 at 10:12
  • Thanks for the solution though – TVA van Hesteren Nov 04 '17 at 12:05

I solved this issue by saving the commands to execute in a vector, since system and fork seem to pass on the flock. I unlock the lock file right before looping over the Commands vector and executing each command. This leaves the main with a very tiny gap of running while not locked, but for now it seems to work great. This also supports ungraceful terminations.
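
The inheritance itself can be reproduced in isolation. The following is a minimal sketch, assuming Linux; CanLock and ChildKeepsLock are made-up names for illustration. A parent locks a file, forks, and closes its own descriptor, yet the lock stays held for as long as the child is alive.

```cpp
#include <fcntl.h>     // open
#include <sys/file.h>  // flock
#include <sys/wait.h>  // waitpid
#include <unistd.h>    // fork, pipe, read, close, _exit
#include <string>

// Tries a non-blocking exclusive lock through a fresh open();
// returns true if the lock was free.
bool CanLock(const std::string& path) {
    int fd = open(path.c_str(), O_RDWR | O_CREAT, 0666);
    if (fd == -1) return false;
    bool ok = flock(fd, LOCK_EX | LOCK_NB) == 0;
    close(fd);  // closing this description also drops the lock it took
    return ok;
}

// Hypothetical demo: returns true if the lock is still held after the parent
// closed its descriptor while a forked child is alive.
bool ChildKeepsLock(const std::string& path) {
    int fd = open(path.c_str(), O_RDWR | O_CREAT, 0666);
    if (fd == -1) return false;
    int gate[2];
    if (flock(fd, LOCK_EX | LOCK_NB) == -1 || pipe(gate) == -1) {
        close(fd);
        return false;
    }
    pid_t child = fork();
    if (child == 0) {
        // Child: inherits fd and with it the lock. It waits for end-of-file
        // on the pipe, which arrives once the parent closes the write end.
        close(gate[1]);
        char c;
        ssize_t n = read(gate[0], &c, 1);
        (void)n;
        _exit(0);
    }
    close(gate[0]);
    close(fd);  // the parent gives up its descriptor, as an exiting main would
    bool heldByChild = (child != -1) && !CanLock(path);
    close(gate[1]);  // lets the child exit
    if (child != -1) waitpid(child, nullptr, 0);
    return heldByChild;
}
```

Once the child has exited, CanLock on the same path succeeds again. Opening the lock file with O_CLOEXEC is one way to keep a descriptor from surviving execve; that is an assumption about what might fit this setup, not something tried in this thread.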

[...]
int LockFileMain = LockFile(ExecuteFileName, Extra); //KEEP THE DESCRIPTOR FOR UnlockFile BELOW
if (LockFileMain == -1) {
    cout << "Already running!" << endl; //MAIN IS ALREADY RUNNING
    //RETURNS ME Resource temporarily unavailable when processor is running from an earlier run
    exit(EXIT_SUCCESS);
    }
vector<string> Commands;
if (StartProcessor) { //PSEUDO
    int LockFileProcessor = LockFile("Processor");
    if (LockFileProcessor != -1) {
        string Command = "nohup setsid ./Processor"; //STARTS A DETACHED (ORPHAN) PROCESS, NOT A ZOMBIE
        Command += IntToString(result->getInt("Klantnummer"));
        Command += " > /dev/null 2>&1 &"; //DISCARD OUTPUT
        Commands.push_back(Command);
        }
    }
//UNLOCK MAIN
if (UnlockFile(LockFileMain)) {
    for(auto const& Command: Commands) {
        system(Command.c_str());
        }
    }
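
UnlockFile is not shown above. A minimal sketch of what such a helper might look like, assuming it receives the descriptor returned by LockFile and wraps flock() with LOCK_UN (the signature and return type are guesses):

```cpp
#include <fcntl.h>     // open (used by callers)
#include <sys/file.h>  // flock, LOCK_UN
#include <unistd.h>    // close

// Hypothetical UnlockFile: releases the flock() lock held through fd and
// closes the descriptor. Returns true if the unlock succeeded.
bool UnlockFile(int fd) {
    if (fd == -1) return false;
    bool unlocked = flock(fd, LOCK_UN) == 0;
    close(fd);  // closed before system() runs, so children cannot inherit it
    return unlocked;
}
```

Unlocking and closing before the system() calls means the started processors hold neither the lock nor the descriptor of the main lock file.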
TVA van Hesteren