NSTask Race Condition With ReadabilityHandler Block

Question

Basic Setup

I use NSTask to run a process that optimizes images. This process writes output data to stdout. I use the readabilityHandler property of NSTask to capture that data. Here is the abbreviated setup:

NSTask *task = [[NSTask alloc] init];
[task setArguments:arguments];  // arguments defined above
                   
NSPipe *errorPipe = [NSPipe pipe];
[task setStandardError:errorPipe];
NSFileHandle *errorFileHandle = [errorPipe fileHandleForReading];
                   
NSPipe *outputPipe = [NSPipe pipe];
[task setStandardOutput:outputPipe];
NSFileHandle *outputFileHandle = [outputPipe fileHandleForReading];
                   
NSMutableData *outputData = [[NSMutableData alloc] init];
NSMutableData *errorOutputData = [[NSMutableData alloc] init];
                   
outputFileHandle.readabilityHandler = ^void(NSFileHandle *handle) { 
      NSLog(@"Appending data for %@", inputPath.lastPathComponent); 
      [outputData appendData:handle.availableData]; 
};

errorFileHandle.readabilityHandler = ^void(NSFileHandle *handle) { 
       [errorOutputData appendData:handle.availableData]; 
};

I then call NSTask like this:

[task setLaunchPath:_pathToJPEGOptim];
[task launch];
[task waitUntilExit];

(This is all done on a background dispatch queue). Next I examine the return values of NSTask:

if ([task terminationStatus] == 0)
{
    newSize = outputData.length;
                           
    if (newSize <= 0)
    {
        NSString *errorString = [[NSString alloc] initWithData:errorOutputData encoding:NSUTF8StringEncoding];
        NSLog(@"ERROR string: %@", errorString);
    }

    // Truncated for brevity...
}

The Problem

Approximately 98% of the time, this works perfectly. However, it appears that -waitUntilExit CAN fire before the readabilityHandler block is run. Here is a screenshot showing that the readability handler is running AFTER the task has exited:

So this is clearly a race condition between the dispatch queue running the readabilityHandler and the dispatch queue where I've fired off my NSTask. My question is: how the hell can I determine that the readabilityHandler is done? How do I beat this race condition if, when NSTask tells me it's done, it may not be done?

NOTE:

I am aware that NSTask has an optional completionHandler block. But the docs state that this block is not guaranteed to run before -waitUntilExit returns, which implies that it CAN begin running even SOONER than -waitUntilExit. This would make the race condition even more likely.

I looked into the terminationHandler block, hoping that NSTask would run that block on the same dispatch queue it runs the readability handler blocks. It does not. This means that using the terminationHandler block will not solve the race condition. — Bryan, Mar 09 '18 at 01:02

Bryan · Accepted Answer · 2022-03-13T00:39:57.623

Update: Modern macOS:

availableData no longer has the issues I describe below. I'm unsure precisely when they were resolved, but at least Monterey works correctly. The approach described below is for older releases of macOS.

Additionally, with the modern Swift concurrency system in place and the new paradigm of "threads can always make forward progress", using semaphores like below should be a last resort. If you can, use NSTask's completionHandler API. I have no FORMAL guarantee that the readability handlers will complete before the completionHandler is called, but they seem to in practice, at least on modern macOS. Your mileage may vary.

Old Advice:

Ok, after much trial-and-error, here's the correct way to handle it:

1. Do not Use -AvailableData

In your readability handler blocks, do not use the -availableData method. This has weird side effects, will sometimes not capture all available data, and will interfere with the system's attempt to call the handler with an empty NSData object to signal the closing of the pipe because -availableData blocks until data is actually available.

2. Use -readDataOfLength:

Instead, use -readDataOfLength:NSUIntegerMax in your readability handler blocks. With this approach, the handler correctly receives an empty NSData object that you can use to detect the closing of the pipe and signal a semaphore.

3. Beware macOS 10.12!

There is a bug that Apple fixed in 10.13 that is absolutely critical here: on old versions of macOS, the readability handlers are never called if there is no data to read. That is, they never get called with zero-length data to indicate that they’re finished. That results in a permanent hang using the semaphore approach because the semaphore is never incremented. To combat this, I test for macOS 10.12 or below and, if I’m running on an old OS, I use a single call to dispatch_semaphore_wait() that is paired with a single call to dispatch_semaphore_signal() in NSTask’s completionHandler block. I have that completion block sleep for 0.2 seconds to allow the handlers to execute. That’s obviously a godawfully ugly hack, but it works. If I’m on 10.13 plus, I have different readability handlers that signal the semaphore (once from the error handler and once from the normal output handler) and I still signal the semaphore from the completionHandler block. These are paired with 3 calls to dispatch_semaphore_wait() after I launch the task. In this case, no delay is required in the completion block because macOS correctly calls the readability handlers with zero-length data when the fileHandle is done.

Example:

(Note: assume stuff is defined as in my original question example. This code is shortened for readability.)

// Create the semaphore
dispatch_semaphore_t sema = dispatch_semaphore_create(0);

// Define a handler to collect output data from our NSTask
outputFileHandle.readabilityHandler = ^void(NSFileHandle *handle)
{
    // DO NOT use -availableData in these handlers.
    NSData *newData = [handle readDataOfLength:NSUIntegerMax];
    if (newData.length == 0) 
    {
        // end of data signal is an empty data object.
        outputFileHandle.readabilityHandler = nil;
        dispatch_semaphore_signal(sema);
    } 
    else 
    {
       [outputData appendData:newData];
    }
};

// Repeat the above for the 'errorFileHandle' readabilityHandler.


[task launch];
  
// two calls to wait because we are going to signal the semaphore once when
// our 'outputFileHandle' pipe closes and once when our 'errorFileHandle' pipe closes                     
dispatch_semaphore_wait(sema, DISPATCH_TIME_FOREVER);
dispatch_semaphore_wait(sema, DISPATCH_TIME_FOREVER);

// ... do stuff when the task is done AND the pipes have finished handling data.

// After doing stuff, release the semaphore
dispatch_release(sema);
sema = NULL;

Note: I didn't do it here, but calls to NSTask's -setLaunchPath and -launch methods should be wrapped in a @try/@catch/@finally block because NSTask will throw exceptions if you even LOOK at it the wrong way. — Bryan, Mar 15 '18 at 03:55
doesn't work at all for me on macOS 12. `-readDataOfLength:NSUIntegerMax` just blocks forever, while `availableData` works fine — user1259710, Mar 12 '22 at 13:36
Same. This post is from 4 years ago; the issue has been resolved in macOS updates and -availableData now works correctly. It's the way to go when targeting newer OS's. — Bryan, Mar 13 '22 at 00:33

score 1 · Answer 2 · answered Mar 09 '18 at 03:36

1

Create a semaphore with an initial value of 0. In the readability handlers, check if the data object returned from availableData has length 0. If it does, that means end of file. In that case, signal the semaphore.

Then, after waitUntilExit returns, wait on the semaphore twice (once for each pipe you're reading). When those waits return, you've got all of the data.

answered Mar 09 '18 at 03:36

Ken Thomases

88,520
7
116
154

Well, the readabilityHandler is never called when "availableData.length" is zero. So there's no way to know when to signal the semaphore from there. – Bryan Mar 09 '18 at 05:26
Update: it looks like -availableData is the problem. That method does some weird things. Instead, write this in the handler block: NSData *tempData = [handle readDataOfLength:NSUIntegerMax]; Then inspect its length. If > 0, add it to outputData. If == 0, set outputFileHandle.readabilityHandler = nil. When not using -availableData, it looks like the readability handler is called with 0-length data when nothing is left to read. – Bryan Mar 09 '18 at 06:53
You could use that approach to trigger a semaphore, but in my case it is possible that the subprocess task will produce NO output whatsoever, which I'm afraid would never trigger the readabilityHandler block and therefore never signal the semaphore. I need to test it some more because the docs are awful. – Bryan Mar 09 '18 at 06:55
There should be some event that occurs when the pipe's write end is closed, which will happen implicitly when the task exits if the task was the only thing holding it open. You may need to clear your references to the pipe objects to make sure the parent isn't holding it open. – Ken Thomases Mar 09 '18 at 14:56
Agreed. And the event for that using the old -readInBackgroundAndNotify method was a notification with an empty NSData object. There is no mention of a similar approach for the newer readabilityHandler blocks. However, from testing, it looks like if you avoid -availableData and instead use -readDataOfLength in the handler block, you will eventually get a call to that block with zero-length data. If you don't nil the handler at that point, you'll get called with repeated zero-length data objects. That may be the intention, it's just not documented. – Bryan Mar 10 '18 at 01:16
This was indeed the approach to use. The trouble I had was if you use -availableData in the handler AND try this approach, you won't reliably receive the "empty NSData" object that signals the pipe closing. – Bryan Mar 15 '18 at 03:56