I am developing a library that has asynchronous APIs. When one of these APIs is called, it creates a task, pushes it to the task thread, and returns the task ID. After the task completes, the task thread notifies the caller of the result by invoking a callback function.

The sequence is as follows:

Caller

C.1. The caller calls the API.

C.2. The library creates a task and pushes it onto the Task Thread's queue.

C.3. The library wakes the Task Thread by calling notify_all on a condition_variable.

C.4. At this point a context switch can occur, and this thread may be suspended.

C.6. After this thread resumes, the library returns the task ID.

Task Thread

T.1. The Task Thread executes the task.

T.2. When the task is completed, the Task Thread notifies the caller of the result by invoking the callback.

Problem

The caller matches the callback's result data to a request by task ID, but occasionally the callback is invoked before the API returns, so the caller does not yet know the task ID and cannot match the result.

Question

I want to guarantee that the API returns the task ID before its callback is invoked. What can I do?

I use a lock_guard in the API's body to block the callback from being invoked, which significantly reduces how often the problem occurs.

But because lock_guard unlocks the mutex before the API returns, the problem can still occur, very rarely, if a context switch happens after the mutex is unlocked and before the API returns.

I want to prevent this case too.

Summarized code

long AcbCore::setState(long appState, long playState)   // API
{
    StateTask* task = (StateTask*) createTask(TaskType::PLAYER_STATE_CHANGE, &isAppSwitchingStateFlag, appState);

    std::lock_guard<std::mutex> lockGd (*task->getEventMutex());
    pushTask(task);      // at this position, context switch can occur by condTaskThread_.notify_all() of resumeLoop()

    return task->getTaskId();
}

void AcbCore::pushTask(Task* task)
{
    mtxTaskQueue_.lock();
    queueTask_.push_back(task);
    mtxTaskQueue_.unlock();

    resumeLoop();
}

void AcbCore::resumeLoop()
{
    mtxTaskThread_.lock();
    mtxTaskThread_.unlock();
    condTaskThread_.notify_all();
}

bool AcbCore::suspendLoop()
{
    bool isTimeout = false;
    if (ingTask_ != NULL) {
        isTimeout = (condTaskThread_.wait_for(lockTaskThread_, std::chrono::seconds(AWAKE_TASK_THREAD_TIMEOUT)) == std::cv_status::timeout);
    } else {
        condTaskThread_.wait(lockTaskThread_);
    }

    return isTimeout;
}

void AcbCore::taskLoop()  // loop of Task Thread
{
    Task* task = NULL;
    Action* action = NULL;
    while (isInitialized_) {
        while (popTask(task)) {
            if (task->isCompleted()) {
                fireEvent(task);
            } else {
                doNextTask(task);
            }
        }
        if (suspendLoop()) {    //  if awaked by timeout
            cancelTask(ingTask_, true);
        }
    }
}

void AcbCore::fireEvent(Task* task, bool bDelete)
{
    std::string errorInfo = task->getErrorInfo();

    task->waitToUnlockEvent();
    // eventHandler_ : callback set by caller when Acb is initialized
    eventHandler_(task->getTaskId(), task->getEventType(), appState_.now_, playState_.now_, errorInfo.c_str());

    if (bDelete) {
        deleteTask(task);
    }
}
sskim
  • Instead of completion being a callback, make it set a signalling object. Then the caller can check the signalling object. (You have to have some form of synchronization objects either way, because the callback will be in a different thread to the caller) – M.M Jul 15 '14 at 05:35
  • Thank you for your kind reply, but I cannot edit the caller's code, so this has to be handled in the library. – sskim Jul 15 '14 at 05:44
  • I use condition variables to control both of threads, but when task thread is resumed by notify_all, caller's thread can be suspended by context-switch and the callback is invoked before API returns – sskim Jul 15 '14 at 05:49
  • You can't edit the caller at all? Because it would be easy to fix if you made one more API call to generate a task ID synchronously first, then had the caller pass it in with the request. – John Zwinck Jul 15 '14 at 06:04
  • "notifies the result to caller". This description is a bit too generic. What specifically does it do? How does the caller use the notification? – n. m. could be an AI Jul 15 '14 at 06:10
  • OK, so the caller sets up the library task to run, and then stores the task ID for the callback. The caller performs no other synchronization actions during this time. How can the library task, which may already have started running, know that the caller has not yet stored the thread ID? You could introduce a delay in the library task, but that will only reduce the likelihood of error, not eliminate it. Why can't you fix the defect in the caller (or help someone else fix it)? – David K Jul 15 '14 at 06:18
  • Thank you all. Is there no way to fix this problem only in the library without modifying I/F of APIs or adding new APIs? – sskim Jul 15 '14 at 06:28
  • It is hard to understand what's going on from verbal descriptions. Please post (pseudo)code, detailing every access to data shared between the caller and the task thread, and every relevant synchronization call. – n. m. could be an AI Jul 15 '14 at 06:28
  • I added summarized code. ^^ – sskim Jul 15 '14 at 06:48
  • **The caller is broken.** *The caller has to be fixed.* There's nothing the library can do without the caller's cooperation. There's no way the library can know when the caller considers it safe to call the callback unless the caller tells it somehow, which it currently doesn't do. – David Schwartz Jul 15 '14 at 06:52
  • Thank you for your kind comment. I have given up on finding a library-only solution. I will tell the developer of the caller, who uses the library, about this issue. – sskim Jul 15 '14 at 07:09
  • The callback seems to run in the context of the task thread, which is different from that of the caller. But the caller is the ultimate user of the callback. Presumably the callback will touch some data that belongs to the caller. This requires synchronization around that data. – n. m. could be an AI Jul 15 '14 at 08:19
  • On an unrelated note, `mtxTaskThread_.lock(); mtxTaskThread_.unlock();` in `resumeLoop` smells extremely fishy. Can this sequence have *any* effect? – n. m. could be an AI Jul 15 '14 at 08:21
  • `mtxTaskThread_.lock(); mtxTaskThread_.unlock();` is for checking whether another thread holds this mutex. If another thread has locked the mutex while doing something, `resumeLoop` will wait until that thread has finished. – sskim Jul 17 '14 at 00:14
  • In more detail, it is to prevent calling `notify_all();` before `wait();` is called. In `popTask()`, `mtxTaskThread_` is locked and there is some rather complex logic. – sskim Jul 17 '14 at 00:37
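
The concern in the last few comments (a `notify_all()` arriving before `wait()` has started) is usually handled by waiting on a predicate rather than by a bare lock/unlock pair. The following is only a minimal sketch of that idiom and not the `AcbCore` code: `TaskQueue`, `push`, and `waitAndPop` are hypothetical names, and tasks are plain `int`s for brevity. It does not change the ordering between the API's return and the callback, which is the actual question.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>

// Hypothetical minimal queue, used only to illustrate the idiom.
class TaskQueue {
public:
    void push(int task) {
        {
            // Modify the shared state under the same mutex the waiter uses.
            std::lock_guard<std::mutex> lock(mtx_);
            queue_.push_back(task);
        }
        // Notifying after unlocking is fine: the waiter re-checks the
        // predicate, so a notification sent before wait() starts is harmless.
        cond_.notify_one();
    }

    int waitAndPop() {
        std::unique_lock<std::mutex> lock(mtx_);
        // Returns immediately if the queue is already non-empty; otherwise
        // sleeps and re-checks the predicate on every wakeup.
        cond_.wait(lock, [this] { return !queue_.empty(); });
        int task = queue_.front();
        queue_.pop_front();
        return task;
    }

private:
    std::mutex mtx_;
    std::condition_variable cond_;
    std::deque<int> queue_;
};
```

Because the waiter checks the queue while holding the mutex, it does not matter whether `push` runs before or after `waitAndPop` begins; spurious wakeups are covered by the same check.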

1 Answer

Fundamentally, you can't solve this on your side. Let's assume that the initial thread is suspended exactly after the last instruction of your code, but before the caller gets the task ID. You simply cannot determine when the caller has stored the task ID in a way that allows the callback to happen.

Instead, the client should cache callback results for task IDs it does not recognize yet whenever it has an outstanding call to your async function.
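
A minimal sketch of that caller-side caching, under stated assumptions: `ResultTracker`, `onApiReturned`, and `onCallback` are hypothetical names, not part of the library in the question, and the result is reduced to a single string. The caller would invoke `onCallback` from the library's event handler and `onApiReturned` right after the API hands back the task ID; whichever arrives second delivers the result.

```cpp
#include <functional>
#include <map>
#include <mutex>
#include <set>
#include <string>
#include <utility>

// Hypothetical caller-side helper: pairs task IDs with results regardless
// of whether the callback or the API's return value arrives first.
class ResultTracker {
public:
    using Handler = std::function<void(long, const std::string&)>;

    explicit ResultTracker(Handler handler) : handler_(std::move(handler)) {}

    // Call on the caller thread once the API has returned the task ID.
    void onApiReturned(long taskId) {
        std::string result;
        {
            std::lock_guard<std::mutex> lock(mtx_);
            auto it = early_.find(taskId);
            if (it == early_.end()) {
                pending_.insert(taskId);  // callback has not arrived yet
                return;
            }
            result = std::move(it->second);  // callback won the race
            early_.erase(it);
        }
        handler_(taskId, result);  // deliver outside the lock
    }

    // Call from the library's callback, i.e. on the task thread.
    void onCallback(long taskId, const std::string& result) {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            if (pending_.erase(taskId) == 0) {
                early_[taskId] = result;  // unknown ID: cache it for later
                return;
            }
        }
        handler_(taskId, result);  // ID already known: deliver now
    }

private:
    std::mutex mtx_;
    std::set<long> pending_;             // IDs returned, callback not yet seen
    std::map<long, std::string> early_;  // results whose ID is not yet known
    Handler handler_;
};
```

The handler runs on whichever thread completes the pair, so it still needs its own synchronization around any caller data it touches, as one of the comments on the question points out.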

MSalters
  • Thank you. That is sad news for me, but your answer has freed me from agonizing over this problem. – sskim Jul 15 '14 at 07:05