In libuv, you can end up tying up the worker threads with too much work or buggy code. Is there a simple function that can check the health of the worker threads or the thread queue? It doesn't have to be 100% deterministic; after all, it would be impossible to tell whether a worker thread is hanging on slow code or stuck in an infinite loop.

So any of the following heuristics would be good:

  • Number of queued items not yet worked on. If this is too large, it could mean the worker threads are busy or hung.

  • Does libuv have any thread killing mechanism where if the worker thread doesn't check back in n seconds, it gets terminated?

Glen Low

2 Answers

That function does not exist in libuv itself, and I am not aware of any OSS that provides something like that.
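If libuv offers no such function, one workaround is to count outstanding requests at the application level. The following is only a rough sketch for Node.js: `createPendingTracker` and `wrap` are hypothetical names, not part of libuv or Node, and the counter includes work that is currently running as well as work that is still queued. It uses `fs.stat` as an example because Node runs file system calls on the libuv thread pool.

```javascript
const fs = require('fs');

// Hypothetical helper: counts calls that were submitted but have not
// called back yet (queued work plus work that is currently running).
function createPendingTracker() {
    let pending = 0;
    return {
        // Wrap a Node-style function whose last argument is a callback.
        wrap(fn) {
            return function (...args) {
                const cb = args.pop();
                let done = false;
                const finish = () => {
                    if (!done) { done = true; pending -= 1; }
                };
                pending += 1;
                try {
                    return fn.call(this, ...args, function (...results) {
                        finish();
                        cb.apply(this, results);
                    });
                } catch (err) {
                    finish();   // a synchronous throw never reaches the callback
                    throw err;
                }
            };
        },
        pending() { return pending; },
    };
}

// Usage: track fs.stat calls, which Node runs on the libuv thread pool.
const tracker = createPendingTracker();
const trackedStat = tracker.wrap(fs.stat);
trackedStat(process.execPath, (err) => {
    console.log('stat finished, still pending:', tracker.pending());
});
console.log('pending right after submit:', tracker.pending());
```

A count that keeps growing, or that never drops back toward zero, is a hint that the pool is saturated or that a worker is stuck. It cannot tell those two cases apart.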

In terms of a killing mechanism, there is none baked into libuv, but http://nikhilm.github.io/uvbook/threads.html#core-thread-operations suggests:

A well designed program would have a way to terminate long running workers that have already started executing. Such a worker could periodically check for a variable that only the main process sets to signal termination.
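To illustrate the idea in that quote, here is a minimal sketch in Node.js. Note the assumptions: it uses Node's `worker_threads` module as a stand-in rather than libuv's own `uv_queue_work` pool, the flag lives in a `SharedArrayBuffer`, and `runStoppableWorker` and `workerSource` are made-up names. The worker checks the flag on every iteration; real code would check it between units of work.

```javascript
const { Worker } = require('worker_threads');

// Code run inside the worker: loop until the shared flag becomes non-zero.
const workerSource = `
    const { parentPort, workerData } = require('worker_threads');
    const stopFlag = new Int32Array(workerData.shared);
    let iterations = 0;
    while (Atomics.load(stopFlag, 0) === 0) {
        iterations += 1;   // stand-in for one unit of real work
    }
    parentPort.postMessage({ iterations });
`;

// Hypothetical helper: start the worker, then ask it to stop after stopAfterMs.
function runStoppableWorker(stopAfterMs) {
    return new Promise((resolve, reject) => {
        const shared = new SharedArrayBuffer(4);
        const stopFlag = new Int32Array(shared);
        const worker = new Worker(workerSource, {
            eval: true,
            workerData: { shared },
        });
        worker.once('message', resolve);
        worker.once('error', reject);
        // Only the main thread sets the flag; the worker only reads it.
        setTimeout(() => Atomics.store(stopFlag, 0, 1), stopAfterMs);
    });
}

runStoppableWorker(50).then((msg) => {
    console.log('worker stopped after', msg.iterations, 'iterations');
});
```

This only helps with workers that cooperate. A worker stuck inside a single blocking call never gets back to the check, which is the case the question is worried about.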

Jonathan Wiepert

If this is for Node.js, would a simple monitor task do? I don't know of a way to get information about the event queue internals, but you can inject a tracer into the event queue to check that queued tasks are being run in a timely manner. (This measures load not by the number of tasks not yet run, but by whether tasks are getting run on time. Roughly the same thing.)

A monitor task could re-queue itself and check that it gets called at least every 10 milliseconds (or whatever maximum cumulative blocking time is allowed). Since Node.js runs queued callbacks one after another on a single event loop, a monitor that ran on time tells us that no other callback blocked the loop for longer than that window. Something like (in Node):

// like Date.now(), but with higher precision
// the extra precision is needed to be able to track small delays
function dateNow() {
    var t = process.hrtime();
    return (t[0] + t[1] * 1e-9) * 1000;
}

var _lastTimestamp = dateNow();   // when healthMonitor ran last, in ms
var _maxAllowedDelay = 10.0;      // max ms delay we allow for our task to run
function healthMonitor() {
    var now = dateNow();
    var delay = now - _lastTimestamp;
    if (delay > _maxAllowedDelay) {
        console.log("healthMonitor was late:", delay, " > ", _maxAllowedDelay);
    }
    _lastTimestamp = now;
    setTimeout(healthMonitor, 1);
}

// launch the health monitor and run it forever
// note: the node process will never exit, it will have to be killed
healthMonitor();

Throttling the alert messages and supporting a clean shutdown is an exercise left to the reader.
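For what it's worth, one way that exercise might look is sketched below. This is a variant of `healthMonitor` under my own assumptions, not a definitive version: `startHealthMonitor`, `minAlertInterval` and `onLate` are hypothetical names. Alerts are throttled to at most one per `minAlertInterval` milliseconds, the returned function stops the monitor, and `timer.unref()` keeps the monitor from holding the process open.

```javascript
// Like Date.now(), but with higher precision (same helper as above).
function dateNow() {
    const t = process.hrtime();
    return (t[0] + t[1] * 1e-9) * 1000;
}

// Hypothetical variant of healthMonitor with throttled alerts and a stop function.
function startHealthMonitor(options) {
    const maxAllowedDelay = options.maxAllowedDelay;    // ms
    const minAlertInterval = options.minAlertInterval;  // min ms between alerts
    const onLate = options.onLate;                      // called with the delay in ms
    let lastTimestamp = dateNow();
    let lastAlert = -Infinity;
    let timer = null;
    let stopped = false;

    function tick() {
        const now = dateNow();
        const delay = now - lastTimestamp;
        lastTimestamp = now;
        if (delay > maxAllowedDelay && now - lastAlert >= minAlertInterval) {
            lastAlert = now;
            onLate(delay);
        }
        if (stopped) return;        // onLate may have called stop()
        timer = setTimeout(tick, 1);
        timer.unref();              // do not keep the process alive for the monitor
    }

    tick();
    return function stop() {
        stopped = true;
        clearTimeout(timer);
    };
}

const stopMonitor = startHealthMonitor({
    maxAllowedDelay: 10,
    minAlertInterval: 1000,
    onLate: (delay) => console.log('healthMonitor was late:', delay, 'ms'),
});
// Later, for example during shutdown: stopMonitor();
```

Because the timer is unref'd, the process can exit on its own once nothing else is pending, so calling `stopMonitor()` is only needed to stop monitoring earlier.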

Andras