
While using pp to parallelize a fairly complex machine learning problem, I'm finding myself relying extensively on third-party libraries of varying quality. One in particular crashes on a fair number of edge cases when used intensively on varied datasets. I will eventually have to fix those crashes, but in the short term it is too much to chase both my bugs and theirs - and this library really is the best one available.

My question is: is there an established pattern for letting local worker processes in pp fail gracefully?

The options as I see them are:

  1. Don't use ANY local worker processes, use only REMOTE workers, and rely on the socket timeout.
  2. Shell all work out to a secondary Python script that I wrap and execute as a separate process, then use the exit code to check for crashes (see the sketch below). This would probably have to be combined with a timeout as well to guard against non-segfault failure cases.
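
For option 2, here is a minimal sketch of the wrapping approach, assuming Python 3's subprocess module; the script name worker_script.py, the dataset_path argument, and the 300-second timeout are hypothetical placeholders for whatever your actual work unit looks like:

```python
import subprocess
import sys

def run_job(dataset_path, timeout=300):
    """Run one unit of work in a separate interpreter so that a segfault in
    the third-party library kills only the child process, not the pp worker.

    'worker_script.py' is a hypothetical wrapper that loads dataset_path,
    runs the flaky library, and prints its result to stdout.
    """
    try:
        result = subprocess.run(
            [sys.executable, "worker_script.py", dataset_path],
            capture_output=True,
            timeout=timeout,          # guard against hangs that never exit
        )
    except subprocess.TimeoutExpired:
        return None                   # treat a hang the same as a crash
    if result.returncode != 0:
        # On POSIX, a negative return code means the child died from a signal
        # (e.g. -11 for SIGSEGV); positive codes are ordinary failures.
        return None
    return result.stdout
```

The function submitted to pp then only babysits the subprocess, so a crash in the library surfaces as a None result (or an exit code you can inspect and retry on) rather than a dead worker.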

Am I missing something here? I've been looking at pp.py and as far as I can tell there is no exit detection on the worker processes.
