I currently have a Python command-line application that uses the Python `invoke` package to organise, list and execute tasks. There are many task files (created and controlled by users, not me). Execution time for some task files can be more than an hour. Each task is actually a test script/program. `invoke` is useful for listing/executing all the tasks in a task file (we call it a testsuite), only a bunch of them (a tasks collection), or a single task. (Keeping a ton of loose scripts and organising, listing and running them the way users want would be quite a task, hence `invoke`.)
However, `invoke` cannot be used as a library. It does not offer an API that can be leveraged to list and run test tasks. So I am forced to run `invoke` as a shell command in a subprocess from the command-line program. I replace (via `execl()`) the current process with `invoke`, because once control passes to `invoke` there is no need to come back to the parent process. So far so good.
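To make the current hand-off concrete, here is a minimal sketch of what replacing the process with `invoke` looks like. The helper names (`build_invoke_argv`, `hand_over_to_invoke`) and the way the task file is passed (`--collection`) are assumptions for illustration, not my exact code.

```python
import os
import shutil


def build_invoke_argv(collection, tasks):
    """Build the argv for the invoke CLI (hypothetical helper)."""
    # --collection selects the task file (the "testsuite"); the remaining
    # arguments are the names of the tasks to run.
    return ["invoke", "--collection", collection, *tasks]


def hand_over_to_invoke(collection, tasks):
    """Replace the current process with invoke; does not return on success."""
    argv = build_invoke_argv(collection, tasks)
    exe = shutil.which(argv[0])
    if exe is None:
        raise FileNotFoundError("invoke executable not found on PATH")
    # execl() swaps the process image in place: same PID, no parent to return to.
    os.execl(exe, *argv)
```

Because `execl()` keeps the PID, whoever started the command-line program can still signal or wait on the same process after `invoke` has taken over.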
Now there is a requirement that this command-line program be callable from a web application, so I need to wrap it in a RESTful HTTP API. I've decided to use `bottle.py` to keep things simple.
I understand that the long-running testsuite (tasks) will have to run outside the HTTP request/response cycle. But I'm unable to decide exactly how to go about it (I may be overthinking it). Here is what I want:
- Tasks are written by users. They are always synchronous; they may `sleep` or execute shell commands via `subprocess.run()`.
- The application is internal; it will not be bombarded with a huge number of requests. Maximum number of users: 10.
- But each request (of the type that runs a task) will take minutes, and in some cases more than an hour, to complete. New requests arriving during this time should not block.
- Calling application (running on a different host) will need to report progress of the running task to the browser UI. ('progress bar')
- Ability to communicate with running task and 'cancel' it from browser UI.
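To pin down the "don't block" and "cancel" requirements above, this is a rough standard-library sketch of the behaviour I have in mind. It assumes a POSIX host and a single server process that owns all runs. `RUNS`, `start_run`, `run_status` and `cancel_run` are hypothetical names.

```python
import os
import signal
import subprocess

# run_id -> Popen handle. An in-memory registry is an assumption that only
# holds if one server process starts and tracks every run.
RUNS = {}


def start_run(run_id, argv):
    """Start argv as a child process and return immediately with its PID."""
    # start_new_session=True gives the child its own session and process
    # group (POSIX), so shell commands the task spawns can be signalled too.
    proc = subprocess.Popen(argv, start_new_session=True)
    RUNS[run_id] = proc
    return proc.pid


def run_status(run_id):
    """Return 'running', or 'exited(<code>)' once the child has finished."""
    code = RUNS[run_id].poll()
    return "running" if code is None else f"exited({code})"


def cancel_run(run_id, timeout=10):
    """Ask the run's process group to stop; force-kill if it ignores SIGTERM."""
    proc = RUNS[run_id]
    if proc.poll() is None:
        os.killpg(proc.pid, signal.SIGTERM)
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            os.killpg(proc.pid, signal.SIGKILL)
            proc.wait()
    return proc.returncode
```

An HTTP handler would call `start_run()` and return the run id right away, while separate status and cancel endpoints would call `run_status()` and `cancel_run()`.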
Given the above, am I correct in saying:
- Because a new process must be spawned for each request (due to the use of `subprocess` and `execl` in the current code), it rules out using threads of any type (OS threads, greenlets, gevent)?
- Using any async libraries (web framework, web/HTTP server or in app code) won't be of much help, because every run request will have to be a new process anyway?
- How will the process be spawned when a request comes in? Let the web/HTTP server (gunicorn?) do it? Or does my application have to take care of forking itself?
- is 'gunicorn' a good choice for this situation?
- I have a feeling that users may also ask for the ability to schedule tasks/tests, so I might end up using some sort of task queue. I have read about 'huey' and feel that it is light and simple enough for my needs (no Redis/Celery). But any task queue also means a separate consumer process to administer? More moving parts in the mix.
- 'progress-bar' functionality means, subprocess has to keep updating its progress somewhere and calling application has to read from there. Does this necessitate 'task queue' anyway?
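For the progress question above, the simplest thing I can picture without a task queue is a per-run progress file that the task updates and the web layer reads. This is only a sketch under that assumption. `write_progress`, `read_progress` and the JSON layout are made up for illustration.

```python
import json
import os
import tempfile


def write_progress(path, done, total):
    """Called by the running task: atomically record how far it has got."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as fh:
        json.dump({"done": done, "total": total}, fh)
    # Atomic rename, so a reader never sees a half-written file.
    os.replace(tmp, path)


def read_progress(path):
    """Called by the web layer: percent complete, or None if nothing written yet."""
    try:
        with open(path) as fh:
            data = json.load(fh)
    except FileNotFoundError:
        return None
    if not data["total"]:
        return 0.0
    return 100.0 * data["done"] / data["total"]
```

The calling application would then poll a status endpoint that returns `read_progress()` for the run in question.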
There is a lot of material on all of this and I have read quite a bit of it, but it has still left me unclear as to how exactly to go about implementing my requirements. Any direction/pointers would be appreciated. I'd also appreciate any advice on what not to use.