
I currently have a Python command-line application that uses the Python invoke package to organise, list and execute tasks. There are many task files (controlled and created by users, not me). Execution time for some task files can be more than an hour. Each task is actually a test script/program. invoke is useful for listing/executing all the tasks in a task file (we call it a testsuite), only a bunch of them (a task collection), or a single task. (Having a ton of loose scripts and organising, listing and running them the way users want would be quite a task, hence invoke.) However, invoke cannot be used as a library. It does not offer an API that can be leveraged to list and run test tasks. So I am forced to run invoke as a shell command in a subprocess from the command-line program. I replace the current process with invoke (via execl()) because once control passes to invoke, there is no need to come back to the parent process. So far, so good.
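To make the execl() point concrete: os.execl() never returns to the caller, which is fine for a one-shot command-line run but would end a web server process. A minimal sketch of the alternative, assuming the same invoke command line is started with subprocess.Popen so the parent stays alive. The helper names invoke_argv and run_and_return are hypothetical; `--collection` is invoke's own flag for selecting a task file.

```python
import subprocess
import sys


def invoke_argv(task, collection=None, executable="invoke"):
    """Build the argv for one invoke run (hypothetical helper)."""
    argv = [executable]
    if collection:
        # invoke selects the task file (collection) with --collection=NAME
        argv.append(f"--collection={collection}")
    argv.append(task)
    return argv


def run_and_return(argv):
    """Start argv as a child process and return immediately.

    Unlike os.execl(), the calling process keeps running, so it can
    track the child, report on it, or terminate it later.
    """
    return subprocess.Popen(argv)


if __name__ == "__main__":
    # A stand-in child is used here so the sketch runs without invoke installed.
    proc = run_and_return([sys.executable, "-c", "print('task ran')"])
    print("parent still alive, child exit code:", proc.wait())
```

In the real program the argv would come from invoke_argv(...) instead of the stand-in child.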

Now there is a requirement that this command-line program be callable from a web application, so I need to wrap it in a RESTful HTTP API. I've decided to use bottle.py to keep things simple.

I understand that the long-running testsuites (tasks) will have to run outside the HTTP request/response cycle. But I'm unable to decide exactly how to go about it (I may well be overthinking). Here is what I want:

  • Tasks are written by users. They are always synchronous; they may sleep or execute shell commands via subprocess.run().
  • The application is internal; it will not be bombarded with a huge number of requests. There are at most 10 users.
  • But each request (of the type that runs a task) will take minutes, and in some cases more than an hour, to complete. New requests arriving during this time should not block.
  • The calling application (running on a different host) will need to report the progress of the running task to the browser UI (a 'progress bar').
  • There must be a way to communicate with a running task and 'cancel' it from the browser UI.

Given the above, am I correct in saying the following?

  • Because a new 'process' must be spawned for each request (due to the use of subprocess and execl in the current code), does that rule out using 'threads' of any type (OS threads, greenlets, gevent)?
  • Using any async libraries (web framework, web/HTTP server or in app code) won't be of much help, because every run request will have to be a new process anyway?
  • How will the process be spawned when a request comes in? Should the web/HTTP server (gunicorn?) do it, or does my application have to take care of forking itself?
  • Is 'gunicorn' a good choice for this situation?
  • I have a feeling that users may also ask for the ability to schedule tasks/tests, so I might end up using some sort of task queue. I have read about 'huey' and feel that it is light and simple enough for my needs (no Redis/Celery). But does any task queue also mean a separate consumer process to administer? That adds more moving parts to the mix.
  • 'Progress bar' functionality means the subprocess has to keep updating its progress somewhere, and the calling application has to read it from there. Does this necessitate a 'task queue' anyway?
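For reference, the progress and cancel requirements can be sketched with nothing but the standard library. This is a rough illustration under stated assumptions, not a recommendation: the class name JobRegistry and the PROGRESS_FILE environment variable are hypothetical, the child is assumed to write its progress into that file, and the in-memory dict only works if a single server process handles all requests.

```python
import os
import subprocess
import sys
import tempfile
import uuid


class JobRegistry:
    """Tracks child processes started on behalf of HTTP requests (hypothetical)."""

    def __init__(self, workdir=None):
        self.workdir = workdir or tempfile.mkdtemp(prefix="jobs-")
        self.jobs = {}          # job_id -> (Popen, progress file path)
        self.cancelled = set()  # job ids that were cancelled on request

    def start(self, argv):
        job_id = uuid.uuid4().hex
        progress_path = os.path.join(self.workdir, job_id + ".progress")
        # Assumption: the task reads PROGRESS_FILE and writes its progress there.
        env = dict(os.environ, PROGRESS_FILE=progress_path)
        proc = subprocess.Popen(argv, env=env)  # returns immediately
        self.jobs[job_id] = (proc, progress_path)
        return job_id

    def status(self, job_id):
        proc, progress_path = self.jobs[job_id]
        returncode = proc.poll()  # None while the child is still running
        try:
            with open(progress_path) as fh:
                progress = fh.read().strip()
        except FileNotFoundError:
            progress = ""
        if returncode is None:
            state = "running"
        elif job_id in self.cancelled:
            state = "cancelled"
        else:
            state = "done" if returncode == 0 else "failed"
        return {"state": state, "returncode": returncode, "progress": progress}

    def cancel(self, job_id, grace=10):
        proc, _ = self.jobs[job_id]
        if proc.poll() is None:
            self.cancelled.add(job_id)
            proc.terminate()  # polite request first
            try:
                proc.wait(timeout=grace)
            except subprocess.TimeoutExpired:
                proc.kill()   # force it if the child ignores the request
                proc.wait()
```

A bottle route for "run" would call start() and return the job id at once, while separate "status" and "cancel" routes would call the other two methods. Grandchildren started by the task (for example via subprocess.run()) are not handled by this sketch.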

There is a lot of material on all of this and I have read quite a bit of it, but it has still left me unclear as to how exactly to go about implementing my requirements. Any direction/pointers would be appreciated. I'd also appreciate any advice on what 'not to use'.

yogmk
  • You're asking good questions, and I want to help you, but your task is much bigger than the scope of a single StackOverflow answer. You need to work directly with an experienced software engineer who has built asynchronous webapps before. Or, you'll figure it out yourself with trial and error. But there's no single answer we can give you here that will solve it. (Not even "direction/pointers," since it really depends on your detailed requirements. You've done a good job describing your problem, but still the finer details matter.) – ron rothman Dec 18 '20 at 16:20
  • @ronrothman - :) Sounds fair. Thanks for spending time to understand. I guess I will just start the journey and take one step at a time. I will come back with specific questions as I go along. – yogmk Dec 21 '20 at 02:55

1 Answer


If you need something really simple, you could write a wrapper around task spooler (a Linux tool for queueing and running tasks): https://vicerveza.homeunix.net/~viric/soft/ts/ (see especially https://vicerveza.homeunix.net/~viric/soft/ts/article_linux_com.html for more details).
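A rough sketch of what such a wrapper might look like, assuming task spooler is installed. The function names are hypothetical. The flags used are `-s` (show job state) and `-k` (send SIGTERM to the job); the binary is assumed to be named `tsp` on Debian-style systems and `ts` elsewhere, so check the local installation.

```python
import shutil
import subprocess


def find_task_spooler():
    """Return the path of the task spooler binary, or None if not found."""
    # Assumption: Debian-style packages install it as `tsp`. A plain `ts`
    # on PATH may be the unrelated moreutils tool, so `tsp` is tried first.
    return shutil.which("tsp") or shutil.which("ts")


def enqueue_cmd(binary, argv):
    """Command line that queues argv as a new job."""
    return [binary, *argv]


def state_cmd(binary, job_id):
    """Command line that prints the state of a job."""
    return [binary, "-s", str(job_id)]


def kill_cmd(binary, job_id):
    """Command line that sends SIGTERM to a running job."""
    return [binary, "-k", str(job_id)]


def enqueue(binary, argv):
    """Queue a job and return the job id that task spooler prints."""
    result = subprocess.run(
        enqueue_cmd(binary, argv), check=True, capture_output=True, text=True
    )
    return int(result.stdout.strip())


def job_state(binary, job_id):
    """Return the job state string reported by task spooler."""
    result = subprocess.run(
        state_cmd(binary, job_id), check=True, capture_output=True, text=True
    )
    return result.stdout.strip()
```

A web handler could call enqueue(find_task_spooler(), [...]) to start a run and hand the job id back to the caller for later state or kill requests.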

Otherwise it's probably better to switch to the uWSGI spooler, RQ with Redis, or Celery with RabbitMQ (Celery with Redis works only to a certain extent).

KaszpiR