I'm building a web application whose core feature is letting users upload large images and have them processed. The processing takes roughly 3 minutes to complete, and I thought Heroku would be an ideal platform for running these processing jobs on demand and in a highly scalable way. The processing task itself is fairly computationally expensive and needs to run on a high-end PX dyno. I want to maximize parallelization and minimize (effectively eliminate) the time a job spends waiting in a queue. In other words, I want N PX dynos for N jobs.
Thankfully, I can accomplish this pretty easily with Heroku's API (or optionally a service like Hirefire). Whenever a new processing request comes in, I can simply increment the worker count and the new worker will grab the job from the queue and start processing immediately.
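For reference, the scale-up call I have in mind looks roughly like the sketch below, using the Platform API's formation endpoint. The app name, token, and the "PX" size string are placeholders/assumptions (the size name may differ depending on your plan):

```python
import requests

HEROKU_API = "https://api.heroku.com"
APP_NAME = "my-image-app"            # placeholder app name
API_TOKEN = "<platform-api-token>"   # placeholder token

def scale_workers(quantity):
    """Set the 'worker' process type to `quantity` dynos via the Platform API."""
    resp = requests.patch(
        f"{HEROKU_API}/apps/{APP_NAME}/formation/worker",
        headers={
            "Accept": "application/vnd.heroku+json; version=3",
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": "application/json",
        },
        json={"quantity": quantity, "size": "PX"},  # "PX" assumed; size names vary by plan
    )
    resp.raise_for_status()
    return resp.json()

# On each new processing request, bump the count by one, e.g.:
# scale_workers(current_worker_count() + 1)   # current_worker_count is hypothetical
```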
However, while scaling up is painless, scaling down is where the trouble starts. The Heroku API is frustratingly limited: I can only set the number of running workers, not kill specific idle ones. This means that if I have 20 workers each processing an image and one completes its task, I cannot safely scale the worker count to 19, because Heroku will kill an arbitrary worker dyno, regardless of whether it's actually in the midst of a job! Leaving all workers running until all jobs complete is simply out of the question, because the cost would be astronomical. Imagine 100 workers created during a spike idling indefinitely as a few new jobs trickle in throughout the day!
I've scoured the web, and the best "solution" people suggest is to have your worker process handle termination gracefully. Well, that's perfectly fine if your worker is just doing mass-emailing, but my workers are running some very drawn-out analytics on images that, as I mentioned above, take about 3 minutes to complete.
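For what it's worth, that advice boils down to something like the sketch below, where fetch_next_job, process_image, and requeue are hypothetical helpers. It's easy to see why it doesn't help much when the job itself takes 3 minutes:

```python
import signal
import sys
import time

current_job = None  # the job currently being processed, if any

def on_sigterm(signum, frame):
    # Heroku sends SIGTERM on scale-down/restart, then SIGKILL after a short
    # grace period. A quick task could finish up here, but for a ~3-minute
    # image job the best you can do is push it back on the queue and exit,
    # throwing away the work done so far.
    if current_job is not None:
        requeue(current_job)  # hypothetical helper: put the job back on the queue
    sys.exit(0)

signal.signal(signal.SIGTERM, on_sigterm)

def worker_loop():
    global current_job
    while True:
        current_job = fetch_next_job()  # hypothetical helper: pop a job, or None
        if current_job is None:
            time.sleep(1)
            continue
        process_image(current_job)      # the ~3-minute processing step
        current_job = None
```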
In an ideal world, I could kill a specific worker dyno upon completion of its task. This would make scaling down just as easy as scaling up.
In fact, I've come close to that ideal world by switching from worker dynos to one-off dynos (which terminate upon process termination, i.e. you stop paying for the dyno after its "root" process exits). However, Heroku sets a hard limit of 5 one-off dynos that can run simultaneously. This I can understand, as I was certainly, in a sense, abusing one-off dynos...but it is quite frustrating nonetheless.
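Concretely, that approach is just something like the following, via the Platform API's dyno-create endpoint (the app name, token, command, and size string are all placeholders):

```python
import requests

HEROKU_API = "https://api.heroku.com"
APP_NAME = "my-image-app"            # placeholder app name
API_TOKEN = "<platform-api-token>"   # placeholder token

def run_one_off(job_id):
    """Start a one-off dyno that processes a single job and then exits."""
    resp = requests.post(
        f"{HEROKU_API}/apps/{APP_NAME}/dynos",
        headers={
            "Accept": "application/vnd.heroku+json; version=3",
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": "application/json",
        },
        # Billing for this dyno stops once the command exits.
        json={"command": f"python process_image.py {job_id}", "size": "PX"},
    )
    resp.raise_for_status()
    return resp.json()
```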
Is there any way I can better scale down my workers? I would prefer not to have to radically re-engineer my processing algorithm, i.e. splitting it into a few chunks that each run in 30-40 seconds rather than one 3-minute stretch (so that accidentally killing a running worker wouldn't be catastrophic). That approach would drastically complicate my processing code and introduce several new points of failure. However, if it's my only option, I'll have to do it.
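If I'm forced down that road, I'm picturing something like this rough sketch, where the stage functions and checkpoint helpers are all hypothetical, so that a killed dyno only loses the current 30-40 second stage:

```python
def process_in_chunks(job):
    # Hypothetical stage functions, each doing ~30-40 seconds of work.
    stages = [load_and_decode, analyze_regions, compute_features, render_output]
    # Resume from the last completed stage; (0, None) when starting fresh.
    stage_index, data = load_checkpoint(job)      # hypothetical checkpoint store
    for i in range(stage_index, len(stages)):
        data = stages[i](job, data)
        save_checkpoint(job, i + 1, data)         # persist progress so a restart resumes here
    mark_done(job)                                # hypothetical: mark the job complete
```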
Any ideas or thoughts are appreciated!