I'm building a web application whose core feature is letting users upload large images and have them processed. The processing takes roughly 3 minutes to complete, and I thought Heroku would be an ideal platform for running these processing jobs on demand and in a highly scalable way. The processing task itself is fairly computationally expensive and needs to run on a high-end PX dyno. I want to maximize parallelization and minimize (effectively eliminate) the time a job spends waiting in a queue. In other words, I want N PX dynos for N jobs.
Thankfully, I can accomplish this pretty easily with Heroku's API (or optionally a service like Hirefire). Whenever a new processing request comes in, I can simply increment the worker count and the new worker will grab the job from the queue and start processing immediately.
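For reference, the scale-up call I have in mind looks roughly like the sketch below, using the Platform API's formation endpoint. The app name, token, and the "PX" size string are placeholders/assumptions (the size name may differ depending on your plan):

```python
import requests

HEROKU_API = "https://api.heroku.com"
APP_NAME = "my-image-app"            # placeholder app name
API_TOKEN = "<platform-api-token>"   # placeholder token

def scale_workers(quantity):
    """Set the 'worker' process type to `quantity` dynos via the Platform API."""
    resp = requests.patch(
        f"{HEROKU_API}/apps/{APP_NAME}/formation/worker",
        headers={
            "Accept": "application/vnd.heroku+json; version=3",
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": "application/json",
        },
        json={"quantity": quantity, "size": "PX"},  # "PX" assumed; size names vary by plan
    )
    resp.raise_for_status()
    return resp.json()

# On each new processing request, bump the count by one, e.g.:
# scale_workers(current_worker_count() + 1)   # current_worker_count is hypothetical
```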
However, while scaling up is painless, scaling down is where the trouble starts. The Heroku API is frustratingly limited: I can only set the number of running workers, not kill specific idle ones. This means that if I have 20 workers each processing an image and one completes its task, I cannot safely scale the worker count to 19, because Heroku will kill an arbitrary worker dyno, regardless of whether it's actually in the midst of a job! Leaving all workers running until all jobs complete is simply out of the question, because the cost would be astronomical. Imagine 100 workers created during a spike idling indefinitely as a few new jobs trickle in throughout the day!
I've scoured the web, and the best "solution" people suggest is to have your worker process handle termination gracefully. Well, that's perfectly fine if your worker is just doing mass-emailing, but my workers are running some very drawn-out analytics on images that, as I mentioned above, take about 3 minutes to complete.
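For what it's worth, that advice boils down to something like the sketch below, where fetch_next_job, process_image, and requeue are hypothetical helpers. It's easy to see why it doesn't help much when the job itself takes 3 minutes:

```python
import signal
import sys
import time

current_job = None  # the job currently being processed, if any

def on_sigterm(signum, frame):
    # Heroku sends SIGTERM on scale-down/restart, then SIGKILL after a short
    # grace period. A quick task could finish up here, but for a ~3-minute
    # image job the best you can do is push it back on the queue and exit,
    # throwing away the work done so far.
    if current_job is not None:
        requeue(current_job)  # hypothetical helper: put the job back on the queue
    sys.exit(0)

signal.signal(signal.SIGTERM, on_sigterm)

def worker_loop():
    global current_job
    while True:
        current_job = fetch_next_job()  # hypothetical helper: pop a job, or None
        if current_job is None:
            time.sleep(1)
            continue
        process_image(current_job)      # the ~3-minute processing step
        current_job = None
```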
In an ideal world, I could kill a specific worker dyno upon completion of its task. This would make scaling down just as easy as scaling up.
In fact, I've come close to that ideal world by switching from worker dynos to one-off dynos (which terminate upon process termination, i.e. you stop paying for the dyno after its "root" process exits). However, Heroku sets a hard limit of 5 one-off dynos that can run simultaneously. This I can understand, as I was certainly, in a sense, abusing one-off dynos...but it is quite frustrating nonetheless.
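Concretely, that approach is just something like the following, via the Platform API's dyno-create endpoint (the app name, token, command, and size string are all placeholders):

```python
import requests

HEROKU_API = "https://api.heroku.com"
APP_NAME = "my-image-app"            # placeholder app name
API_TOKEN = "<platform-api-token>"   # placeholder token

def run_one_off(job_id):
    """Start a one-off dyno that processes a single job and then exits."""
    resp = requests.post(
        f"{HEROKU_API}/apps/{APP_NAME}/dynos",
        headers={
            "Accept": "application/vnd.heroku+json; version=3",
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": "application/json",
        },
        # Billing for this dyno stops once the command exits.
        json={"command": f"python process_image.py {job_id}", "size": "PX"},
    )
    resp.raise_for_status()
    return resp.json()
```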
Is there any way I can better scale down my workers? I would prefer not to have to radically re-engineer my processing algorithm, i.e. splitting it into a few chunks that each run in 30-40 seconds rather than one 3-minute stretch (so that accidentally killing a running worker wouldn't be catastrophic). That approach would drastically complicate my processing code and introduce several new points of failure. However, if it's my only option, I'll have to do it.
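If I'm forced down that road, I'm picturing something like this rough sketch, where the stage functions and checkpoint helpers are all hypothetical, so that a killed dyno only loses the current 30-40 second stage:

```python
def process_in_chunks(job):
    # Hypothetical stage functions, each doing ~30-40 seconds of work.
    stages = [load_and_decode, analyze_regions, compute_features, render_output]
    # Resume from the last completed stage; (0, None) when starting fresh.
    stage_index, data = load_checkpoint(job)      # hypothetical checkpoint store
    for i in range(stage_index, len(stages)):
        data = stages[i](job, data)
        save_checkpoint(job, i + 1, data)         # persist progress so a restart resumes here
    mark_done(job)                                # hypothetical: mark the job complete
```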
Any ideas or thoughts are appreciated!