
I want to use Heroku, but the fact that it restarts dynos every 24 hours at unpredictable times is making things difficult.

I have a series of very important jobs that handle payment processing, and I want them backed by the database so they're 100% reliable. For this reason I chose DJ (delayed_job), which is slow.

Because I chose DJ, I also can't just push 5,000,000 events into the database at once (one per email sent).

Because of THAT, I have longer-running jobs (for example, sending 200,000 text messages over a few hours).

Longer-running jobs like these are much harder to get right if they're cut off partway through.

It appears Heroku sends SIGTERM and then expects the process to shut down within 30 seconds. That's not going to happen with my longer jobs.
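
Stopping cleanly inside that 30-second window only works if the unit of work between SIGTERM checks is short. A rough sizing sketch, assuming (hypothetically) the 200,000 texts are spread over 3 hours and that half the grace period is kept as a safety margin; `max_batch_size` is an illustrative helper, not part of DJ or Heroku:

```ruby
# 30 seconds is the window between SIGTERM and the dyno being killed.
GRACE_PERIOD_SECONDS = 30

# Hypothetical helper: the largest batch that, at the job's average send
# rate, should finish within budget_seconds. Integer math, rounded down.
def max_batch_size(total_messages:, total_seconds:, budget_seconds:)
  (total_messages * budget_seconds) / total_seconds
end

# Assumption: 200,000 texts over 3 hours ("a few hours" in the question).
# Keep half the grace period spare for the batch's UPDATE and for exiting.
budget = GRACE_PERIOD_SECONDS / 2
puts max_batch_size(total_messages: 200_000, total_seconds: 3 * 60 * 60, budget_seconds: budget)
# prints 277
```

With these numbers, checking for SIGTERM every ~270 messages leaves enough time to record the batch and stop; the real figure depends on the SMS provider's actual throughput.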

Now I'm not sure how to handle them. The only approach I can think of is to update the database immediately after each text is sent (for example, setting a sms_sent_at column), but that means issuing one update query per message instead of a single update per batch, which would hurt database performance.
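
The per-message versus per-batch trade-off can be sketched in plain Ruby. `FakeMessagesTable`, `send_sms` and `send_all` are hypothetical stand-ins; in ActiveRecord the batch update would be something like `Message.where(id: batch).update_all(sms_sent_at: Time.current)`, assuming a hypothetical `Message` model:

```ruby
# Hypothetical in-memory stand-in for the messages table; it only counts
# how many UPDATE statements would be issued.
class FakeMessagesTable
  attr_reader :update_count, :sent_ids

  def initialize
    @update_count = 0
    @sent_ids = []
  end

  # Equivalent of: UPDATE messages SET sms_sent_at = now() WHERE id IN (...)
  def mark_sent(ids)
    @update_count += 1
    @sent_ids.concat(ids)
  end
end

# Hypothetical sender; a real one would call the SMS provider.
def send_sms(_id); end

def send_all(ids, table, batch_size:)
  ids.each_slice(batch_size) do |batch|
    batch.each { |id| send_sms(id) }
    table.mark_sent(batch) # one UPDATE per batch
  end
end

per_message = FakeMessagesTable.new
send_all((1..1_000).to_a, per_message, batch_size: 1)
per_batch = FakeMessagesTable.new
send_all((1..1_000).to_a, per_batch, batch_size: 200)
puts per_message.update_count # prints 1000
puts per_batch.update_count   # prints 5
```

The cost of batching is that a hard kill mid-batch can leave up to `batch_size` texts sent but not recorded, so they would be resent on retry; the batch size bounds how much work can be repeated.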

This would be much easier if I could schedule restarts; then I could do them at night, when I'm 99% sure no job will be running that needs more than 30 seconds to shut down.

Alternatively, can I 'listen' for SIGTERM inside a long-running DJ job and at least abort the loop early so it can resume later?

Tallboy

2 Answers


Manual restarts reset the 24-hour clock, so running heroku ps:restart at your preferred time should give you the control you're looking for.

More info can be found in Heroku's Dev Center article "Dynos and the Dyno Manager".

gordon

Here's the proper answer: listen for SIGTERM (I'm using DJ here) and then abort gracefully by raising, so DJ reschedules the job for later. It's important that the jobs are idempotent, because an interrupted job will run again.

Related: Long running delayed_job jobs stay locked after a restart on Heroku

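    class WithdrawPaymentsJob
      def perform
        term_now = false

        # trap returns the previously installed TERM handler (inside a DJ
        # worker this is DJ's own handler), so keep it to chain to and restore.
        old_term_handler = trap('TERM') do
          term_now = true
          # Only Procs can be called; trap may also return a String such as "DEFAULT".
          old_term_handler.call if old_term_handler.respond_to?(:call)
        end

        loop do
          puts 'doing long running job'
          sleep 1

          # Check the flag between units of work; raising marks this attempt
          # as failed so DJ reschedules the job.
          raise 'Gracefully terminating job early...' if term_now
        end
      ensure
        # Restore the previous handler whether the job finished or raised.
        trap('TERM', old_term_handler)
      end
    end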
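
Aborting only helps if the retry doesn't redo finished work. A minimal sketch of a resumable, idempotent job, assuming a cursor (last processed id) is checkpointed after each batch; `JobInterrupted`, `ResumableSendJob` and the `stop_requested` callable are hypothetical, and in a real job the cursor would need to live in the database so it survives the restart, while `stop_requested` would read the `term_now` flag set by the TERM trap:

```ruby
# Hypothetical error raised to end the current attempt early.
class JobInterrupted < StandardError; end

# Hypothetical job: processes ids in batches and records a cursor after
# each batch, so a retry skips everything already done.
class ResumableSendJob
  attr_reader :cursor, :processed

  def initialize(ids, batch_size:)
    @ids = ids.sort
    @batch_size = batch_size
    @cursor = 0 # in a real job, a persisted column
    @processed = []
  end

  # stop_requested: callable returning true once SIGTERM has been seen.
  def perform(stop_requested)
    remaining = @ids.select { |id| id > @cursor }
    remaining.each_slice(@batch_size) do |batch|
      raise JobInterrupted, 'Gracefully terminating job early...' if stop_requested.call

      batch.each { |id| @processed << id } # stand-in for sending each text
      @cursor = batch.last                 # checkpoint once per batch
    end
  end
end

job = ResumableSendJob.new((1..10).to_a, batch_size: 3)
checks = 0
begin
  # Simulate SIGTERM arriving after two batches have completed.
  job.perform(-> { (checks += 1) > 2 })
rescue JobInterrupted
  puts "interrupted, cursor = #{job.cursor}" # prints "interrupted, cursor = 6"
end
job.perform(-> { false }) # the retry picks up after id 6
p job.processed # prints [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

If the dyno is killed mid-batch instead, the cursor still points at the end of the previous batch, so that batch is processed again; this is why the per-item work itself should be idempotent.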

Here's how you solve it with Que, using the equivalent check inside the job's work loop:

    if Que.worker_count.zero?
      raise 'Gracefully terminating job early...'
    end
Tallboy