
Context: Many of the operations I'm doing require lengthy web accesses. Sometimes a web access fails and the process needs to be restarted. And it's a pain to restart the process from scratch.

So I've written a number of ad-hoc approaches to checkpointing: when you restart the process, it looks to see if checkpoint data is available and re-initializes state from that, otherwise it creates fresh state. In the course of operation, the process periodically writes checkpoint data somewhere (to a file or to the db). And when it's finished, it cleans up the checkpoint data.

I'd like a simple, DRY, general-purpose checkpointing mechanism. How would you write it? Or is there a module that already does this? (Though it's not an issue yet, extra stars awarded for thread-safe implementations!)

fearless_fool

2 Answers


What you're essentially describing is a state machine. Web services are stateless, but at each point when you provide an update to something on the server side, the state is updated, acting as the "check point," and can therefore be persisted between transactions or "web accesses" as you call them.

If you've never done state machines before, it may be a bit of a learning curve, but you can check out this page providing a list of "check pointing" or "state" gems. AASM will probably work, and is under active development, but depending on how much functionality you need you might look at the list of alternatives down the right-hand side of the screen to see what fits you best.

One production use of AASM that I know of is to automatically save a person's progress through a multi-step process, allowing them to drop off, disconnect, or simply come back later to finish it up. The steps in the process must be completed in some kind of order, and there's a defined "done" state in most cases. AASM should be able to take care of these sorts of things for you.
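To make the idea concrete, here is a minimal plain-Ruby sketch of a multi-step process whose current step is persisted after every transition. It deliberately does not use AASM's API; the `SignupFlow` class, its `STEPS` list, and the JSON file used as storage are all hypothetical stand-ins chosen for illustration.

```ruby
require 'json'
require 'tmpdir'

# Hypothetical multi-step process: not AASM, just the same idea in plain Ruby.
class SignupFlow
  STEPS = %w[account profile payment done].freeze

  attr_reader :step

  def initialize(path)
    @path = path
    # Resume from the persisted step if one exists, otherwise start at the first step.
    @step = File.exist?(path) ? JSON.parse(File.read(path)).fetch('step') : STEPS.first
  end

  def done?
    @step == STEPS.last
  end

  # Move to the next step and persist it immediately, so the saved
  # step acts as the checkpoint to resume from after a disconnect.
  def advance!
    raise 'already done' if done?
    @step = STEPS[STEPS.index(@step) + 1]
    File.write(@path, JSON.generate({ 'step' => @step }))
    @step
  end
end

path = File.join(Dir.mktmpdir, 'signup.json')
flow = SignupFlow.new(path)
flow.advance!                    # account -> profile
resumed = SignupFlow.new(path)   # simulates the user coming back later
puts resumed.step
```

This works because the set of steps is small and known up front, which is the situation a state machine is designed for.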

jefflunt
  • Interesting idea. I've worked with state machines before -- unless I'm missing something, most state machines generally require you to define all the states ahead of time. In my case, the number of states cannot be known a priori, and often number in the tens of thousands. I'm not sure coercing a state machine to be a checkpointing mechanism is a good fit. – fearless_fool May 02 '12 at 06:31
  • Hm, so what is the difference between your need and just having, say, a column in a DB that stores the current state or "checkpoint"? – jefflunt May 02 '12 at 12:59
  • The question isn't about where you store the state: it could be stored in the db, or in a file, or wherever. The question is about how to make a versatile framework for checkpointing. For example, wouldn't it be nice to have a `with_checkpointing(something) { .... }` that you could use for various things? – fearless_fool May 02 '12 at 17:37

After mulling it over, I decided that I'm willing to make this specific to ActiveRecord. By exploiting Ruby's ensure facility and the destroyed? and changed? methods in ActiveRecord, the design becomes simple:

define Checkpoint model with :name and :state

# file db/migrate/xyzzy_create_checkpoints.rb
class CreateCheckpoints < ActiveRecord::Migration
  def change
    create_table :checkpoints do |t|
      t.string :name
      t.text :state   # serialized state can outgrow a string column's length limit
    end
    add_index :checkpoints, :name, :unique => true
  end 
end

# file app/models/checkpoint.rb
class Checkpoint < ActiveRecord::Base
  serialize :state
end

define WithCheckpoint module

# file lib/with_checkpoint.rb
module WithCheckpoint

  def with_checkpoint(name, initial_state, &body)
    r = Checkpoint.where(:name => name)
    # fetch existing or create fresh checkpoint
    checkpoint = r.exists? ? r.first : r.new(:state => initial_state)
    begin
      yield(checkpoint)
    ensure
      # upon leaving the body, save the checkpoint iff needed
      checkpoint.save if (!(checkpoint.destroyed?) && checkpoint.changed?)
    end
  end
end

sample usage

Here's a somewhat contrived example that randomly blows up after some number of iterations. A more common case might be a lengthy network or file access that can fail at any point. Note: We store the state in an array only to show that 'state' needn't be a simple integer.

class TestCheck
  extend WithCheckpoint

  def self.do_it
    with_checkpoint(:fred, [0]) {|ckp|
      puts("initial state = #{ckp.state}")
      while (ckp.state[0] < 200) do
        # simulate a failure on roughly 1% of iterations
        raise RuntimeError if rand > 0.99
        ckp.state = [ckp.state[0]+1]
      end
      puts("completed normally, deleting checkpoint")
      ckp.delete
    }
  end

end

When you run TestCheck.do_it, it may blow up after some number of iterations. But you can restart it until it completes properly:

>> TestCheck.do_it
initial state = [0]
RuntimeError: RuntimeError
        from sketches/checkpoint.rb:40:in `block in do_it'
        from sketches/checkpoint.rb:22:in `with_checkpoint'
        ...
>> TestCheck.do_it
initial state = [122]
completed normally, deleting checkpoint
=> #<Checkpoint id: 3, name: "fred", state: [200]>
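The same ensure-based shape should carry over to storage other than ActiveRecord. The following is only a sketch under assumptions: `FileCheckpoint`, `with_file_checkpoint`, and `CHECKPOINT_LOCK` are hypothetical names, the state is stored with Marshal in a local file that is assumed to be trusted, and the mutex only serializes loading and saving, so it is not a complete answer to the thread-safety question.

```ruby
require 'tmpdir'

# Hypothetical file-backed stand-in for the Checkpoint model.
class FileCheckpoint
  attr_reader :state

  def initialize(path, initial_state)
    @path = path
    existing = File.exist?(path)
    # Resume from the file if present (assumed trusted), else start fresh.
    @state = existing ? Marshal.load(File.binread(path)) : initial_state
    @dirty = !existing   # a fresh checkpoint has never been saved
    @deleted = false
  end

  def state=(value)
    @state = value
    @dirty = true
  end

  def changed?
    @dirty
  end

  def destroyed?
    @deleted
  end

  # Write to a temp file, then rename, so a crash mid-write
  # does not clobber an existing checkpoint.
  def save
    tmp = "#{@path}.tmp"
    File.binwrite(tmp, Marshal.dump(@state))
    File.rename(tmp, @path)
    @dirty = false
  end

  def delete
    File.delete(@path) if File.exist?(@path)
    @deleted = true
  end
end

CHECKPOINT_LOCK = Mutex.new

def with_file_checkpoint(path, initial_state)
  checkpoint = CHECKPOINT_LOCK.synchronize { FileCheckpoint.new(path, initial_state) }
  begin
    yield(checkpoint)
  ensure
    # Same rule as the ActiveRecord version: save unless deleted or unchanged.
    CHECKPOINT_LOCK.synchronize do
      checkpoint.save if !checkpoint.destroyed? && checkpoint.changed?
    end
  end
end

path = File.join(Dir.mktmpdir, 'fred.ckp')

# First run fails part-way; the ensure clause saves the progress made so far.
begin
  with_file_checkpoint(path, [0]) do |ckp|
    while ckp.state[0] < 5
      ckp.state = [ckp.state[0] + 1]
    end
    raise 'simulated network failure'
  end
rescue RuntimeError
  # swallowed so the demo can restart
end

# Second run resumes from the saved state, finishes, and cleans up.
resumed_from = nil
with_file_checkpoint(path, [0]) do |ckp|
  resumed_from = ckp.state
  while ckp.state[0] < 10
    ckp.state = [ckp.state[0] + 1]
  end
  ckp.delete
end
puts resumed_from.inspect
puts File.exist?(path)
```

As in the ActiveRecord version, progress is only written when the block exits, so a hard kill of the process would lose everything since the last save. Calling `save` inside the loop every so often would narrow that window.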
fearless_fool