So I've thought and I've thought, and looked at the source code for Mechanize and for VCR, and I've decided that I'm really just over-thinking the problem. The following works just fine for my needs (I'm using DataMapper, but translating it into an ActiveRecord model would be straightforward):

    require 'yaml'
    require 'data_mapper'

    class WebCache
      include DataMapper::Resource

      property :id, Serial
      property :serialized_key, Text
      property :serialized_value, Text
      property :created_at, DateTime
      property :updated_at, DateTime

      # Class method, so it can be called as WebCache.with_db_cache(key) { ... }
      def self.with_db_cache(akey)
        serialized_key = YAML.dump(akey)
        if (r = self.all(:serialized_key => serialized_key)).count != 0
          # cache hit: return the de-serialized value
          YAML.load(r.first.serialized_value)
        else
          # cache miss: evaluate the block, serialize and cache the result
          yield(akey).tap {|avalue|
            self.create(:serialized_key => serialized_key,
                        :serialized_value => YAML.dump(avalue))
          }
        end
      end
    end

Example usage:

    require 'net/http'

    def fetch(uri)
      WebCache.with_db_cache(uri) {|uri|
        # arrive here only on cache miss
        Net::HTTP.get_response(URI(uri))
      }
    end

Commentary:
I previously believed that a proper web-caching scheme would observe and honor header fields like Cache-Control, If-Modified-Since, etc., as well as automatically handle timeouts and other web pathology. But an examination of actual web pages made it clear that truly static data was frequently marked with short cache times. So it makes more sense to let the caller decide how long something should be cached and when a failing query should be retried.
At that point, the code became very simple.
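To illustrate what "let the caller decide how long something should be cached" could look like, here is a minimal sketch. It is not part of the code I actually use: `ExpiringCache`, `with_cache`, and the `max_age` and `now` parameters are hypothetical names, and an in-memory Hash stands in for the database table so the example runs on its own.

```ruby
require 'yaml'

# Hypothetical in-memory stand-in for the WebCache table.
class ExpiringCache
  Entry = Struct.new(:serialized_value, :updated_at)

  def initialize
    @rows = {}
  end

  # max_age is in seconds; nil means "never expire".
  # now is injectable only to make the sketch easy to test.
  def with_cache(akey, max_age: nil, now: Time.now)
    serialized_key = YAML.dump(akey)
    entry = @rows[serialized_key]
    if entry && (max_age.nil? || now - entry.updated_at <= max_age)
      # cache hit and still fresh: return the de-serialized value
      YAML.load(entry.serialized_value)
    else
      # miss or stale: evaluate the block, then store the new result
      yield(akey).tap do |avalue|
        @rows[serialized_key] = Entry.new(YAML.dump(avalue), now)
      end
    end
  end
end

cache = ExpiringCache.new
body = cache.with_cache("http://example.com/", max_age: 3600) do |uri|
  "pretend response for #{uri}"   # a real caller would fetch the page here
end
```

If the block raises (say, on a timeout), nothing gets stored, so the next call simply tries again. That leaves the retry policy with the caller as well.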
Moral: don't over-think your problems.