I'm trying to implement a cache for data fetched from an external data source. I'd like to know whether I can avoid locks altogether and use timestamps to ensure that stale data is never inserted into the cache. Is there an existing mechanism for this? Let me give an example:

    // Reader thread does
    1 Data readData(id) {
    2     Data data = cache.get(id);
    3     if (data == null)
    4         data = extDataSrc.readData(id);
    5     cache.put(id, data);
    6     return data; }

    // Writer thread does
    7 void updateData(id, Data data) {
    8     extDataSrc.updateData(id, data);
    9     cache.remove(id);
    10 }

Without locks, the following race is possible. When id is not present in the cache, the reader calls extDataSrc. If a writer updates the same id at the same time, the reader may read stale data before the writer commits and then be delayed in returning from the extDataSrc call. Meanwhile, the writer executes cache.remove(id) (there is no data in the cache, so nothing is removed) and returns. The reader then executes cache.put(id, data) and inserts the stale data.

I was thinking this could be avoided with timestamps. When the reader checks the cache, it saves a timestamp TR1 (after line 2: the time the cache was checked for id). The writer saves TW1 after executing remove (after line 9: the update time). After executing line 4, the reader saves TR2 (after line 4: the read is complete and the cache update is about to begin). If TR2 > TW1, the reader skips cache.put because another thread has performed an update after the reader checked the cache.

So, TR1 = 100, TW1 = 105, TR2 = 110 => skip cache.put.

Does this make sense?

Cœur

2 Answers

Have a look at:

Claudio
  • I want to avoid possible read starvation during a long update, so a read-write lock or range lock may not always help. RCU looks interesting. I'll take a look. Thanks! – user2960853 Nov 06 '13 at 15:11

I recommend putting a temporary synchronization object in the cache while extDataSrc.readData(id) is executing. First, if two reader threads request the same item, the second thread need not issue a redundant request; it simply waits for the request the first thread issued. Second, when a writer sees that a request is in progress, it can put its data into the cache and feed the waiting readers. When readData finishes, the reader must check whether the request has already been satisfied by a writer (the cache item is data, not the temporary object) and, if so, simply discard the (stale) data from extDataSrc.
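A minimal sketch of this idea, assuming the temporary object can be a `CompletableFuture` stored in a `ConcurrentHashMap`. The class name `PlaceholderCache` and the `loader` function (standing in for extDataSrc.readData) are hypothetical names chosen for illustration, and the writer is assumed to have already updated the external source before calling `write`:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: the cache stores futures, so an incomplete future
// plays the role of the temporary synchronization object.
class PlaceholderCache<K, V> {
    private final ConcurrentHashMap<K, CompletableFuture<V>> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // assumption: stands in for extDataSrc.readData

    PlaceholderCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V read(K id) {
        CompletableFuture<V> mine = new CompletableFuture<>();
        CompletableFuture<V> existing = cache.putIfAbsent(id, mine);
        if (existing != null) {
            // Data is cached or another reader is already loading it: wait for that result.
            return existing.join();
        }
        try {
            V loaded = loader.apply(id);
            // No effect if a writer completed the placeholder first,
            // in which case the loaded (stale) value is discarded.
            mine.complete(loaded);
        } catch (RuntimeException e) {
            cache.remove(id, mine);        // drop the placeholder only if it is still ours
            mine.completeExceptionally(e); // wake up readers waiting on the placeholder
            throw e;
        }
        return mine.join();
    }

    // Assumption: the caller has already written `fresh` to the external source.
    void write(K id, V fresh) {
        CompletableFuture<V> previous = cache.put(id, CompletableFuture.completedFuture(fresh));
        if (previous != null) {
            previous.complete(fresh); // feed readers waiting on an in-flight load
        }
    }
}
```

`CompletableFuture.complete` returns false when the future is already completed, so the reader's "was the request already satisfied by a writer?" check and the discard of stale data happen in that single call.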

And rather than using timestamps, I'd use version numbers in the data objects. That works even if several processes write to the same extDataSrc.
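The version-number idea might be sketched as follows. This assumes every record read from or written to extDataSrc carries a version that increases with each update; the names `VersionedCache`, `Versioned`, and `offer` are hypothetical. Readers and writers both offer what they have, and the cache atomically keeps the entry with the higher version:

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a cache that never replaces an entry with an older version.
class VersionedCache<K, V> {
    // Assumption: the external data source assigns an increasing version per record.
    static final class Versioned<T> {
        final long version;
        final T value;

        Versioned(long version, T value) {
            this.version = version;
            this.value = value;
        }
    }

    private final ConcurrentHashMap<K, Versioned<V>> cache = new ConcurrentHashMap<>();

    Versioned<V> get(K id) {
        return cache.get(id);
    }

    // Stores the candidate unless an entry with an equal or higher version
    // is already present; returns the entry that ends up in the cache.
    Versioned<V> offer(K id, Versioned<V> candidate) {
        return cache.merge(id, candidate,
                (current, incoming) -> incoming.version > current.version ? incoming : current);
    }
}
```

With this shape, a delayed reader that offers version 1 after a writer has offered version 2 leaves the cache unchanged, regardless of the clocks on the machines involved.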

Alexei Kaigorodov
  • In my case I cannot write to the cache directly, because when an object is modified or added, the database generates some values for that object automatically. So I am compelled to clear the cache. You might say that I could write to the cache after the update by reading a fresh copy from the database, but I cannot do so. The existing architecture does not allow it. I also thought about versioning! You can almost get rid of locks by using that, but I cannot do that either. I am not allowed to change DB tables. I went ahead with locks. I was not given enough time by management to come up with something optimal! :( – user2960853 Nov 15 '13 at 04:51