1

I have a farm of servers, each server is regularly making an identical query to the database. These query results are cached in a shared cache, accessible to all servers.

How can I ensure that a newer query does not get overwritten by an older query? Is there a way of versioning the queries somehow, by time for example, so that this doesn't happen? How to deal with concurrent queries?

Thanks!

Edit: db is SQL Server. Query is a select statement. And, caching mechanism is very simple: simple write, with no locking. And that is because there is currently no way of telling the order of the select queries.

Jacko
  • 12,665
  • 18
  • 75
  • 126
  • You will need to give more info, like: RDBMS - SQL Server, MySQL, Oracle? What is the query (update, select, insert, delete?) What do you mean by "overwritten"? – JNK Dec 17 '10 at 17:14
  • The answer to this question depends *entirely* on the mechanism you're using to do this caching - which you neglected to mention. At the conceptual level, then only possible answer is "sure, just associate a timestamp with each query and only update the cache if the incoming query is newer than the existing." – Andrzej Doyle Dec 17 '10 at 17:15
  • The title is misleading. You are not versioning the query (which would mean tracking changes to the underlying SQL) but rather the result set. That is going to mean versioning whatever cache storage you're using — a file, or whatever. – Larry Lustig Dec 17 '10 at 18:40
  • Thanks, Larry. Yes, that's what I want. But how to do it? – Jacko Dec 17 '10 at 18:53

2 Answers2

0

One approach is to have a global update counter in the database, either for updates or for reads (updates is more efficient, but also harder to get right).

So on each update, also increment the global counter. On each read, read the global counter, and put the data into the cache along with the counter value. Only overwrite the cache contents if the counter value is larger.

Because of database isolation, transactions should appear as if they happened in a serial manner (assuming you have chosen the SERIALIZABLE isolation level). That, in turn, will mean that strictly higher counter numbers relate to more recent data.

Martin v. Löwis
  • 124,830
  • 17
  • 198
  • 235
  • Thanks, Martin. That's a great idea. I am using SQL server 2005; any ideas on the best way of implementing this? – Jacko Dec 19 '10 at 13:05
  • Not sure whether SQL server has a builtin feature for sequences; if not, just create a table `counter(value int)`, then update using `update counter set value=value+1;` – Martin v. Löwis Dec 19 '10 at 19:02
0

Where is the cache? Is it a disk file? Or a table in the database?

And what does it mean when you say an older query might overwrite a newer one? Isn't it true that the most-recently-completed query (or maybe the most-recently-started) is the newest one?

You should lock the cache container before performing each query, whether that's a file or a table. Each server should only perform the query if it can obtain the lock, otherwise it should wait for the locked resource. That way the cache will contain only the most recent results.

Is that along the lines of what you are asking?

Larry Lustig
  • 49,320
  • 14
  • 110
  • 160
  • Thanks, Larry. The cache is stored in an in-memory key,value server, similar to memcached. Yes, locking the container would serialize the queries, but I am concerned about performance in this case. So, I am trying to avoid locking. Now, each row in the result does have a version, and I can keep track of the latest version of each row, in the cache, so I may be able to compare queries based on this version. – Jacko Dec 19 '10 at 13:03