1

An excerpt from the Wikipedia article on cache invalidation: "Cache invalidation is a process in a computer system whereby entries in a cache are replaced or removed." But why on earth do we need to invalidate a cache? I can think of only one possible scenario: if for some reason the cache and the database go out of sync, the data in the cache will be stale. To sync it, we will need to invalidate the cache. But the cache and DB going out of sync (except for a short period of time when the data is yet to be written into both) is not desirable behaviour. So cache invalidation acts as a remedy if we discover that the cache does not contain the correct data. Is this its sole purpose?
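To make the scenario concrete, here is a minimal sketch of a cache drifting out of sync and of invalidation repairing it. Everything in it is an assumption for illustration: `db` and `cache` are plain dictionaries standing in for a real database and cache, and the function names are hypothetical.

```python
# Hypothetical stand-ins: `db` plays the database, `cache` the cache in front of it.
db = {"user:1": "Alice"}
cache = {}

def read(key):
    """Cache-aside read: serve from the cache, fall back to the database on a miss."""
    if key not in cache:
        cache[key] = db[key]
    return cache[key]

def write_without_invalidation(key, value):
    """Updates only the database, so any cached copy silently goes stale."""
    db[key] = value

def write_with_invalidation(key, value):
    """Updates the database and drops the cached copy so the next read refetches."""
    db[key] = value
    cache.pop(key, None)

first = read("user:1")                       # miss: cache is filled from db
write_without_invalidation("user:1", "Bob")
stale = read("user:1")                       # hit: still the old cached value
write_with_invalidation("user:1", "Carol")
fresh = read("user:1")                       # miss again: refetched from db
```

After the second write the cached entry is gone, so the following read goes back to the database and sees the current value.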

f_puras
Praveen Nvs
  • You imply that "someone" refreshes the cache with the "new" data if the underlying data structure changes, but that is not true, as a cache is only filled on a need-to-use basis. I.e., if the underlying data changes, remove the stale data and wait to see whether the cache is needed in future. And if yes, re-fill it. – Smutje Apr 29 '19 at 08:08
  • I don't think you realize that `Cache and DB going out of sync` event is too frequent to have it claimed just as an *undesirable* behavior. It happens more often than just a few times, and your business logic **should be affected** by it. Otherwise why else are you even caching? –  Apr 29 '19 at 08:14
  • @Smutje - There are different cache updating strategies. I think updating on a 'need-to-use basis' comes under the 'write around cache' policy. In write back and write through policies, we update cache and DB. – Praveen Nvs Apr 29 '19 at 08:26
  • Cache and DB out of sync - too frequent ? I know that happens in 'write around cache' policy. Consider 'write through' or 'write back' cache updating policies. In that, they won't be out of sync for too long. – Praveen Nvs Apr 29 '19 at 08:30
  • Just for the record: please try to be as specific as possible. You see, this community is for specific questions on programming. It is not a place where we have lengthy discussions (by asking more and more questions using comments). From that point of view, you might want to look to the [help] to learn how and what to ask here. – GhostCat Apr 29 '19 at 09:21
  • First of all, please follow GhostCat's comment on how to ask a question. Your query is becoming unclear. So that you understand my comment, I present the following statements: in your original post, you are asking when cache invalidation is needed, and answered your own question by mentioning `when DB and cache is out of sync.` My point was that you saying `.. DB / Cache out of sync is not a desirable behavior` may be a result of not addressing such issues frequently. –  Apr 29 '19 at 09:45
  • @MohammadRakibAmin: Thanks.'answered your own question by mentioning.....' .....I know that is one reason for cache invalidation...but I intended to know whether there is any other reason. – Praveen Nvs Apr 29 '19 at 16:44
  • @GhostCat - have you deleted your answer? It was there in the afternoon and now, I cannot see it anymore. – Praveen Nvs Apr 29 '19 at 16:45
  • @GhostCat - I do not know why anybody downvoted it. I do not have any reputation. It is my first question after all! Your answer was good. It explains one of the scenarios where cache invalidation is performed. I will accept it. Post it again. – Praveen Nvs Apr 29 '19 at 18:36
  • I brought it back two hours ago, simply refresh the page?! – GhostCat Apr 29 '19 at 18:59

2 Answers

1

Cache invalidation exists because most caches operate based upon a trade-off of performance vs capacity.

Consider a solid state drive vs a hard drive. The performance of the SSD will be better but the amount of data you can store will be worse at the same cost level. Often people will combine them to get the performance of an SSD for frequently accessed files (such as the operating system), and a HDD for raw storage capacity.

CPUs are structured in a similar hierarchy, where the closest to the CPU is the fastest but also the smallest. The costs in this case are not necessarily just monetary cost but also physical space, power usage, heat production etc.

  1. CPU registers - fastest, very small
  2. CPU caches (also have their own hierarchy) - fast, small
  3. RAM - medium, large

To keep the caches performing at their best, the most frequently accessed items must be maintained so that there is a better ratio of cache hits to misses. We want to be fetching from our slower sources as infrequently as possible. Similarly, because of the limited size constraint, we need to evict the items which are accessed least frequently.

Cache invalidation is the strategy we use to decide which items to evict, and when, in order to make space for newer items which have a higher likelihood of being required again. It is not applicable if your cache contains a full representation of some other data source.
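As a rough illustration of evicting the least frequently accessed item, here is a minimal sketch of a fixed-capacity cache. The class name `LFUCache` and its methods are hypothetical, it assumes a capacity of at least 1, and real caches use far more efficient bookkeeping than a linear scan.

```python
class LFUCache:
    """Tiny fixed-capacity cache that evicts the least frequently accessed key.

    Assumes capacity >= 1. Ties are broken by insertion order (oldest first).
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.values = {}   # key -> cached value
        self.hits = {}     # key -> number of successful reads

    def get(self, key):
        if key not in self.values:
            return None    # cache miss: the caller fetches from the slower source
        self.hits[key] += 1
        return self.values[key]

    def put(self, key, value):
        # Make room first when adding a new key to a full cache.
        if key not in self.values and len(self.values) >= self.capacity:
            victim = min(self.values, key=lambda k: self.hits[k])
            del self.values[victim]
            del self.hits[victim]
        self.values[key] = value
        self.hits.setdefault(key, 0)


cache = LFUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")
cache.get("a")
cache.get("b")
cache.put("c", 3)   # cache is full: "b" has fewer hits than "a", so "b" goes
```

Here "a" has been read twice and "b" once, so inserting "c" removes "b" and keeps the more frequently used entry.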

Michael
-1

There are plenty of reasons. Probably one of the most common ones: a cache is (often by nature) much smaller than the overall amount of data that needs to be stored.

In other words: if you just keep adding and adding elements to your cache, it becomes a full copy of your data. Or, more likely, you quickly run out of memory.

Put differently, the nature of a cache is this: it is limited (somehow) in size. Thus, sooner or later you are facing a decision like: "I can't just add a new element to the cache, I have to make room first". And then you have to do exactly that: invalidate one of the entries in your cache so that there is room for that "newer" entry.

And given the comment by the OP: invalidating a whole cache is often seen as similar to restarting your program, re-installing your app, or restarting your device. It is often seen as a "generic" means of ensuring the program/application gets reset to a known good state.
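Both ideas (a size limit, and wiping the whole cache as a reset) can be sketched with Python's standard `functools.lru_cache`. Note that it evicts the least recently used entry, which is just one possible policy. `load_profile` and the `calls` list are hypothetical names for illustration.

```python
from functools import lru_cache

calls = []  # records every real (uncached) lookup

@lru_cache(maxsize=2)  # the cache holds at most two results
def load_profile(user_id):
    """Hypothetical slow lookup; stands in for a database or network call."""
    calls.append(user_id)
    return f"profile-{user_id}"

load_profile(1)
load_profile(1)              # served from the cache, no second real lookup
load_profile.cache_clear()   # invalidate everything: back to a known empty state
load_profile(1)              # has to do the real lookup again
```

After `cache_clear()` the next call misses, which is the "reset to a known good state" behaviour described above.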

GhostCat
  • "a cache is (by nature) much smaller compared to the overall amount of data" This is not really correct. Consider a web cache. Each network node could theoretically have a copy of every cachable item - the performance win is that the number of network hops to fulfill the request is reduced. This kind of cache does not need to be small to be effective (didn't downvote btw) – Michael Apr 29 '19 at 08:15
  • Answer does not address the origin of cache invalidation: the multi-threaded (or multi-user) environment. – Mark Jeronimus Apr 29 '19 at 08:18
  • Got it. Good point. Any reasons to do it when the cache limit is not yet reached? I came across it only during fixing defects - there is a LIVE defect caused due to the customer receiving stale data. Then, our first reaction is -'invalidate the cache for that user account number.' We do this as a quick-fix before delving into why the data in the cache was out of sync. – Praveen Nvs Apr 29 '19 at 08:23
  • @MarkJeronimus And which part of the question focuses on multi-threaded/multi-user? The OP gives ONE example, I give a more generic view on top of that?! – GhostCat Apr 29 '19 at 08:30
  • @Michael Adapted the answer accordingly. – GhostCat Apr 29 '19 at 08:31
  • @PraveenNvs Updated my answer accordingly. – GhostCat Apr 29 '19 at 08:32
  • @GhostCat - .....Sometimes, cache is refreshed periodically. That means, it is set to the latest data at regular time intervals. 1) I think your last para about cache invalidation refers to this. Right? 2) We usually call this 'refreshing cache.' But, I think this is a kind of cache invalidation. Right? 3. Even the Wiki page seems to mention this as one of the cache invalidation methods. Calls it 'Refresh'. Instead of replacing with latest data, if we remove the contents of the cache(resulting in a cache miss), it is called 'Purge' - another method of invalidation. – Praveen Nvs Apr 29 '19 at 09:02