
I'm using perf as a basic event counter. I'm working on a program that suffers from data cache store misses, with a miss ratio as high as 80%.

I know in principle how caches work: data is loaded from memory on various kinds of misses and evicted when the cache needs the space. What I don't understand is the difference between store misses and load misses. How does it relate to loading and storing? How can you store-miss?

Lolo
ayan ahmedov

2 Answers


A load-miss (as you know) occurs when the processor needs to fetch data from main memory but the data is not in the cache. Whenever the processor wants some data from main memory, it queries the cache: if the data is already loaded you get a load-hit, otherwise you get a load-miss.

A store-miss is related to when the processor wants to write newly calculated data back to main memory. When it writes data back, it has to make sure that the contents of the cache and main memory stay in sync with each other. This can happen under two different policies, which you can find here: Writing Policies.
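As a rough sketch of those two writing policies, here is a toy model (the `cache`/`memory` dicts and addresses are stand-ins, not how real hardware is organized):

```python
# Toy illustration of write-through vs. write-back stores.
cache = {}
memory = {}
dirty = set()   # lines modified in cache but not yet in memory

def store_write_through(addr, value):
    # The write goes to the cache AND straight through to main memory.
    cache[addr] = value
    memory[addr] = value

def store_write_back(addr, value):
    # The write goes only to the cache; the line is marked dirty and
    # main memory is updated later, when the line is evicted.
    cache[addr] = value
    dirty.add(addr)

def evict(addr):
    if addr in dirty:
        memory[addr] = cache[addr]   # the deferred write-back happens here
        dirty.discard(addr)
    cache.pop(addr, None)

store_write_through(0x10, 1)   # memory sees the value immediately
store_write_back(0x20, 2)      # memory is stale until eviction
print(memory.get(0x20))        # None: not written back yet
evict(0x20)
print(memory.get(0x20))        # 2: synced on eviction
```

Either way, the store first has to find (or allocate) the line in the cache, which is where the hit/miss question below comes in.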

So no matter which policy you choose, you first need to check whether the data is already in the cache so you can store to the cache first (since it's faster); if the data block you are looking for has been evicted from the cache, you get a store-miss for that access.
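The lookup-before-store idea can be sketched with a toy direct-mapped, write-allocate cache (the line size, set count, and access trace below are made up for illustration; real caches are set-associative and far larger):

```python
LINE = 64      # bytes per cache line
SETS = 4       # number of lines (direct-mapped)

class ToyCache:
    def __init__(self):
        self.tags = [None] * SETS          # one tag per set
        self.stats = {"load-hit": 0, "load-miss": 0,
                      "store-hit": 0, "store-miss": 0}

    def access(self, op, addr):
        # Both loads and stores do the same lookup first.
        index = (addr // LINE) % SETS
        tag = addr // (LINE * SETS)
        if self.tags[index] == tag:
            self.stats[op + "-hit"] += 1
        else:
            self.stats[op + "-miss"] += 1
            self.tags[index] = tag         # fetch/allocate the line

cache = ToyCache()
cache.access("load", 0)      # cold cache: load-miss
cache.access("store", 8)     # same line is now cached: store-hit
cache.access("store", 4096)  # maps to the same set, evicts it: store-miss
cache.access("load", 0)      # line was just evicted: load-miss
print(cache.stats)
```

So a store-miss is simply a store whose lookup step fails, exactly as a load-miss is a load whose lookup step fails.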

You can check the applet here to get a better idea of what happens in different scenarios.

Saman Barghi
  • Great! Thanks. I basically failed to understand that store means writing, load means reading from memory. – ayan ahmedov Aug 23 '13 at 18:26
  • I believe this is wrong, I'm not familiar with how perf defines these events but "store" and "load" are usually used to refer to the actual operations you perform in your code. – Leeor Sep 07 '13 at 10:09

I'm not fully familiar with how perf defines these events, but given the common definition, I believe the load/store miss counts are just a way to break down the overall miss counting, so that you can tell which kind of access misses more often. Note that loads are usually performed speculatively (at least on modern x86 CPUs), while stores are performed much later in the pipeline, after the commit point, so even a piece of code with both loads and stores to the same region can show different miss rates for each.

In MESI-based cache protocols a load would hit the cache, or miss and fetch the line from memory or the next cache levels, either in Exclusive state if it's not owned by anyone else, or in Shared state if it is. It would write the data into the caches along the way in the process. A store would fetch the line in the same manner, but using an RFO (read-for-ownership) request, which grants exclusive ownership and the right to modify the line. The line would still get cached, but once the new data is written to it locally (usually in your L1 cache), it would become Modified. The hit/miss process would look the same, though.
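A much-simplified single-core view of those transitions for one cache line (real protocols have more transitions and transient states; treating a store to a Shared line as a miss here is a simplification):

```python
def load(state, shared_elsewhere):
    """Return (new_state, hit) for a load to a line in `state`."""
    if state in ("M", "E", "S"):
        return state, True                 # load-hit: no state change needed
    # Load-miss: fetch the line, Shared if another cache has it, else Exclusive.
    return ("S" if shared_elsewhere else "E"), False

def store(state):
    """Return (new_state, hit) for a store to a line in `state`."""
    if state in ("M", "E"):
        return "M", True                   # store-hit: write locally, line becomes Modified
    # Shared or Invalid: issue an RFO to gain ownership, then write; counted as a miss here.
    return "M", False

state = "I"                                # line starts Invalid (not cached)
state, hit = load(state, shared_elsewhere=False)
print(state, hit)                          # load-miss, fetched Exclusive
state, hit = store(state)
print(state, hit)                          # store-hit, line becomes Modified
```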

What Saman referred to in his answer is the breakdown between reads and writes. Loads and stores (and other forms of access, like code reads) all form the "read" part, and writebacks (or intentional write-throughs using special commands or memory types like uncacheable) form the "write" part.

Leeor