
I would like to have a collection of a varying number n of objects that can easily be iterated over as a group, with each object having a large set (20+) of individually modified variables that influence the class's methods. Before I started learning OOP, I would simply make a 2D array, load the variable values into each row (one row per object), and append or delete rows as needed. Is this still a good solution? Is there a better one?

Again, in this case I am more interested in pushing processor performance than in preserving abstraction, modularity, etc. In this respect, I am very confused about how the data container is ultimately read into the L1 cache, and how to ensure that I do not induce paging inefficiency or cache misses. If, for example, I have a 128 KB cache, I assume the entire container should fit into this cache to be efficient, correct?

Daggaz
    I'm tempted to close this as "Needing detail and clarity". Can you provide a code sample of what you are trying to do and optimize on? – selbie Jan 07 '20 at 11:19
    When you say that each object member has a large list of 20+ individually modified variables, it would be important to know if that number is exactly the same for all objects and whether the total size of these variables is fixed. If both the number of objects is variable and the size of an individual object, that would mean that the array/vector would have to be flexible in both dimensions. – Andreas Wenzel Jan 07 '20 at 12:26

1 Answer


According to Agner Fog's optimization manual, the C++ Standard Template Library is rather inefficient, because it makes extensive use of dynamic memory allocation. However, a fixed size array that is made larger than necessary (e.g. because the needed size is not known at compile time) can also be bad for performance, because a larger size means that it won't fit into the cache as easily. In such situations, the STL's dynamic memory allocation could perform better.

Generally, it is best to store your data in contiguous memory. You can use a fixed size array or an std::vector for this. However, before using std::vector, you should call std::vector::reserve() for performance reasons, so that the memory does not have to be reallocated too often. If you reallocate too often, the heap could become fragmented, which is also bad for cache performance.

Ideally, the data that you are working on will fit entirely into the Level 1 data cache (which is about 32 KB on modern desktop processors). However, even if it doesn't fit, the Level 2 cache is much larger (about 512 KB) and the Level 3 cache is several megabytes. The higher-level caches are still significantly faster than reading from main memory.

It is best if your memory access patterns are predictable, so that the hardware prefetcher can do its work best. Sequential memory accesses are easiest for the hardware prefetcher to predict.

The CPU cache works best if you access the same data several times and if the data is small enough to be kept in the cache. However, even if the data is used only once, the CPU cache can still make the memory access faster, by making use of prefetching.

A cache miss will occur if

  1. the data is being accessed for the first time and the hardware prefetcher was not able to predict and prefetch the needed memory address in time, or
  2. the data is no longer cached, because the cache had to make room for other data, due to the data being too large to fit in the cache.

In addition to the hardware prefetcher attempting to predict needed memory addresses in advance (which is automatic), it is also possible for the programmer to explicitly issue a software prefetch. However, from what I have read, it is hard to get significant performance gains from doing this, except under very special circumstances.

Andreas Wenzel
  • Ok wow, very useful link thank you! So I should define my own containers, and more importantly I should be pre-allocating my heap, with the intent of filling it (preferably permanently if possible) with data structures constrained to the cache sizes (especially L1) and preferably with a permanent size. So in this case it is better to just use a bunch of zeros rather than deleting the row itself and changing the vector dimension, and here I assume that matrix functions are very fast on the CPU. And in the case of needing a larger container, I should spill over and use a new one from the heap? – Daggaz Jan 07 '20 at 15:36
  • @Daggaz: Pre-allocating your heap in the sense of creating a [memory pool](https://en.wikipedia.org/wiki/Memory_pool) or calling [std::vector::reserve()](https://en.cppreference.com/w/cpp/container/vector/reserve) would prevent frequent dynamic memory allocation, so that a fragmented heap (which is bad for cache performance) would be less likely. However, the problem with these is that they must be fixed size. In the case of std::vector::reserve, a size increase is possible, but this will possibly cause the buffer to have to be moved to a new (larger) memory buffer (which is expensive). – Andreas Wenzel Jan 08 '20 at 05:18
  • @Daggaz: Sometimes there is no alternative to using frequent dynamic memory allocation, especially if a reasonable maximum size of the data you are working on cannot be determined at compile-time. However, generally, even if you waste some memory (less than 50%), it is better to use fixed size buffers than frequent dynamic memory allocation. – Andreas Wenzel Jan 08 '20 at 06:37