I have an application that involves a collection of arrays which can be very large (indices up to the maximum value of an int), but which are lazy - their contents are calculated on the fly and are not actually known until requested. The arrays are also immutable - the value of each element of each array is constant throughout the life of the program. The arrays are sparse in the sense that often only a small subset of all array elements is ever requested (the arrays do not contain large blocks of zeros and are not "sparse" in that sense).
Looking up (and possibly calculating in the process) an array element can be expensive, so I want to add a caching layer. The cache should implement the following interface:
void point_cache_store (gpointer data, gsize idx, gdouble value);
gdouble point_cache_fetch (gpointer data, gsize idx);
where data serves as a unique handle for each array (there can be many of these). point_cache_fetch() should return the value argument passed to point_cache_store() with the same data and idx arguments, or indicate a cache miss by returning the special value DATUM_UNKNOWN_VALUE (the caller will never call point_cache_store() with DATUM_UNKNOWN_VALUE).
The question is: how can I implement point_cache_fetch() and point_cache_store()? (They are currently no-op stubs.)
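To make the contract concrete, here is a minimal sketch of the no-op stubs together with the way a caller would wrap an expensive lookup. The typedefs are stand-ins so the snippet compiles without GLib, the sentinel value given for DATUM_UNKNOWN_VALUE is only a placeholder (the real one is defined by the application), and expensive_lookup() and cached_lookup() are hypothetical names used for illustration.

```c
#include <stddef.h>

/* Stand-ins so the sketch compiles without GLib headers. */
typedef void  *gpointer;
typedef size_t gsize;
typedef double gdouble;

/* Placeholder sentinel; the application defines the real value. */
#ifndef DATUM_UNKNOWN_VALUE
#define DATUM_UNKNOWN_VALUE (-1.0e300)
#endif

/* No-op stubs: nothing is remembered, so every fetch is a miss. */
void point_cache_store(gpointer data, gsize idx, gdouble value)
{
  (void)data; (void)idx; (void)value;
}

gdouble point_cache_fetch(gpointer data, gsize idx)
{
  (void)data; (void)idx;
  return DATUM_UNKNOWN_VALUE;
}

/* Hypothetical stand-in for the expensive array element calculation. */
static gdouble expensive_lookup(gpointer data, gsize idx)
{
  (void)data;
  return (gdouble)idx * 0.5;
}

/* Hypothetical caller: try the cache, fall back to the real lookup on a
 * miss, then offer the result to the cache for next time. */
gdouble cached_lookup(gpointer data, gsize idx)
{
  gdouble value = point_cache_fetch(data, idx);
  if (value == DATUM_UNKNOWN_VALUE) {
    value = expensive_lookup(data, idx);
    point_cache_store(data, idx, value);
  }
  return value;
}
```

With the stubs in place, cached_lookup() always takes the slow path; any real implementation only has to make point_cache_fetch() return a stored value more often than never.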
Points to consider:
- The cache implementation must be thread-safe. Several threads are running simultaneously and any of these can call point_cache_store() or point_cache_fetch() with any data or idx arguments.
- The cache truly is a cache; it's always OK for point_cache_fetch() to return DATUM_UNKNOWN_VALUE, even if it once knew that value. The caller will just perform an ordinary lookup in that case.
- Remember, the arrays are immutable - for given data and idx arguments, the caller will always provide the same value argument.
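One possible shape that fits these three points, offered as a sketch and not as a benchmarked answer: a fixed-size, direct-mapped table in which a new entry simply overwrites whatever occupied its slot, guarded by a small set of striped spinlocks built on C11 atomics. Forgetting entries is acceptable because a miss is always allowed. The table size, stripe count, hash mixing, stand-in typedefs, and the placeholder value for DATUM_UNKNOWN_VALUE are all assumptions made for illustration; GLib's own mutexes could be swapped in for the spinlocks.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins so the sketch compiles without GLib headers. */
typedef void  *gpointer;
typedef size_t gsize;
typedef double gdouble;

/* Placeholder sentinel; the application defines the real value. */
#ifndef DATUM_UNKNOWN_VALUE
#define DATUM_UNKNOWN_VALUE (-1.0e300)
#endif

#define N_SLOTS (1u << 16)  /* assumed table size, power of two   */
#define N_LOCKS 256u        /* assumed stripe count, power of two */

typedef struct {
  gpointer data;
  gsize    idx;
  gdouble  value;
  bool     used;
} slot_t;

static slot_t      slots[N_SLOTS];
static atomic_bool locks[N_LOCKS];  /* static zero-init: all unlocked */

/* Mix the array handle and the index into a slot number. */
static size_t slot_for(gpointer data, gsize idx)
{
  uint64_t h = (uint64_t)(uintptr_t)data * 0x9E3779B97F4A7C15ull;
  h ^= (uint64_t)idx + 0x9E3779B97F4A7C15ull + (h << 6) + (h >> 2);
  h *= 0xBF58476D1CE4E5B9ull;
  return (size_t)(h >> 32) & (N_SLOTS - 1);
}

/* Spin until the flag flips from false to true under our exchange. */
static void stripe_lock(atomic_bool *l)
{
  while (atomic_exchange_explicit(l, true, memory_order_acquire))
    ;
}

static void stripe_unlock(atomic_bool *l)
{
  atomic_store_explicit(l, false, memory_order_release);
}

void point_cache_store(gpointer data, gsize idx, gdouble value)
{
  size_t       i = slot_for(data, idx);
  atomic_bool *l = &locks[i & (N_LOCKS - 1)];

  stripe_lock(l);
  /* Overwrite unconditionally: evicting an older entry is allowed. */
  slots[i] = (slot_t){ data, idx, value, true };
  stripe_unlock(l);
}

gdouble point_cache_fetch(gpointer data, gsize idx)
{
  size_t       i = slot_for(data, idx);
  atomic_bool *l = &locks[i & (N_LOCKS - 1)];
  gdouble      result = DATUM_UNKNOWN_VALUE;

  stripe_lock(l);
  /* Report a hit only when the full key matches, never on hash alone. */
  if (slots[i].used && slots[i].data == data && slots[i].idx == idx)
    result = slots[i].value;
  stripe_unlock(l);
  return result;
}
```

The critical sections are only a compare and a copy, which is why a spinlock is plausible here. Whether this actually beats the stubs depends on the contention and hit rate in the workload.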
I realize that there are many ways to do this and that there are tradeoffs involved. For this question, though, I am going to evaluate answers by one very specific criterion: whether they improve performance in one particular benchmark in the application that inspired the question. If you want to go the extra mile and run the benchmark yourself, here is how to do it:
git clone git://github.com/gbenison/starparse
git clone git://github.com/gbenison/burrow-owl.git -b point-cache-base
The functions point_cache_fetch() and point_cache_store() are found in "burrow/spectrum/point_cache.c". The relevant benchmark is "benchmarks/b_cache".