We operate on sketches whose sizes range from 1 GB to 15 GB. We currently break each sketch into parcels of roughly 50 MB and store the parcels in a Geode partitioned region. We read the data from S3 and put all of it into the region; once that succeeds, we insert a marker key entry in the region. This marker key is central to our business logic.
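To make the load path concrete, here is a hedged sketch of the parceling logic. A plain `Map` stands in for the Geode region (a `Region` implements `java.util.Map`), and all names (`SketchLoader`, `load`, `PARCEL_SIZE`) are illustrative, not our actual code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the load path described above. A plain Map stands
// in for the Geode Region; class and method names are hypothetical.
public class SketchLoader {
    static final int PARCEL_SIZE = 50 * 1024 * 1024; // ~50 MB per parcel

    // Splits one large sketch into parcels, stores each part under
    // "<id>_NN", and only then inserts the marker key "[<id>]".
    public static List<String> load(Map<String, byte[]> region, String id, byte[] sketch) {
        List<String> partKeys = new ArrayList<>();
        int part = 0;
        for (int off = 0; off < sketch.length; off += PARCEL_SIZE) {
            int end = Math.min(off + PARCEL_SIZE, sketch.length);
            String key = String.format("%s_%02d", id, part++);
            region.put(key, Arrays.copyOfRange(sketch, off, end));
            partKeys.add(key);
        }
        // The marker key goes in last: its presence signals "all parts loaded".
        region.put("[" + id + "]", new byte[0]);
        return partKeys;
    }
}
```

The key point is the ordering: the marker key is written only after every part succeeds, so its presence is meant to imply completeness.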
Below is the region configuration:
```xml
<region name="region_abc">
  <region-attributes data-policy="partition" statistics-enabled="true">
    <key-constraint>java.lang.String</key-constraint>
    <entry-time-to-live>
      <expiration-attributes action="destroy" timeout="86400"/>
    </entry-time-to-live>
    <partition-attributes redundant-copies="0">
      <partition-resolver name="SingleBucketPartitioner">
        <class-name>com.companyname.geode.sketch.partition.SingleBucketPartitioner</class-name>
      </partition-resolver>
    </partition-attributes>
    <cache-loader>
      <class-name>com.companyname.geode.abc.cache.BitmapSketchParcelCacheLoader</class-name>
      <parameter name="s3-region-name">
        <string>us-east-1</string>
      </parameter>
      <parameter name="s3-bucket-name">
        <string>xyz</string>
      </parameter>
      <parameter name="s3-folder-name">
        <string>abc</string>
      </parameter>
      <parameter name="s3-read-timeout">
        <string>600</string>
      </parameter>
      <parameter name="read-through-pool-size">
        <string>70</string>
      </parameter>
      <parameter name="measurement-group">
        <string>abcd</string>
      </parameter>
    </cache-loader>
    <cache-listener>
      <class-name>com.companyname.geode.abc.cache.ClearMarkerKeyAfterAnyEntryDestroyCacheListener</class-name>
    </cache-listener>
    <eviction-attributes>
      <lru-heap-percentage action="local-destroy"/>
    </eviction-attributes>
  </region-attributes>
</region>
```
If the marker key is present in the region, we assume that all entries of the sketch are present.
Cache eviction is currently configured to trigger at 70% heap usage and evicts entries using the LRU algorithm. We have been seeing data inconsistencies caused by this eviction: in some scenarios it evicts some or many of a sketch's entries but not the marker key. The object is then inconsistent, because the application believes all entries are present when in fact they are not.
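Because LRU eviction removes entries one at a time, the marker key by itself cannot guarantee completeness. A hedged sketch of a defensive read-side check (a plain `Map` stands in for the region; `SketchReader` and `isComplete` are hypothetical names, not our code):

```java
import java.util.Map;

// Illustrative consistency check: since eviction is per-entry, a reader
// must verify every expected part, not just the marker key.
// A plain Map stands in for the Geode Region; names are hypothetical.
public class SketchReader {
    // Returns true only if the marker key AND all expected parts are present.
    public static boolean isComplete(Map<String, byte[]> region, String id, int partCount) {
        if (!region.containsKey("[" + id + "]")) {
            return false; // marker missing: sketch never fully loaded, or invalidated
        }
        for (int part = 0; part < partCount; part++) {
            String key = String.format("%s_%02d", id, part);
            if (!region.containsKey(key)) {
                return false; // a part was evicted: the marker is stale
            }
        }
        return true;
    }
}
```

Note that on a real region with a read-through `CacheLoader`, `containsKey` does not invoke the loader, whereas `get` would reload the evicted part from S3, so the exact check would need to account for that.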
To fix this, we also implemented a listener for the destroy event, but it has not resolved the issue either:
```java
@Override
public void afterDestroy(EntryEvent<String, BitmapSketch> event) {
    String regionKey = event.getKey();
    Region<String, BitmapSketch> region = event.getRegion();
    // Act only when a non-marker key is destroyed while its marker key is still present.
    if (regionKey != null && !regionKey.startsWith("[")) {
        // Asynchronous call
        reloadExecutor.submit(
            () -> {
                String markerKey = "[".concat(regionKey.substring(0, regionKey.indexOf("_")).trim().concat("]"));
                // Check for marker key presence before removing it.
                if (region.containsKey(markerKey)) {
                    logger.info("FixGeodeCacheInconsistency : Marker key exists. Deleting the marker key associated with the entry key. Region: `{}`; Entry Key: `{}`; Marker Key: `{}`",
                            region.getName(), regionKey, markerKey);
                    // Remove the marker key from the region to restore consistency for the sketch.
                    region.remove(markerKey);
                    logger.info("FixGeodeCacheInconsistency : Marker key destroyed. Region: `{}`; Entry Key: `{}`; Marker Key: `{}`",
                            region.getName(), regionKey, markerKey);
                }
            });
    }
}
```
We are now looking for a more reliable solution and trying to take a deeper look at the problem.
A couple of notes:
- We break one big object apart and store the parts as entries in the region.
- We add one marker key to the region to signal that the whole object is present.
- We read all the parts of the object from the region to reassemble the big object.
- The Geode region does not know the connection between these parts.
A simple example of one such object, say E12345:
- Marker key:- [E12345]
- Parts/Entries:- E12345_00, E12345_01, E12345_02, E12345_03, E12345_04, E12345_05 and so on....
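The naming scheme above can be captured in a small helper. This is a hypothetical illustration (`SketchKeys` is not our actual code); it mirrors the derivation our listener performs when it rebuilds the marker key from a part key:

```java
// Hypothetical helper mirroring the key scheme above: part keys look like
// "E12345_00" and the marker key is the object id wrapped in brackets.
public class SketchKeys {
    // Derives the marker key "[E12345]" from a part key like "E12345_03".
    public static String markerKeyFor(String partKey) {
        String id = partKey.substring(0, partKey.indexOf('_')).trim();
        return "[" + id + "]";
    }

    // Builds the Nth part key for an object id, e.g. ("E12345", 5) -> "E12345_05".
    public static String partKey(String id, int part) {
        return String.format("%s_%02d", id, part);
    }
}
```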
Geode cache eviction sometimes evicts some of the parts but not the marker key, which is what causes all the issues.
We are trying to come up with an approach that achieves any of the following:
- Is there an option to group related entries together, so that Geode knows they all belong to one broader object?
- How can we make sure that cache eviction does not cause inconsistencies? Currently it removes some of the entries and leaves others, which corrupts the end result.
- Is this a good use case for region locking semantics?
I will be glad to provide more context and details as required.
Any details/guidance/suggestions are appreciated.