
We would like to perform a region.getAll(keys) operation such that the values are fetched in parallel. This behavior is desired primarily to force the CacheLoader to load the values in parallel. A bulk read from the CacheLoader would also work, but it is not clear how we can convey to the CacheLoader the other keys present in the getAll().

Is this something that is best handled on the client side or are there other geode APIs that can help?

Newbie

1 Answer

Region.getAll(keys) is a sequential operation, iterating over each key in the provided Collection individually and fetching the value from the Region. If you trace through the source code from Region.getAll(keys), you will eventually arrive here.
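
Conceptually (a simplification for illustration, not the actual Geode source), that sequential behavior amounts to:

Map<Object, Object> results = new HashMap<>();

for (Object key : keys) {
  // each get() is a separate, serial operation; if the value is missing,
  // it may invoke the CacheLoader for that single key before moving on
  results.put(key, region.get(key));
}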

If your Region is a PARTITION Region (highly recommended), you could take advantage of Geode's parallel Function Execution, something like...

Region<?, ?> myPartitionRegion = ...
...

Set<KEY> keysOfInterests = ...
...

Execution functionExecution = FunctionService.onRegion(myPartitionRegion)
    .withFilter(keysOfInterests)
    .withArgs(...);

ResultCollector<?, ?> results = functionExecution.execute("myFunctionId");

// process the results.
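
To actually process the results, you would block on the ResultCollector; a minimal sketch, assuming the default collector (which returns a List containing one entry per member that executed the Function):

List<?> resultsPerMember = (List<?>) results.getResult();

for (Object memberResult : resultsPerMember) {
  // each entry would be whatever the Function below sends back,
  // e.g. the Map of keys/values local to that member
}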

Then your Function implementation...

import java.util.Map;

import org.apache.geode.cache.Region;
import org.apache.geode.cache.execute.FunctionAdapter;
import org.apache.geode.cache.execute.FunctionContext;
import org.apache.geode.cache.execute.RegionFunctionContext;
import org.apache.geode.cache.partition.PartitionRegionHelper;

class MyOnRegionFunction<K, V> extends FunctionAdapter {

  @Override
  public void execute(FunctionContext context) {

    assert context instanceof RegionFunctionContext :
      String.format("This Function [%s] must be executed on a Region",
        getId());

    RegionFunctionContext regionContext = (RegionFunctionContext) context;

    // only the data stored locally on this member, i.e. the buckets
    // holding the filter keys that were routed here
    Region<K, V> localData =
      PartitionRegionHelper.getLocalDataForContext(regionContext);

    Map<K, V> results = localData.getAll(regionContext.getFilter());

    // do whatever with results; e.g. send back to caller...
    regionContext.getResultSender().lastResult(results);
  }

  @Override
  public String getId() {
    return "myFunctionId";
  }
}
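
Note that for the String-based execute("myFunctionId") call above to resolve, the Function must first be registered on the servers (e.g., during startup); a minimal sketch, assuming the id returned by getId() matches:

// register on each server so execute("myFunctionId") can look the Function up by id
FunctionService.registerFunction(new MyOnRegionFunction<Object, Object>());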

When you set a "Filter" on the Execution (a Set of keys used to "route" the Function execution to the data nodes in the cluster containing those keys), you have, in effect, (somewhat) parallelized the getAll operation: each node runs the Function only for the subset of filter keys it actually hosts, i.e. this.

There is perhaps a better, more complete example of this here; see the section "Write the Function Code".

You should probably also read up on "How Function Execution Works" and on PARTITION Regions. Also pay attention to this...

An application needs to perform an operation on the data associated with a key. A registered server-side function can retrieve the data, operate on it, and put it back, with all processing performed locally to the server.

Which is the first bullet on this page.

You can even associate a CacheLoader with the "logical" PARTITION Region. When the fetch is made inside the Function and the data is not available, the loader will (should) operate locally on that node, since the Function is only fetching keys that would be routed to that node anyway (based on the partition strategy, which is a bucket "hash" by default).
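
For illustration, a minimal sketch of such a loader (MyCacheLoader and fetchFromExternalSource are hypothetical names): load(..) is still invoked one key at a time, but since each member only fetches the keys routed to it, the loads proceed in parallel across the cluster.

import org.apache.geode.cache.CacheLoader;
import org.apache.geode.cache.CacheLoaderException;
import org.apache.geode.cache.LoaderHelper;

class MyCacheLoader implements CacheLoader<Object, Object> {

  @Override
  public Object load(LoaderHelper<Object, Object> helper) throws CacheLoaderException {
    // invoked per key, local to the member owning that key's bucket
    return fetchFromExternalSource(helper.getKey()); // hypothetical backend call
  }

  private Object fetchFromExternalSource(Object key) {
    // ... e.g. query a database or external service for this key
    return null;
  }

  @Override
  public void close() {
    // release any resources held by the loader
  }
}

You would attach it to the Region when it is created, e.g. with RegionFactory.setCacheLoader(new MyCacheLoader()).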

I have not tried the latter, but off the top of my head I don't see why it would not work.

Anyway, hope this helps!

-John

John Blum