My team uses Geode as a makeshift analytics engine. We store a collection of massive raw data objects (200MB+ each) in Geode, but these objects are never directly returned to the client. Instead, we rely heavily on custom function execution to process these data sets inside Geode, and only return the analysis result set.
We have a new requirement to implement two tiers of data analytics precision. The high-precision analytics will require larger raw data sets and more CPU time. It is imperative that these high-precision analyses do not inhibit the low-precision analytics performance in any way. As such, I'm looking for a solution that keeps these data sets isolated to different servers.
I built a POC that keeps each data set in its own region (both are PARTITIONED). These regions are configured to belong to separate Member Groups, then each server is configured to join one of the two groups. I'm able to stand up this cluster locally without issue, and gfsh indicates that everything looks correct: describe member
shows each member hosting the expected regions.
My client code configures a ClientCache that points at the cluster's single locator. My function execution command generally looks like the following:
FunctionService
.onRegion(highPrecisionRegion)
.setArguments(inputObject)
.filter(keySet)
.execute(function);
When I only run the high-precision server, I'm able to execute the function against the high-precision region. When I only run the low-precision server, I'm able to execute the function against the low-precision region. However, when I run both servers and execute the functions one after the other, I invariably get an exception stating that one of the regions cannot be found. See the following Gist for a sample of my code and the exception. https://gist.github.com/dLoewy/c9f695d67f77ec18a7e60a25c4e62b01
TLDR key points:
- Using member groups, Region A is on Server 1 and Region B is on Server 2.
- These regions must be PARTITIONED in Production.
- I need to run a data-dependent function on one of these regions; The client code chooses which.
- As-is, my client code always fails to find one of the regions.
Can someone please help me get on track? Is there an entirely different cluster architecture I should be considering? Happy to provide more detail upon request.
Thanks so much for your time!
David
FYI, the following docs pages mention function execution on Member Groups, but give very little detail. The first link describes running data-independent functions on member groups, but doesn't say how, and doesn't say anything about running data-dependent functions on member groups. https://gemfire.docs.pivotal.io/99/geode/developing/function_exec/how_function_execution_works.html https://gemfire.docs.pivotal.io/99/geode/developing/function_exec/function_execution.html