Vikas-
There are several ways in which you can look up, or fetch, multiple elements from a GemFire `Region`.
- As you can see, a GemFire `Region` indirectly implements `java.util.Map`, and so provides all the basic `Map` operations, such as `get(key):value`, in addition to several other operations that are not available in `Map`, like `getAll(Collection keys):Map`. While `get(key):value` is not going to be the most "efficient" method for looking up multiple items at once, `getAll(..)` allows you to pass in a `Collection` of keys for all the values you want returned. Of course, you have to know the keys of all the values you want in advance, so...
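For example, a minimal sketch of a bulk lookup with `getAll(..)`, assuming a `/People` Region keyed by name and a `Person` value type (both assumptions for illustration; the `org.apache.geode` package is `com.gemstone.gemfire` in older GemFire versions):

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Map;

import org.apache.geode.cache.Region;

public class GetAllExample {

  // Fetches several People by their (known) keys in a single operation.
  public static Map<String, Person> findPeople(Region<String, Person> people) {

    Collection<String> keys = Arrays.asList("jonDoe", "janeDoe");

    // getAll(..) returns a Map of key -> value for every requested key;
    // keys with no associated value are mapped to null.
    return people.getAll(keys);
  }
}
```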
- You can obtain GemFire's `QueryService` from the `Region` by calling `region.getRegionService().getQueryService()`. The `QueryService` allows you to write GemFire Queries with OQL (Object Query Language). See GemFire's User Guide on Querying for more details.
The advantage of using OQL over `getAll(keys)` is, of course, that you do not need to know the keys of all the values you might need to validate up front. If the validation logic is based on some criteria that matches the values that need to be evaluated, you can express this criteria in the OQL Query Predicate. For example...

```
SELECT * FROM /People p WHERE p.age >= 21;
```
To call upon the GemFire `QueryService` to run the query above, you would...

```java
Region<String, Person> people = cache.getRegion("/People");
...

QueryService queryService = people.getRegionService().getQueryService();

Query query = queryService.newQuery("SELECT * FROM /People p WHERE p.age >= $1");

SelectResults<Person> results = (SelectResults<Person>) query.execute(new Object[] { 21 });

// process (e.g. validate) the results
```
OQL Queries can be parameterized and arguments passed to the `Query.execute(args:Object[])` method as shown above. When the appropriate Indexes are added to your GemFire Regions, the performance of your Queries can improve dramatically. See the GemFire User Guide on creating Indexes.
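For instance, a minimal sketch of creating an Index on the `age` field used by the query above, reusing the `queryService` from the snippet above (the index name is an assumption for illustration):

```java
// A functional Index on Person.age lets the predicate "p.age >= $1"
// be resolved without a full scan of the /People Region.
// Note: createIndex(..) throws checked exceptions (e.g. RegionNotFoundException)
// that are omitted here for brevity.
Index ageIndex = queryService.createIndex("PersonAgeIdx", "p.age", "/People p");
```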
- Finally, with GemFire `PARTITION` Regions especially, where your `Region` data is partitioned, or "sharded", and distributed across the nodes (GemFire Servers) in the cluster that host the `Region` of interest (e.g. `/People`), you can combine querying with GemFire's Function Execution service to query the data locally (to that node), where the data actually exists (e.g. that shard/bucket of the `PARTITION` Region containing a subset of the data), rather than bringing the data to you. You can even encapsulate the "validation" logic in the GemFire `Function` you write.
You will need to use the `RegionFunctionContext` along with the `PartitionRegionHelper` to get the local data set of the `Region` to query. Read the Javadoc of `PartitionRegionHelper` as it shows the particular example you are looking for in this case.
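To give a rough idea, here is a minimal sketch of such a Function (the class name, the `Person` type, and the `age` criteria are assumptions for illustration; older GemFire versions would extend `FunctionAdapter` and use the `com.gemstone.gemfire` packages instead):

```java
import org.apache.geode.cache.Region;
import org.apache.geode.cache.execute.Function;
import org.apache.geode.cache.execute.FunctionContext;
import org.apache.geode.cache.execute.RegionFunctionContext;
import org.apache.geode.cache.partition.PartitionRegionHelper;

// Executes on each member hosting the /People PARTITION Region and
// validates only the member-local subset (buckets) of the data.
public class ValidatePeopleFunction implements Function {

  @Override
  public void execute(FunctionContext context) {

    RegionFunctionContext regionContext = (RegionFunctionContext) context;

    // Local data set: only the buckets (shards) hosted on this member.
    Region<String, Person> localPeople =
      PartitionRegionHelper.getLocalDataForContext(regionContext);

    boolean valid = localPeople.values().stream()
      .allMatch(person -> person.getAge() >= 21); // your "validation" logic here

    context.getResultSender().lastResult(valid);
  }

  @Override
  public String getId() {
    return getClass().getSimpleName();
  }
}
```

You would then invoke it with `FunctionService.onRegion(people).execute(new ValidatePeopleFunction())` and gather the per-member results from the returned `ResultCollector`.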
Spring Data GemFire can help with many of these concerns...
For Querying, you can use the SD Repository abstraction and extension provided in SDG.
For Function Execution, you can use SD GemFire's Function Execution annotation support.
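For instance, a minimal sketch of an SD Repository for the `/People` Region (the entity mapping and the derived query method name are assumptions for illustration; the `@Region` annotation's package varies by SDG version):

```java
import java.util.List;

import org.springframework.data.annotation.Id;
import org.springframework.data.gemfire.mapping.annotation.Region;
import org.springframework.data.repository.CrudRepository;

// Maps the Person entity to the /People Region.
@Region("People")
class Person {

  @Id
  String name;

  int age;
}

// SDG derives the OQL (e.g. "SELECT * FROM /People p WHERE p.age >= $1")
// from the query method name at runtime.
interface PersonRepository extends CrudRepository<Person, String> {

  List<Person> findByAgeGreaterThanEqual(int age);
}
```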
Be careful though: using the SD Repository abstraction inside a Function context is not going to limit the query to the "local" data set of the `PARTITION` Region. SD Repositories always work on the entire data set of the "logical" `Region`, where the data is necessarily distributed across the nodes in a cluster in a partitioned (sharded) setup.
You should definitely familiarize yourself with GemFire Partitioned Regions.
In summary...
The approach you choose above really depends on several factors, such as, but not limited to:
- How you organized the data in the first place (e.g. `PARTITION` vs. `REPLICATE`, which refers to the Region's `DataPolicy`).
- How amenable your validation logic is to supplying "criteria" to, say, an OQL Query Predicate to `SELECT` only the `Region` data you want to validate. Additionally, efficiency might be further improved by applying appropriate Indexing.
- How many nodes are in the cluster and how distributed your data is; in that case a `Function` might be the most advantageous approach, i.e. bring the logic to your data rather than the data to your logic. The latter involves selecting the matching data on the nodes where the data resides, which could mean several network hops to the nodes containing the data depending on your topology and configuration (i.e. "single-hop access", etc.), and serializing the data to send over the wire, thereby increasing the saturation of your network, and so on.
Depending on your use case, other factors to consider are your expiration/eviction policies (e.g. whether data has been overflowed to disk), the frequency of the needed validations based on how often the data changes, etc.
Most of the time, it is better to validate the data on the way in and catch errors early. Naturally, as data is updated, you may also need to perform subsequent validations, but that is no substitute for validating as early as possible.
There are many factors to consider, and the optimal approach is not always apparent, so test and make sure your optimizations and overall approach have the desired effect.
Hope this helps!
Regards,
-John