First of all, this problem is independent of whether you started your GemFire (data) servers using Spring Data GemFire (SDG) or not, such as by using Gfsh. Having said that, there are significant advantages to using Spring, and specifically SDG, to bootstrap and configure your servers, Locators, and clients. But, I simply wanted to make this distinction where this problem is concerned for other interested readers.
By the `getObject()` method, I assume you are actually referring to `PartitionResolver.getRoutingObject()`? See the Javadoc.
In general, I'd say it is nearly always preferable to use simple, scalar types as keys in your Regions, such as `Long`, `Integer`, `String`, etc. Most searching should be based on the value, or properties of the value (i.e. the Object), rather than on individual components (e.g. `id1`) of the key.
Additionally, I will also point out that I disagree with the `PartitionResolver` Javadoc, bullet #1, where it states, "The key class can implement the `PartitionResolver` interface to enable custom partitioning". I think this is a naive approach for many reasons, not the least of which is that it couples your key class to GemFire. You should always prefer #2 when a `PartitionResolver` is needed.
But is a `PartitionResolver` actually needed in your case?
Since your "entire" key defines the "route" (i.e. all properties [`id1`, `id2`, `date`] of the `Key` class), you don't even really need to involve a custom `PartitionResolver` at all.
All you need to do is provide a proper implementation of the `equals(:Object)` and `hashCode()` methods (from `java.lang.Object`) in your `Key` class.
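For illustration, here is a minimal sketch of such a `Key` class. The field types (`Long` for `id1`/`id2`, `java.util.Date` for `date`) are assumptions, since the question does not show the actual class:

```java
import java.util.Date;
import java.util.Objects;

public class Key {

    // Keys should be immutable, hence the final fields.
    private final Long id1;
    private final Long id2;
    private final Date date;

    public Key(Long id1, Long id2, Date date) {
        this.id1 = id1;
        this.id2 = id2;
        this.date = date;
    }

    public Long getId1() { return id1; }
    public Long getId2() { return id2; }
    public Date getDate() { return date; }

    // All 3 components participate in equality, so the entire key
    // determines the hash (and therefore the partition/route).
    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof Key)) return false;
        Key that = (Key) obj;
        return Objects.equals(this.id1, that.id1)
            && Objects.equals(this.id2, that.id2)
            && Objects.equals(this.date, that.date);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id1, id2, date);
    }
}
```

Two logically identical keys then compare equal and hash identically, which is exactly what GemFire needs for consistent routing and lookups.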
TIP: Keep in mind that GemFire Regions, at a basic, fundamental level, are simply a `java.util.Map`, key-value data structure. Yes, they are distributed (in most cases) as well as partitioned for PARTITION Regions, but each is fundamentally based on a `Map` and the "hash" of your key. If your entire key defines the partition (or route), then no custom `PartitionResolver` is necessary.
TIP: Furthermore, a `PARTITION` Region is a logical Region that is divided up into 113 buckets (by default, ignoring primaries & secondaries for a moment), and those buckets are distributed across the (data-hosting) servers in your cluster, making the Region physically dispersed, of course, assuming your servers are individual processes on separate machines. This is what constitutes a "logical" Region, because to your application, it is simply 1 holistic data structure.
You would implement a custom `PartitionResolver` if only a portion of the key were used to determine the partition (or route) of the key/value pairing. This is useful if you want to group certain key/value pairings together, at the same physical location (i.e. server/process & machine) in the cluster.
For example, suppose you want to group similar key/value pairings based on the `date` of your key. Then...
```java
class KeyDatePartitionResolver implements PartitionResolver<Key, Object> {

    public String getName() {
        return getClass().getName();
    }

    public Object getRoutingObject(EntryOperation<Key, Object> entryOperation) {
        Key key = entryOperation.getKey();
        // Route on the date component only; entries with equal dates
        // land in the same bucket.
        return key.getDate();
    }
}
```

(Note the `<Key, Object>` type parameters on `implements PartitionResolver`; without them, the generically-typed `getRoutingObject(EntryOperation<Key, Object>)` method would not actually override the interface method.)
Now all entries (key/values) that occurred on a similar date/time would be routed to the same partition (or bucket) in the logical PARTITION Region. Of course, you could further coarsen the date to group, or route, the key/value pairings by year/month/day, or simply year/month, however you choose. Again, all that matters is that the `Object` returned from the `getRoutingObject(..)` method in your custom `PartitionResolver` implements the `equals(:Object)` and `hashCode()` methods. Obviously, Java's `java.util.Date` class (Javadoc) does.
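For instance, if you wanted the coarser year/month grouping, a hypothetical `getRoutingObject(..)` implementation might derive a `java.time.YearMonth` (which also properly implements `equals` and `hashCode`) from the key's date. A small sketch of just that derivation:

```java
import java.time.YearMonth;
import java.time.ZoneOffset;
import java.util.Date;

public class YearMonthRouting {

    // Derive a coarser routing object so that all entries in the same
    // calendar year/month (interpreted in UTC here) group together.
    static YearMonth routingObjectFor(Date date) {
        return YearMonth.from(date.toInstant().atZone(ZoneOffset.UTC));
    }
}
```

Two keys whose dates fall in the same month then produce equal routing objects and are colocated in the same bucket.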
Regarding...
"Is this a right way to create the key index for improving the performance?"
Well, it depends on your application search cases. Are your search cases for certain values based on the components (i.e. [`id1`, `id2`, `date`]) of the key, collectively or individually?
For example, if you search by the combinations [`id1`, `date`] as well as [`id2`, `date`], then you would create 2 (KEY) Indexes with these fields from the `Key` class. If you searched by all 3 fields [`id1`, `id2`, `date`], then your (KEY) Index would include all 3 fields. If you searched by all 3 combinations, then you would (generally) need all 3 KEY Indexes for optimal performance.
Essentially, a field or combination of fields used in a query predicate expression should be indexed for potentially more optimal performance.
There is no guarantee, though. Remember, when values change (are added, updated, removed, etc.), Indexes need to be updated to some degree. Therefore, there are "maintenance costs" associated with Indexes, and the more you have, the more it can potentially cost.
You also have to weigh the benefit against the number of key/value pairings and whether an Index is warranted at all. If the data is mostly referential in nature, with a relatively small data set (e.g. < 1000 entries, perhaps), then sometimes a full scan can still be more performant than using an Index. A full scan is equivalent to a full table scan in an RDBMS. Just remember, Indexes are not free. They take up space (memory) and time (CPU) to maintain.
I'd also say it is generally better to (again) use simple keys and maintain "searchable" state in the values associated with the keys. This boils down to design preference, though. Use (simple) keys for partitioning/routing.
For additional (and relevant) information, see: here, here, here, and here.
Lastly, regarding `Functions`, the filter is a set of "keys" (Javadoc). The keys are used to find, or route to, the (bucket of the) partition in the logical PARTITION Region.
If you also configured a custom `PartitionResolver` with the PARTITION Region, I believe it will also apply the resolver to the filter (or set of keys) passed to the `Function` when the `Function` is executed.
But, you are simply passing the entire key, which in your case is an instance of your `Key` class, and you can pass multiple instances (hence, the "`Set`"), depending on which keys you want to filter by.
Anyway, I hope this all makes sense.
As always, when these sorts of questions are asked, the answer varies significantly based on your use case (i.e. data access patterns), requirements, and data set. The proper thing to do here is try things and test.
Good luck!