Default failure/recovery behavior for Gemfire Server/Client Architecture

Question

For our GemFire cache, we are using the client/server architecture in 3 different geographic regions with 3 different locators.

Cache Server

Each region would have 2 separate cache servers, potentially one primary and one secondary
The cache servers are connected peer-to-peer
The data-policy on the cache servers is replicate
No region persistence is enabled

Cache Client

No persistence is enabled
No durable queues/subscriptions are set up

What would the default behavior be in the following scenarios?

All cache servers in one geo-region crash. What happens to the data in the cache clients when the cache servers restart? Does the behavior differ for cache clients with proxy or caching-proxy client cache regions?
All cache clients in one geo-region crash. Although we don't have durable queues/subscriptions set up, for this scenario, let's assume we do. What happens to the data in the cache clients when they restart? Does the behavior differ for cache clients with proxy or caching-proxy client cache regions?
All cache servers and cache clients in one geo-region crash. What happens to the data in the cache servers and cache clients when they start up? Does the behavior differ for cache clients with proxy or caching-proxy client cache regions?

Thanks in advance!

Though important, your questions are very broad and depend on many factors. These include the configuration in general and, more specifically, persistence in the servers/members, how redundancy is configured, data policy as you mention (on both clients and servers, actually), whether you have durable queues set up, whether you have enabled persistence on clients (in case they go down too, or just come and go; also related to data policy), interest registrations (or CQs), etc. — John Blum, Feb 08 '17 at 23:58
A good place to start is by first just understanding the different GemFire topologies http://gemfire90.docs.pivotal.io/geode/topologies_and_comm/book_intro.html and making sure you have an accurate definition of the terminology you used, e.g. server/members. For instance, a GemFire Server is a member of the cluster and can be just a data node, but could also be set up as a `CacheServer` to serve clients. Another useful reference in the docs: http://gemfire90.docs.pivotal.io/geode/developing/partitioned_regions/chapter_overview.html (particularly on HA). Hope this helps get you going. — John Blum, Feb 09 '17 at 00:02
Can you clarify what the distinction is that you are making between members and servers? A typical client/server configuration would be that all members of the distributed system are also servers. Are you suggesting that you have a mix of members that are servers and other members that are not servers? — Dan Smith, Feb 09 '17 at 00:32
@JohnBlum, thanks for the prompt feedback and sorry for the confusion caused by the broad question. I've edited my question to be more specific. — Work of Art, Feb 09 '17 at 14:23
@DanSmith, thanks for the prompt feedback and sorry for the confusion caused by the broad question. I've edited my question to be more specific. — Work of Art, Feb 09 '17 at 14:23

score 1 · Answer 1 · answered Feb 09 '17 at 19:34

Ok, so based on how I am interpreting your configuration/setup and your questions, this is how I would answer them currently.

Also note, I am assuming you have NOT configured WAN between your separate clusters residing in different "geographic regions". However, the answers to some of the questions would be the same whether or not WAN is configured.

Regarding your first bullet...

what happens to the data in the cache clients when the cache servers restart?

Nothing.

If the cache client were also storing data "locally" (e.g. CACHING_PROXY), then the data will remain intact.

A cache client can also have local-only Regions that are available only to the cache client, i.e. there is no matching (by "name") Region in the server cluster. This is determined by one of the "local" ClientRegionShortcuts (e.g. ClientRegionShortcut.LOCAL, which corresponds to DataPolicy.NORMAL). Definitely, nothing happens to the data in these types of client Regions if the servers in the cluster go down.

If your client Regions are PROXIES, then your client is NOT storing any data locally, at least for those Regions that are configured as PROXIES (i.e. ClientRegionShortcut.PROXY, which corresponds to DataPolicy.EMPTY).

So...

Does the behavior differ for cache clients with proxy or caching-proxy client cache regions?

See above, but essentially, your "PROXY" based client Regions will no longer be able to "communicate" with the server.

For PROXY, all Region operations (gets, puts, etc) will fail, with an Exception of some kind.

For CACHING_PROXY, a Region.get should succeed if the data is available locally. However, if the data is not available locally, the client Region will send the request to the server Region, which of course will fail. If you are performing a Region.put, then that will fail since the data cannot be sent to the server.
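To make the PROXY versus CACHING_PROXY difference concrete, here is a minimal toy model of the behavior described above. It does not use the GemFire API: `ClientRegionSketch`, `Server`, `ClientRegion`, and the `IllegalStateException` are hypothetical stand-ins (the real client throws its own exception types), and the sketch assumes a put is sent to the server before anything is stored locally.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: none of these types are GemFire classes.
final class ClientRegionSketch {

    /** Stand-in for the server cluster; 'up' toggles a simulated outage. */
    static final class Server {
        final Map<String, String> data = new HashMap<>();
        boolean up = true;

        String get(String key) {
            requireUp();
            return data.get(key);
        }

        void put(String key, String value) {
            requireUp();
            data.put(key, value);
        }

        private void requireUp() {
            if (!up) {
                throw new IllegalStateException("no server available");
            }
        }
    }

    /** Client region: caching == true models CACHING_PROXY, false models PROXY. */
    static final class ClientRegion {
        private final Server server;
        private final boolean caching;
        private final Map<String, String> local = new HashMap<>();

        ClientRegion(Server server, boolean caching) {
            this.server = server;
            this.caching = caching;
        }

        String get(String key) {
            if (caching && local.containsKey(key)) {
                return local.get(key); // local hit: the server is never contacted
            }
            String value = server.get(key); // throws while the server is down
            if (caching && value != null) {
                local.put(key, value);
            }
            return value;
        }

        void put(String key, String value) {
            server.put(key, value); // server first: throws while the server is down
            if (caching) {
                local.put(key, value);
            }
        }

        int localSize() {
            return local.size();
        }
    }
}
```

With the simulated server marked as down, a caching region still answers gets for keys it already holds, while every other operation on either kind of region throws.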

Regarding your second bullet...

What happens to the data in the cache clients when they restart?

Depends on your "Interests Registration (Result) Policy" (i.e. InterestResultPolicy) when the client registers interests for the events (keys/values) in the server Region, particularly when the client comes back online. The interests "expression" (either particular keys, or "ALL_KEYS" or a regex) determines what the client Region will receive on initialization. It is possible not to receive anything.

Durability (the durable flag in `Region.registerInterest(..)`) of client "subscription queues" only determines whether the server will store events for the client while the client is not connected, so that the client can receive what it missed while it was offline.
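The two ideas above (the result policy decides what a restarted client receives up front, and the durable flag decides whether missed events are kept) can be sketched with a small toy model. This is not GemFire code: `InterestSketch`, `initialImage`, and `SubscriptionQueue` are hypothetical names, and the `ResultPolicy` enum only borrows the constant names of GemFire's `InterestResultPolicy` (NONE, KEYS, KEYS_VALUES). The sketch assumes the KEYS policy leaves values empty until they are fetched.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Toy model only: none of these types are GemFire classes.
final class InterestSketch {

    /** Borrows the constant names of InterestResultPolicy; the enum itself is a stand-in. */
    enum ResultPolicy { NONE, KEYS, KEYS_VALUES }

    /** Contents of a freshly restarted, empty client region right after registering interest. */
    static Map<String, String> initialImage(Map<String, String> serverData,
                                            String keyRegex, ResultPolicy policy) {
        Map<String, String> image = new HashMap<>();
        if (policy == ResultPolicy.NONE) {
            return image; // nothing up front; only future events would arrive
        }
        Pattern pattern = Pattern.compile(keyRegex);
        for (Map.Entry<String, String> entry : serverData.entrySet()) {
            if (pattern.matcher(entry.getKey()).matches()) {
                // KEYS: key with no value yet; KEYS_VALUES: key and value
                image.put(entry.getKey(),
                        policy == ResultPolicy.KEYS_VALUES ? entry.getValue() : null);
            }
        }
        return image;
    }

    /** Server-side event queue kept on behalf of one client. */
    static final class SubscriptionQueue {
        private final boolean durable;
        private final List<String> pending = new ArrayList<>();
        private boolean connected = true;

        SubscriptionQueue(boolean durable) {
            this.durable = durable;
        }

        void clientDisconnected() {
            connected = false;
        }

        void onServerEvent(String event) {
            // Connected: delivered right away (not modeled).
            // Offline and non-durable: dropped. Offline and durable: kept for replay.
            if (!connected && durable) {
                pending.add(event);
            }
        }

        /** Returns the events the client missed while it was offline. */
        List<String> clientReconnected() {
            connected = true;
            List<String> missed = new ArrayList<>(pending);
            pending.clear();
            return missed;
        }
    }
}
```

In this model a client that registers with NONE starts empty regardless of durability, and a non-durable client gets nothing replayed after a restart.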

Note, an alternative to "register interests" is CQs.

See here and here for more details.

As for...

Does the behavior differ for cache clients with proxy or caching-proxy client cache regions?

Not that I know of. It all depends on your interests registration and/or CQs.

Finally, regarding your last bullet...

All cache servers and cache clients in one geo-region crashes, what happens to the data in the cache servers and cache clients when they start up?

There will be no data if you do not enable persistence. GemFire is an "In-Memory" Data Grid, and as such, it keeps your data in memory only, unless you arrange for storing your data externally, either by persistence or writing a CacheWriter to store the data in an external data store (e.g. RDBMS).
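As a rough illustration of the write-through idea, the toy model below shows an in-memory region that loses everything on restart while an external store fed by a writer hook keeps its copy. It does not use the GemFire API: `WriteThroughSketch`, `InMemoryRegion`, and `crashAndRestart` are hypothetical names, and the `BiConsumer` hook merely stands in for the role a CacheWriter would play.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Toy model only: none of these types are GemFire classes.
final class WriteThroughSketch {

    /** In-memory region with an optional hook that runs before each put. */
    static final class InMemoryRegion {
        private final Map<String, String> memory = new HashMap<>();
        private final BiConsumer<String, String> writer; // may be null

        InMemoryRegion(BiConsumer<String, String> writer) {
            this.writer = writer;
        }

        void put(String key, String value) {
            if (writer != null) {
                writer.accept(key, value); // write to the external store first
            }
            memory.put(key, value);
        }

        String get(String key) {
            return memory.get(key);
        }

        /** Simulates a crash and restart with no persistence: memory is lost. */
        void crashAndRestart() {
            memory.clear();
        }
    }
}
```

After `crashAndRestart()`, the region returns null for every key, while a map passed in as the writer target still holds the entries. Loading that data back into the region on startup is a separate step that this sketch leaves out.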

Does the behavior differ for cache clients with proxy or caching-proxy client cache regions?

Not in this case.

Hope this helps! -John

Thank you so much John for the elaborate response, I really appreciate it! Just a follow-up question for my 3rd bullet point: Given that we have a peer-to-peer model with no persistence, won't cache servers in that cluster/geo-region be populated by peers from other geo-regions, since the cache regions are "replicate"? — Work of Art, Feb 09 '17 at 20:15
No. That requires a WAN topology setup. See here for more details... http://gemfire90.docs.pivotal.io/geode/topologies_and_comm/multi_site_configuration/chapter_overview.html — John Blum, Feb 09 '17 at 23:00
Thanks John, even though we are not using WAN, the cache servers, although in different geo-regions, are peers to each other in ONE distributed system right? The cache regions on the cache servers are replicate and distributed_ack. Or does this occur only when an update is triggered not when cache servers start up from a crash? Reference: http://gemfire.docs.pivotal.io/geode/developing/distributed_regions/how_distribution_works.html — Work of Art, Feb 10 '17 at 14:24
The regions that are replicated in the cache server do get populated by peers from other geo-regions after start up from crash, without persistence or WAN setup - just verified. — Work of Art, Feb 13 '17 at 20:44
