Details
GemFire Version: 9.15
Previous GemFire Version: 8.2.7
Serializer: ReflectionBasedAutoSerializer
Hi there, I am upgrading my application from GemFire 8 to GemFire 9. The upgrade was largely smooth, but I ran into some issues when running the test cases.
A bit of background on how the tests are conducted: we initialize a single client cache for the lifetime of the test suite and execute all our GemFire-related operations on that cache. The client cache is standalone, i.e. we did not initialize any pools, servers or locators with it.
To replicate: set up a standalone client cache, create a single region, and attempt to commit a transaction that writes to it.
What I observed is that the test cases fail because of an issue during transaction commit. In GF9, there seems to be an additional step that serializes the data to be committed.
TXState
protected void applyChanges(List/* <TXEntryStateWithRegionAndKey> */ entries) {
  // applyChangesStart for each region
  for (Map.Entry<InternalRegion, TXRegionState> me : regions.entrySet()) {
    InternalRegion r = me.getKey();
    TXRegionState txrs = me.getValue();
    txrs.applyChangesStart(r, this);
  }
  // serializePendingValue for each entry
  for (Object entry : entries) {
    TXEntryStateWithRegionAndKey o = (TXEntryStateWithRegionAndKey) entry;
    o.es.serializePendingValue(); // <-- serialization happens here
  }
  ...
As a result, the serializer serializes the data and also defines a new type to cache in the TypeRegistry class. During this process, it looks like the serializer first looks for an existing type in its own type registry and, if it cannot find one, checks the distributed cluster.
@Override
public int defineType(PdxType newType) {
  Collection<Pool> pools = getAllPools(); // <-- where the issue happens
  ServerConnectivityException lastException = null;
  int newTypeId = -1;
  for (Pool pool : pools) {
    try {
      newTypeId = GetPDXIdForTypeOp.execute((ExecutablePool) pool, newType);
      newType.setTypeId(newTypeId);
      copyTypeToOtherPools(newType, newTypeId, pool);
      return newTypeId;
    } catch (ServerConnectivityException e) {
      // ignore, try the next pool.
      lastException = e;
    }
  }
  throw returnCorrectExceptionForFailure(pools, newTypeId, lastException);
}
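To make the failure mode concrete, here is a minimal, self-contained sketch of the lookup order as I understand it: check the local registry first, then ask each pool, and fail outright when there are no pools. It is a simplified model only, not Geode's actual code; the names PdxLookupSketch, Registry, PoolStub and idFor are hypothetical, and IllegalStateException merely stands in for the CacheClosedException seen in the trace.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PdxLookupSketch {

    /** Hypothetical stand-in for a server pool that can assign type ids. */
    interface PoolStub {
        int idFor(String typeName);
    }

    /** Simplified model of a client-side type registry (assumption, not Geode's implementation). */
    static class Registry {
        private final Map<String, Integer> localTypes = new HashMap<>();
        private final List<PoolStub> pools;

        Registry(List<PoolStub> pools) {
            this.pools = pools;
        }

        int defineType(String typeName) {
            // Step 1: an already-known type is served from the local registry.
            Integer cached = localTypes.get(typeName);
            if (cached != null) {
                return cached;
            }
            // Step 2: an unknown type needs a pool; a standalone client has none.
            if (pools.isEmpty()) {
                throw new IllegalStateException(
                    "no pools available, cannot define type " + typeName);
            }
            // Step 3: ask each pool in turn and remember the last failure.
            RuntimeException last = null;
            for (PoolStub pool : pools) {
                try {
                    int id = pool.idFor(typeName);
                    localTypes.put(typeName, id);
                    return id;
                } catch (RuntimeException e) {
                    last = e; // try the next pool
                }
            }
            throw last; // every pool failed
        }
    }

    public static void main(String[] args) {
        PoolStub pool = name -> 42;
        Registry connected = new Registry(Collections.singletonList(pool));
        System.out.println("connected client got id " + connected.defineType("Customer"));

        Registry standalone = new Registry(Collections.<PoolStub>emptyList());
        try {
            standalone.defineType("Customer");
        } catch (IllegalStateException e) {
            System.out.println("standalone client failed: " + e.getMessage());
        }
    }
}
```

In this model the standalone case fails before any pool is contacted, which matches the trace below, where the exception is raised from getAllPools() rather than from the per-pool loop.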
Error Stack Trace
org.apache.geode.cache.CacheClosedException: Client pools have been closed so the PDX type registry is not available.
    at org.apache.geode.internal.cache.GemFireCacheImpl.getCacheClosedException(GemFireCacheImpl.java:1717) ~[geode-core-1.14.4.jar:?]
    at org.apache.geode.internal.cache.GemFireCacheImpl.getCacheClosedException(GemFireCacheImpl.java:1706) ~[geode-core-1.14.4.jar:?]
    at org.apache.geode.pdx.internal.ClientTypeRegistration.getAllPools(ClientTypeRegistration.java:153) ~[geode-core-1.14.4.jar:?]
    at org.apache.geode.pdx.internal.ClientTypeRegistration.defineType(ClientTypeRegistration.java:63) ~[geode-core-1.14.4.jar:?]
    at org.apache.geode.pdx.internal.TypeRegistry.defineType(TypeRegistry.java:202) ~[geode-core-1.14.4.jar:?]
    at org.apache.geode.pdx.internal.TypeRegistry.defineLocalType(TypeRegistry.java:245) ~[geode-core-1.14.4.jar:?]
    at org.apache.geode.pdx.internal.PdxWriterImpl.completeByteStreamGeneration(PdxWriterImpl.java:540) ~[geode-core-1.14.4.jar:?]
    at org.apache.geode.pdx.internal.PdxWriterImpl.getAutoPdxType(PdxWriterImpl.java:571) ~[geode-core-1.14.4.jar:?]
    at org.apache.geode.pdx.internal.AutoSerializableManager.writeData(AutoSerializableManager.java:2054) ~[geode-core-1.14.4.jar:?]
    at org.apache.geode.pdx.internal.AutoSerializableManager.writeData(AutoSerializableManager.java:1994) ~[geode-core-1.14.4.jar:?]
When getAllPools() is called, an error is thrown because no pools were initialized. Declaring a pool in our test cache.xml did not help either: it then complained that it was unable to connect to any servers or locators. At the moment, it seems that one solution is to set up a server/locator cluster, similar to how it's done in MultiSiteCachingIntegrationTests, as mentioned in this other issue, but I am still keen to know whether there are less invasive ways to work around this.
Ideally, I would like to resolve it without changing the test setup in my application (i.e. without having to start a new server/locator).