I have set-up a testing Postgres-XL cluster with the following architecture:
- gtm - vm00
- coord1+datanode1 - vm01
- coord2+datanode2 - vm02
I created a new database, which contains a table that is distributed by replication. This means that I should have the exact copy of that table in each and every single datanode.
Doing operations on the table works great, I can see the changes replicated when connecting to all coordinator nodes.
However, when I simulate one of the datanodes going down, while I can still read the data in the table just fine, I cannot add or modify anything, and I receive the following error:
ERROR: Failed to get pooled connections
I am considering deploying Postgres-XL as a highly available database backend for a fair number of applications, and I cannot control how those applications interact with the database (it might be big a problem if those applications couldn't write to the database while one datanode is down).
To my understanding, Postgres-XL should achieve high availability for replicated tables in a very transparent way and should be able to support losing one or more datanodes (as long as at least one is still available - again, this is just for replicated tables), but this does not seem the case.
Is this the intended behaviour? What can be done in order to be able to withstand having one or more datanodes down?