I'm testing MariaDB, Galera, and Corosync/Pacemaker on CentOS 7 servers to understand high-availability clustering. I'm using a 3-server cluster for testing, which avoids most quorum issues. My tests and application are written in Java.
I have the clustering part down; it works in an active-active configuration and runs just fine. I also have HA set up using Pacemaker and Corosync. I've run many tests in which I fail the slaves or bring them back up during the run, and I've written to all three nodes at run-time regardless of whether it was the master connection or one of the child connections. When I test the master going down during run-time (to simulate a power outage, server crash, or whatever in the data center), the application immediately stops running. I get a java.net.SocketException error and the application closes, with the other two connections being shut down successfully. I've used both the kill and stop commands in the terminal to test this (just on the off chance one of them would behave differently).
JDBC URL String
Below is the part of the code that connects the application to the cluster. It connects correctly and does work until I cause the first master to go down; the other two going down does not affect this.
public void connections() {
    try {
        bigConnec = DriverManager.getConnection(
                "jdbc:mariadb:sequential:failover:loadbalance://"
                + "10.32.18.90,10.32.18.91,10.32.18.92/"+DB+"?autoReconnect=true&failOverReadOnly=false"
                + "&retriesAllDown=120",
                "root", "PASS");
        bigConnec.setAutoCommit(false);
    } catch(SQLException e) {
        System.err.println("Unable to connect to any one of the three servers! \n" + e);
        System.exit(1);
    }
    ...
}
Three other connections are made, one to each individual server, so I can pull information from them more easily; that is what the ellipsis indicates. The servers switch which node is the "primary", but the application does not connect to the next node in the list.
I feel like the issue is in the way I have my URL set up, because everything works outside of the cluster. Nothing breaks during testing when I shut down the child nodes either; the most that happens is a warning that the URL lost its connection to one or both of them. Is there a way to configure the URL so that it automatically fails over to the next available node in the string? Or do I have to go about it some other way: individual connection URLs and objects (or an array of Connection objects), or some black magic and pixie dust that only SysAdmins know and I have yet to try?
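For comparison, here is a minimal sketch of how a URL with a single mode keyword could be assembled. It assumes the URL syntax described in the Connector/J failover documentation linked below, which as far as I can tell allows only one keyword (for example sequential) between jdbc:mariadb: and //, rather than three chained together. The class name JdbcUrlSketch, the method buildUrl, and the database name in the usage note are hypothetical and only for illustration.

```java
import java.util.List;

public final class JdbcUrlSketch {
    private JdbcUrlSketch() {}

    /**
     * Builds a MariaDB JDBC URL with at most one HA mode keyword.
     * Assumption: the driver accepts "jdbc:mariadb:<mode>://host1,host2/db?opts".
     *
     * @param haMode   a single mode keyword such as "sequential"; null or empty for none
     * @param hosts    host names or IP addresses, in the order they should be tried
     * @param database database (schema) name
     * @param options  query-string options without the leading '?'; null or empty for none
     */
    public static String buildUrl(String haMode, List<String> hosts,
                                  String database, String options) {
        if (hosts == null || hosts.isEmpty()) {
            throw new IllegalArgumentException("at least one host is required");
        }
        StringBuilder url = new StringBuilder("jdbc:mariadb:");
        if (haMode != null && !haMode.isEmpty()) {
            url.append(haMode).append(':');
        }
        url.append("//").append(String.join(",", hosts)).append('/').append(database);
        if (options != null && !options.isEmpty()) {
            url.append('?').append(options);
        }
        return url.toString();
    }
}
```

Calling buildUrl("sequential", hosts, "testdb", "retriesAllDown=120") with the three addresses above would give jdbc:mariadb:sequential://10.32.18.90,10.32.18.91,10.32.18.92/testdb?retriesAllDown=120. Whether that single-keyword form fixes the failover behaviour is exactly what I have not been able to confirm.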
What I Have Tried
- How to make MaxScale High Available with Corosync/Pacemaker (MariaDB Article)
- Failover and High availability with MariaDB Connector/J (MariaDB Documentation)
- What is the right MariaDB Galera jdbc URL properties for loadbalance (Stack Overflow)
- HA Proxy Configuration with MariaDB Galera cluster (Stack Overflow)
- Configuring Server Failover (MySQL Documentation)
- Advanced Load-balancing and Failover Configuration (MySQL Documentation)
TL;DR
Problem: Java application is not automatically failing over despite being flagged for failover and sequential support in the JDBC URL (see JDBC URL String). Everything works with corosync and Pacemaker but I cannot get the Java application to transfer to the next available node to act as the primary connection when the current one goes down.
Question: Is the issue in the URL? If so, would it be better to open three separate connections and use the first valid one, or is there something I can do to make the application automatically roll over to the next available connection in the current URL?
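To make the "three separate connections, use the first valid one" option concrete, this is roughly what I have in mind. It is only a sketch: FirstAvailableSketch, Connector, and firstAvailable are hypothetical names, and the Connector hook stands in for a call such as DriverManager.getConnection(url, user, password) so the selection logic can be shown without a live cluster.

```java
import java.sql.SQLException;
import java.util.List;

public final class FirstAvailableSketch {
    private FirstAvailableSketch() {}

    /**
     * Hypothetical hook that opens a connection for one URL.
     * In real use: url -> DriverManager.getConnection(url, user, password).
     */
    @FunctionalInterface
    public interface Connector<T> {
        T connect(String url) throws SQLException;
    }

    /**
     * Tries each URL in list order and returns the first connection that opens.
     * If every attempt fails, throws one SQLException that carries the
     * individual failures as suppressed exceptions.
     */
    public static <T> T firstAvailable(List<String> urls, Connector<T> connector)
            throws SQLException {
        SQLException failures =
                new SQLException("No server in the list accepted a connection");
        for (String url : urls) {
            try {
                return connector.connect(url);
            } catch (SQLException e) {
                // Remember why this node was skipped, then try the next one.
                failures.addSuppressed(e);
            }
        }
        throw failures;
    }
}
```

With this approach the application would have to call firstAvailable again whenever the current connection throws, and because auto-commit is off, any uncommitted work on the dead connection would need to be redone on the new one.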
Software/Equipment
- MariaDB 10.1.24
- corosync 2.4.0
- Pacemaker 1.1.15
- CentOS 7
- Java 8 / Eclipse Neon.3 / Eclipse 4.6.3
- MariaDB Connector/J 2.0.1
If there is any more information you need, please do tell me in the comments and I'll update this as soon as I can!