One thing that is certain is that there must be at least two machines running pgpool
. How you achieve this depends - there is no solution universally applicable to all cases. If what you have is a web application, then you must also run the web application in at least two machines, so you could make something like this:
+----------+ +---------+
| pgmaster | | pgslave |
+----------+ +---------+
| |
+----------+-------------+-----------+
| |
+-----|----+ +-----|----+
| pgpool | | pgpool |
| | | | | |
| webapp | | webapp |
+-----|----+ +-----|----+
| |
internet internet
(In that case you also need some kind of failover on the client side - the one I've marked as "internet".)
If, on the other hand, you really need, not a highly available web application (or similar service), but a highly available postgresql (to which any client can connect anytime), then an alternative is
+----------+ +---------+
| pgmaster | | pgslave |
+----------+ +---------+
| |
+----------+-------------+-----------+
| |
+-----|----+ +-----|----+
| pgpool | | pgpool | (standby)
+-----|----+ +-----|----+
| |
Failover
IP address
|
client
The pgpool
, in that case, could also be in the same machine as the databases. What is important is that you need some kind if IP address failover, which could be keepalived
, but the exact solutions available depend on lower level networking details of the data center you are using (for example, keepalived
can't work in Hetzner, as they have a different way of switching failover IPs). Also note that in this case connected clients probably will disconnect in case of failover, but they will able to reconnect immediately.
Also note that there are other difficulties, one of them being that you can't rule out network partitioning, where both PostgreSQL machines will be working and connected, but they will have somehow lost connection to each other, so each one of them will think the other is dead, and therefore each one will decide to be a master. To address this issue, I know of three solutions: 1) STONITH, which requires special hardware; 2) Quorums, which require special software (such as corosync/pacemaker); 3) Manual failover (the admins get notified and the system is broken until they decide how to fix it). However, it could be not too difficult to setup a quorum if you use my proposed scheme above, but with three pgpool
s instead of two; but I don't remember if pgpool
supports that.
Bottom line: high availability can be difficult and expensive. Carefully examine the possibility of avoiding it altogether. If you can't, be prepared to study much and design much and redesign much, and be aware it's going to take a lot of time.