4

I have a Postgres 9.4 RDS instance with Multi-AZ, and there's a slave, read-only replica.

Up to this point the load balancing was made in the business layer of my app, but it's inefficient, and I was hoping to use PGPool, so the app interacts with a single Postgres connection.

It turns out that using PGPool has been a pain in the ass. If I set it to act as a load balancer, simple SELECT queries throw errors like:

 SQLSTATE[HY000]: General error: 7 
 message contents do not agree with length in message type "N" 
 server sent data ("D" message) 
 without prior row description ("T" message)

If I set it to act in a master/slave mode with stream replication (as suggested in Postgres mail list) I get:

psql: ERROR:  MD5 authentication is unsupported 
in replication and master-slave modes.
HINT:  check pg_hba.conf

Yeah, well, pg_hba.conf if off hands in RDS so I can't alter it.

Has anyone got PGPool to work in RDS? Are there other tools that can act as middleware to take advantage of reading replicas in RDS?

Reza Mousavi
  • 4,420
  • 5
  • 31
  • 48
ffflabs
  • 17,166
  • 5
  • 51
  • 77

2 Answers2

1

I was able to make it work here are my working config files:

You have to use md5 authentication, and sync the username/password from your database to the pool_passwd file. Also need enable_pool_hba, load_balance_mode, and master_slave_mode on.

pgpool.conf

listen_addresses = '*'
port = 9999
pcp_listen_addresses = '*'
pcp_port = 9898
pcp_socket_dir = '/tmp'
listen_backlog_multiplier = 1
backend_hostname0 = 'master-rds-database-with-multi-AZ.us-west-2.rds.amazonaws.com'
backend_port0 = 5432
backend_weight0 = 0
backend_flag0 = 'ALWAYS_MASTER'
backend_hostname1 = 'readonly-replica.us-west-2.rds.amazonaws.com'
backend_port1 = 5432
backend_weight1 = 999
backend_flag1 = 'ALWAYS_MASTER'
enable_pool_hba = on
pool_passwd = 'pool_passwd'
ssl = on
num_init_children = 1
max_pool = 2
connection_cache = off
replication_mode = off
load_balance_mode = on
master_slave_mode = on

pool_hba.conf

local   all         all                               md5
host    all         all         127.0.0.1/32          md5

pool_passwd

username:md5d51c9a7e9353746a6020f9602d452929

to update pool_password you can use pg_md5 or

echo username:md5`echo -n usernamepassword | md5sum`
username:md5d51c9a7e9353746a6020f9602d452929 -

Output of running example:

psql --dbname=database --host=localhost --username=username --port=9999

database=> SHOW POOL_NODES;

 node_id |                        hostname                 | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay
---------+-------------------------------------------------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | master-rds-database.us-west-2.rds.amazonaws.com | 8193 | up     | 0.000000  | primary | 0          | false             | 0
 1       | readonly-replica.us-west-2.rds.amazonaws.com    | 8193 | up     | 1.000000  | standby | 0          | true              | 0

database=> select now();

 node_id |                        hostname                 | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay
---------+-------------------------------------------------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | master-rds-database.us-west-2.rds.amazonaws.com | 8193 | up     | 0.000000  | primary | 0          | false             | 0
 1       | readonly-replica.us-west-2.rds.amazonaws.com    | 8193 | up     | 1.000000  | standby | 1          | true              | 1

database=> CREATE TABLE IF NOT EXISTS tmp_test_read_write ( data varchar(40) );
CREATE TABLE
database=> INSERT INTO tmp_test_read_write (data) VALUES (concat('',inet_server_addr()));
INSERT 0 1
database=> select data as master_ip,inet_server_addr() as replica_ip from tmp_test_read_write;
  master_ip   |  replica_ip
--------------+---------------
 172.31.37.69 | 172.31.20.121
(1 row)

You can also see from the logs id does both databases:

2018-10-16 07:56:37: pid 124528: LOG:  DB node id: 0 backend pid: 21731 statement: CREATE TABLE IF NOT EXISTS tmp_test_read_write ( data varchar(40) );
2018-10-16 07:56:47: pid 124528: LOG:  DB node id: 0 backend pid: 21731 statement: INSERT INTO tmp_test_read_write (data) VALUES (concat('',inet_server_addr()));
2018-10-16 07:56:52: pid 124528: LOG:  DB node id: 1 backend pid: 24890 statement: select data as master_ip,inet_server_addr() as replica_ip from tmp_test_read_write;

Notice the insert used ip_address of master, and the next select used ip_address of the read only replica.

I can update after more testing, but psql client testing looks promising.

Jeff Peters
  • 106
  • 3
  • 2
    Nice. A bit convoluted but nice anyway. So, you tell pgpool not to replicate bc that's RDS's job. You treat the replica as master. However, the relative weight seems to indicate that reads always go to the replica... How are you preventing write attempts on the replica? – ffflabs Oct 16 '18 at 11:55
0

There is Citus(pgShard) that is supposed to work with standard Amazon RDS instances. It has catches though. You will have a single point of failure if you use the open source version. It's coordinator node is not duplicated.

You can get a fully HA seamless fail over version of it but you have to buy the enterprise licence, but it is CRAZY expensive. It will easily cost you $50,000 to $100,000 or more per year.

Also they are REALLY pushing their cloud version now which is even more insanely expensive.

https://www.citusdata.com/

I have also heard of people using HAProxy to balance between Postgres or MySql nodes.

BrianC
  • 1,793
  • 1
  • 18
  • 26
  • Thanks! I hadn't heard of them. – ffflabs Mar 24 '17 at 21:51
  • Ozgun from Citus -- I wanted to chime in with two quick notes. On pricing, both our cloud & enterprise editions are priced by node per month. The cloud version also includes the hardware cost and starts at $990/month for a small cluster. On the coordinator node becoming a single point of failure (in the community edition), that's one of the reasons we're encouraging our users towards the Cloud. In Citus Cloud, we can automatically [fail over to the replica and attach the elastic IP to the replica](https://www.citusdata.com/blog/2017/03/10/how-to-scale-postgresql-on-aws). – ozgune Apr 09 '17 at 21:50
  • 1
    Exactly. And at $900+ per node per month even at only 10 nodes that is well over $100,000 annually. The the cloud pricing I was quoted was more than double that. And the cloud service makes no sense at all. Going over the internet to a third party datacenter to access a database adds two orders of magnitude of latency for every database write and read. – BrianC Apr 10 '17 at 15:22
  • @BrianC should I use 'pg' instead of 'pg-pool' if I wanted to connect to AWS RDS? btw I'm using pg-pool with elephantsql and its working beautifully – PrimeLens Jan 18 '18 at 04:06
  • Unless things have changed you can not use pg-pool with AWS RDS as you need direct access to the Postgres config files which you do not have with AWS's default Postgres RDS service. – BrianC Jan 18 '18 at 17:27
  • @BrianC what about just 'pg' ? Or should I regard AWS RDS as off limits. BTW great tool Brian – PrimeLens Jan 19 '18 at 12:29