I am trying to create PostgreSQL - Etcd - Patroni(PEP) cluster. There are lots of examples on the internet and I have created one and it runs perfect. Yet, this architecture should comply with my company' s backup solution which is NetApp. We are putting the database into backup mode with "SELECT pg_start_backup('test_backup', true);" and then copy all the data files to backup directory.
PEP cluster has a small problem with this solution. Taking backup is running fine, but restoration point is not that much good. In order to restore the leader of the PEP cluster I need stop the database and then move the backup files to the data directory and finally start the restoration. At this point Patroni says the restoration node is a new cluster. Here is the error:
raise PatroniFatalException('Failed to bootstrap cluster')
patroni.exceptions.PatroniFatalException: 'Failed to bootstrap cluster'
2022-04-11 12:49:29,930 INFO: No PostgreSQL configuration items changed, nothing to reload.
2022-04-11 12:49:29,942 INFO: Lock owner: None; I am pgsql_node1
2022-04-11 12:49:29,962 INFO: trying to bootstrap a new cluster
The files belonging to this database system will be owned by user "postgres".
Also, when I check the patroni cluster status I saw this:
root@4cddca032454:/data/backup# patronictl -c /etc/patroni/config.yml list
+ Cluster: pgsql (7085327534197401486) --------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-------------+------------+---------+---------+----+-----------+
| pgsql_node1 | 172.17.0.6 | Replica | stopped | | unknown |
| pgsql_node2 | 172.17.0.7 | Replica | running | 11 | 0 |
| pgsql_node3 | 172.17.0.8 | Replica | running | 11 | 0 |
+-------------+------------+---------+---------+----+-----------+
At this point I have a PEP cluster without a leader. So, how can I solve this issue?
(Note: The restoration node attempted to join right cluster because, before starting the restoration I check cluster status and got this result:
root@4cddca032454:/data/backup# patronictl -c /etc/patroni/config.yml list
+ Cluster: pgsql (7085327534197401486) --------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-------------+------------+---------+---------+----+-----------+
| pgsql_node2 | 172.17.0.7 | Replica | running | 11 | 0 |
| pgsql_node3 | 172.17.0.8 | Replica | running | 11 | 0 |
+-------------+------------+---------+---------+----+-----------+
pgsql_node1 is not there.
)
As explained here, "https://patroni.readthedocs.io/en/latest/existing_data.html#existing-data" I can create a new cluster after restoration but my priority saving the cluster. Or do I think wrong, all this steps are same with the converting a standalone PostgreSQL database to PEP cluster?
Please let me know if you need any data or something is not clear.
Here is my leader node patroni config file:
scope: "cluster"
namespace: "/cluster/"
name: 8d454a228d251
restapi:
listen: 172.17.0.2:8008
connect_address: 172.17.0.2:8008
etcd:
host: 172.17.0.2:2379
dcs:
ttl: 30
loop_wait: 10
retry_timeout: 10
maximum_lag_on_failover: 1048576
check_timeline: true
postgresql:
use_pg_rewind: true
remove_data_directory_on_rewind_failure: true
remove_data_directory_on_diverged_timelines: true
use_slots: true
postgresql:
listen: 0.0.0.0:5433
connect_address: 172.17.0.2:5432
use_unix_socket: true
data_dir: /data/postgresql/
bin_dir: /usr/lib/postgresql/14/bin
config_dir: /etc/postgresql/14/main
authentication:
replication:
username: "patroni_replication"
password: "123123"
superuser:
username: "patroni_superuser"
password: "123123"
parameters:
unix_socket_directories: '/var/run/postgresql/'
logging_collector: 'on'
log_directory: '/var/log/postgresql'
log_filename: 'postgresql-14-8d454a228d25.log'
restore_command: 'cp /data/backup/%f %p'
recovery_target_timeline: 'latest'
promote_trigger_file: '/tmp/promote'
Thanks!