0

I am planning to migrate my production Oracle cluster to a PostgreSQL cluster. The current system handles 2000 TPS, and in order to keep supporting that load I would be very thankful if someone could clarify the points below.

1) What is the best replication strategy (streaming replication or DRBD-based replication)?

2) With streaming replication, can the master keep processing traffic without the slave, and when the slave comes back up, does it catch up on what it missed during the downtime?

  • What kind of "cluster" did you use in Oracle? RAC? Physical standby with DataGuard? If the latter, in which mode did you run DataGuard? –  Mar 27 '18 at 12:32

2 Answers

2

About TPS - it depends mainly on your hardware and PostgreSQL configuration. I already wrote about it on Stack Overflow in this answer. You cannot expect magic from a notebook-like configuration. Here is my text "PostgreSQL and high data load application".
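For illustration, these are the kinds of postgresql.conf knobs that usually matter most for sustained write throughput; the values below are placeholder assumptions for a dedicated server with plenty of RAM, not recommendations taken from the linked article:

```
# postgresql.conf - illustrative starting points for a dedicated, write-heavy server
shared_buffers = 8GB                  # commonly around 25% of RAM on a dedicated box
wal_buffers = 16MB                    # WAL write buffer
max_wal_size = 8GB                    # fewer, larger checkpoints under heavy writes
checkpoint_completion_target = 0.9    # spread checkpoint I/O over the whole interval
effective_cache_size = 24GB           # planner hint: memory available for caching
```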

1) Streaming replication is the simplest and almost "painless" solution. So if you want to start quickly I highly recommend it.

2) Yes, but you have to archive WAL logs. See below.

All this being said, here are the links I would recommend you to read:

But of course streaming replication has some caveats which you should know:
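As a concrete sketch of how points 1) and 2) above fit together on a PostgreSQL 9.x/10 setup (hostnames, the replication user, and the archive path are assumptions; on 12+ the standby settings move from recovery.conf into postgresql.conf plus a standby.signal file):

```
# master, postgresql.conf
wal_level = replica                   # enough WAL detail for a physical standby
max_wal_senders = 5                   # walsender slots for standbys and base backups
archive_mode = on                     # archive WAL as per point 2)
archive_command = 'test ! -f /wal_archive/%f && cp %p /wal_archive/%f'

# master, pg_hba.conf - allow the standby to connect for replication, e.g.:
#   host  replication  replicator  10.0.0.2/32  md5

# seed the standby with a base backup, e.g.:
#   pg_basebackup -h master.example.com -U replicator -D /var/lib/postgresql/10/main -X stream -R

# standby, recovery.conf (pg_basebackup -R writes most of this)
standby_mode = 'on'
primary_conninfo = 'host=master.example.com port=5432 user=replicator'
restore_command = 'cp /wal_archive/%f %p'   # fall back to the archive if streaming lags behind
```

With this layout the master keeps accepting writes while the standby is down, and a returning standby first catches up from the archive and then resumes streaming, which is what point 2) relies on.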

JosMac
  • WAL archiving is no longer strictly necessary. You can use WAL streaming with a dedicated replication slot instead, so the WAL segments (old pg_xlog, now pg_wal) are automatically preserved on the master until the slave database has consumed them. The only "drawback" is that if the slave is unavailable for a long time, the pg_xlog/pg_wal directory on the master will grow indefinitely and could fill the whole disk, ending up blocking write transactions or even making Postgres crash. But personally I prefer that to losing the slave server without noticing it (if I notice it, I can fix it in time or simply drop the replication slot manually to release the WAL; see the slot sketch after this comment thread). – bitifet Mar 27 '18 at 11:01
  • Hmmm :-) I would not talk so lightly about a PG database hard-crashing due to running out of disk space... We had these events, and generally there is a high probability of ending up with corrupted tables after the crash, or of PG not starting at all when the clogs are corrupted. We had to implement a script for an emergency stop of PG to prevent it - http://postgresql.freeideas.cz/bash-script-emergency-stop-postgresql/ – JosMac Mar 27 '18 at 11:19
  • With which Postgres version did you experience that? I have had a few such issues with no data lost, and even tried to intentionally damage databases (in testing environments, of course); just by adding extra space Postgres continued working (if it had not yet crashed) or started again (if it had) without any problem. Maybe you had a physical server crash too? In that case it could be a journaling problem related to some buggy hard disk cache... – bitifet Mar 27 '18 at 11:26
  • Just in case it is useful for anyone, here is my own almost (but not yet fully) finished script that I am currently working on: [pgstandby](https://github.com/bitifet/vagrants/blob/master/PostgreSQL/pgstandby). – bitifet Mar 27 '18 at 12:27
  • @bitifet PG 9.6 on a Google Compute Engine instance. The problem is that a hard crash of PG due to running out of disk space is like Russian roulette: you cannot anticipate which file will be corrupted because there was not enough space to write it to disk. PG can survive a corrupted WAL log without problems - this is the most common case. But you can also get a corrupted table file, and if you are unlucky you get a corrupted commit log in the clog directory - this is the worst case - PG will not start if that happens... – JosMac Mar 27 '18 at 12:32
  • It should not be like that: transactions that cannot be committed should be rolled back. Databases with no writes usually survive out-of-disk situations. Even though it is not a desirable situation, when a write operation is attempted in this state an I/O error should be raised and the transaction rolled back. Otherwise, for the transactional system PostgreSQL is supposed to be, that should be considered a bug. – bitifet Mar 27 '18 at 12:39
  • @bitifet OK, I understand :-) Unfortunately our database obviously did not know that, and after the hard crash I got an error message about a corrupted commit log. I had to reset transactions and extract the commit log from the last pg-barman backup to be able to at least start the database and dump the most critical changes. After that I had to restore the last pg-barman backup to get back to a consistent state... – JosMac Mar 27 '18 at 12:54
  • We have high-end hardware. Is there any reference production architecture for such a highly available system supporting high TPS? – Nishantha Pradeep Mar 28 '18 at 05:42
  • @NishanthaPradeep I added the link to my answer + here it is too - http://postgresql.freeideas.cz/postgresql-and-high-data-load-application/ – JosMac Mar 28 '18 at 08:34
  • Is there any commercial support for PostgreSQL that we can purchase, and what would it cost? – Nishantha Pradeep Apr 17 '18 at 06:15
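To make the replication-slot alternative discussed in the comments above concrete, here is a minimal sketch; the slot name is an assumption, and on 9.x the functions are called pg_current_xlog_location / pg_xlog_location_diff instead:

```sql
-- on the master: create a physical replication slot for the standby
SELECT pg_create_physical_replication_slot('standby1');

-- on the standby, point at it (recovery.conf on 9.x/10):
--   primary_slot_name = 'standby1'

-- monitor how much WAL the slot is holding back, so pg_wal cannot silently fill the disk
SELECT slot_name, active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;

-- if the standby is gone for good, drop the slot so the retained WAL can be recycled
SELECT pg_drop_replication_slot('standby1');
```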
0
  1. There is no “best solution” in this case. Which solution you pick depends on your requirements.
    • Do you need a guarantee of no data loss?
    • How big a performance hit can you tolerate?
    • Do you need failover or just a backup?
    • Do you need PITR (Point In Time Recovery)?
  2. By default (asynchronous replication), I think a failed slave will simply be ignored by the master. Depending on your configuration, the slave might take a long time to catch up again after e.g. a reboot (the sketch after the link below shows the relevant settings and how to check standby lag).

I'd recommend you read https://www.postgresql.org/docs/10/static/different-replication-solutions.html
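As a rough illustration of how the "no data loss" versus "performance hit" questions above map to settings (the standby name is an assumption and has to match the application_name the standby sets in its primary_conninfo):

```sql
-- master: the durability/performance trade-off; both settings live in postgresql.conf
-- and can also be changed at runtime:
ALTER SYSTEM SET synchronous_standby_names = 'standby1';  -- empty string = fully asynchronous
ALTER SYSTEM SET synchronous_commit = 'on';               -- 'remote_apply' is stricter; 'off' is faster but can lose recent commits on a crash
SELECT pg_reload_conf();

-- master: check each standby's state and how far it lags behind (PG 10 column names)
SELECT application_name, state, sync_state,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;
```

Whether a synchronous standby is affordable at 2000 TPS depends mostly on the network round trip to it, which is why the comment below suggests keeping it physically close to the primary.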

Morten
  • 1) This is mission-critical data, therefore yes. 2) We can live with around 500 TPS for a few minutes. 3) Failover is required. We expect to run the primary as the transaction-processing node and the slave in read-only mode; if the primary fails the slave should become the transaction-processing node. We expect to use a virtual IP / Pacemaker / Corosync to achieve failover. Any other suggestions are most welcome. 4) Yes, but can it be done using streaming replication? – Nishantha Pradeep Mar 28 '18 at 05:36
  • Based on your answers, I'd say try a setup with streaming replication plus WAL shipping. Make sure to have a backup server that's physically close to your primary if you plan on having synchronous streaming replication. The throughput of your server will depend on a bunch of stuff including specs of the servers, complexity of the database, and latency of the network, so it's impossible for us to say if that will be a match for your throughput requirements. – Morten Mar 29 '18 at 17:08
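In case it helps with question 4) above: PITR rides on the same WAL archive that the archive_command in the first answer's sketch keeps feeding, so streaming replication and PITR are not mutually exclusive. A minimal sketch, with a made-up target timestamp (tools such as Barman, already mentioned in the comments, automate most of this):

```
# after restoring a base backup (e.g. taken with pg_basebackup), create recovery.conf (PG 9.x/10) with:
restore_command = 'cp /wal_archive/%f %p'       # the same archive that archive_command feeds
recovery_target_time = '2018-03-28 05:00:00'    # replay archived WAL up to this point, then stop
recovery_target_action = 'promote'              # open the database once the target is reached
```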