
We are having a few issues with our Cloud SQL instances. We have tried multiple configurations, from 1 vCPU to 4 vCPUs, with up to 24 GB RAM and 100 GB SSD, and we always get roughly the same results: around 300-500 ms to respond to a single small request, translating to around 2-3 TPS when testing a single connection with pgbench:

```
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 50
query mode: simple
number of clients: 1
number of threads: 1
duration: 100 s
number of transactions actually processed: 207
latency average = 483.165 ms
initial connection time = 359.565 ms
tps = 2.069685 (without initial connection time)
```

This scales OK with multiple connections, as the following result shows, but it is still very slow overall:

```
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 50
query mode: simple
number of clients: 50
number of threads: 50
duration: 100 s
number of transactions actually processed: 9296
latency average = 538.585 ms
initial connection time = 441.047 ms
tps = 92.835803 (without initial connection time)
```

Changing from PostgreSQL 14 to 11 gained around 15% performance on the larger benchmark sets, but the single-connection test is still slow.

We haven't touched any of the more advanced settings such as database flags; the instances are pretty much on defaults.

When running the same test on a local DB (Mac M1 Pro) we get much better numbers, as expected, but the difference looks too large:

```
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
maximum number of tries: 1
duration: 30 s
number of transactions actually processed: 4057
number of failed transactions: 0 (0.000%)
latency average = 7.393 ms
initial connection time = 17.055 ms
tps = 135.263939 (without initial connection time)
```

These connections go through the Cloud SQL Auth Proxy (latest stable version).

We expect the TPS to be closer to the numbers we see online, at least around 5-10 TPS per connection.

This is the result from a machine closer to the server, in us-central:

```
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1
query mode: simple
number of clients: 5
number of threads: 1
duration: 30 s
number of transactions actually processed: 207
latency average = 690.794 ms
initial connection time = 2037.796 ms
tps = 7.238044 (without initial connection time)
```

MorenoMdz
    Network latency is really really bad, half a second. No database can fix that problem, check your network – Frank Heikens Jan 12 '23 at 21:51
  • Hey, good point, I forgot to add it to the post: those are indeed running from my machine on the other side of the globe. Adding this one to the main post, as it runs closer to the us-central server where this instance is located: `transaction type: scaling factor: 1 query mode: simple number of clients: 5 number of threads: 1 duration: 30 s number of transactions actually processed: 207 latency average = 690.794 ms initial connection time = 2037.796 ms tps = 7.238044 (without initial connection time)` – MorenoMdz Jan 12 '23 at 21:56
  • latency average = 690.794 ms, that’s crazy – Frank Heikens Jan 12 '23 at 22:00
  • 1) This could be network latency, but it could also be a disk latency problem (the disks for the Postgres instances are typically not on the same server where Postgres is running.) How long does a ping to your SQL server take? 2) What cloud provider are you using? – Nick ODell Jan 12 '23 at 22:03
  • @NickODell those are all in Cloud SQL with the configurations listed in the post. The latency was around 300-1000 ms for requests from BR, and half that for requests coming from our office in Florida, which is still pretty high considering our GCP servers are in us-central – MorenoMdz Jan 13 '23 at 01:00

1 Answer


In your first test, you have a network latency problem. Note that the default pgbench transaction is not "a single small request": it is seven separate small statements, each needing one round trip in sequence, which together make up one transaction. To demonstrate the effect of these multiple round trips, you could package all 7 steps (5, not counting the transaction markers) into a UDF, and then run a custom transaction which calls that UDF in autocommit mode. You should find it is about 7 times faster despite doing the same work. Of course, there is no point in doing this demo if you believe me that you have a networking problem. In newer versions of PostgreSQL, you could use the pipelining feature to have several of those statements in flight at the same time, rather than waiting for one response before sending the next request.
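To make the round-trip arithmetic concrete, here is a rough back-of-the-envelope model. The ~65 ms RTT and 3 ms of server-side work are assumed values for illustration, not measurements from your setup:

```python
# Assumption: ~65 ms network round trip and ~3 ms of server-side work
# per transaction. The default pgbench TPC-B script issues 7 statements
# per transaction (BEGIN, three UPDATEs, a SELECT, an INSERT, END), and
# in simple query mode each statement is a full client-server round trip.

RTT_MS = 65.0      # assumed round-trip network latency
STATEMENTS = 7     # statements in the default pgbench transaction
SERVER_MS = 3.0    # assumed on-server processing per transaction

latency_ms = STATEMENTS * RTT_MS + SERVER_MS
tps = 1000.0 / latency_ms

print(f"predicted latency: {latency_ms:.0f} ms, predicted TPS: {tps:.2f}")
```

With these assumed numbers the prediction lands near the observed 483 ms / 2.07 TPS, which is consistent with network latency, not server speed, dominating the result.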

In your closer-to-us-central test, you have a concurrency problem as well as a (smaller) network latency problem. That test shows a scaling factor of 1, which means every transaction updates the same row in the pgbench_branches table, so all concurrent transactions conflict with each other and run partially in single file. Generally the scaling factor should be at least as large as the number of clients you are testing (5 clients here; why not use a scaling factor of 50, as you did in the first test?).
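The effect of the scaling factor on branch-row contention can be sketched with a birthday-problem style estimate. This is purely illustrative (real lock queueing behaves differently), but it shows why the scaling factor should be at least the client count:

```python
# With scaling factor `scale` there are `scale` rows in pgbench_branches,
# and each transaction row-locks one of them, chosen uniformly at random.
# If two in-flight transactions pick the same branch, one queues behind
# the other and they serialize.
from math import prod

def p_no_collision(clients: int, scale: int) -> float:
    # Probability that `clients` concurrent transactions all pick
    # distinct branch rows (birthday-problem approximation).
    return prod((scale - i) / scale for i in range(clients))

for scale in (1, 5, 50):
    p = p_no_collision(5, scale)
    print(f"scale={scale:3d}: P(no branch conflict among 5 clients) = {p:.2f}")
```

At scale 1 every transaction collides (probability 0 of running conflict-free), at scale 5 collisions are still near-certain, and at scale 50 about four runs in five are conflict-free, which is why matching or exceeding the client count matters.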

jjanes
  • Good information. The scaling followed the pgbench defaults; I will give it a try with the scaling factor set to match the clients. How do the 7 round trips translate to TPS? Does pgbench assume each one is an individual transaction? The thing that bothers me most now is how bad the latency is even for requests coming from Florida. I'll add one more result from a "better" connection next – MorenoMdz Jan 13 '23 at 14:56
  • This one was done outside our office, which doesn't have the best router setup in the world, so this route to the servers is probably a bit "cleaner": `pgbench -h localhost -p 54320 -U postgres -d postgres -c 250 -j 100 -T 100 transaction type: scaling factor: 50 query mode: simple number of clients: 250 number of threads: 100 duration: 100 s number of transactions actually processed: 23828 latency average = 676.005 ms initial connection time = 37348.646 ms tps = 369.819650 (without initial connection time)` – MorenoMdz Jan 13 '23 at 15:02
  • 1
    The 7 steps together make up one transaction. For the default transaction, you would expect the latency reported by pgbench to be 7 times the network round-trip latency (plus whatever on-server processing time is needed). – jjanes Jan 13 '23 at 15:24