MySQL Scaling on GCP

Question

I created a instance (8 core) of MySQL on GCP. And a simple database in it. When I run a load of 40000+ concurrent users (1500 req/sec), the response times come out very high (10 seconds+). However I can see the hardware cpu utilization only at 15% or so.

What can I do to get the response time in msec?

Cheers! Deepak

Are you looking for milliseconds or microseconds? msec is not real specific. You may use Slow Query Log to get to microseconds where the Query_time: is reported as seconds.nnnnnn for microseconds after the decimal. Milliseconds only needs 4 places after the decimal. — Wilson Hauck, Sep 03 '21 at 18:49
40000 _concurent_ users is _not_ realistic. Try again with only 100. — Rick James, Sep 03 '21 at 20:03
When you go past about 100 _concurrent_ users, you stop gaining throughput; the users spend most of their time stumbling over each other -- perhaps 10 minutes worth! — Rick James, Sep 06 '21 at 01:04

Rick James · Answer 1 · 2021-09-06T01:03:47.527

2

Imagine cramming 40000 shoppers in a grocery store. How many hours would it take for a shopper to buy just one carton of milk?

Seriously, there is a limit to how many connections can be made to any device. Database computers will top out at a few hundred. After that, latency will suffer severely as all the connections are waiting for their turn at various shared resources.

Another approach

Let's say these are achievable:

10ms to connect, fetch info for a page, and disconnect.
1500 pages built per second. (By the way, make sure the web server can achieve this.)
15 concurrent connections, each running for 10 ms. That equals 1500 pages per second.
1500 pages per second = 90000 pages per minute.

So, let's specify "40000 pages delivered to different (or same) users in one minute". I suggest that will be easy. And it won't require much more than 15 concurrent users. (Traffic is never smooth [except in a benchmark], so 50 concurrent connections may happen.)

edited Sep 06 '21 at 01:03

answered Sep 03 '21 at 21:23

Rick James

135,179
13
127
222

Amazon has lacs of users at the same time and their site does not crash or give bad response times. Most of their pages served from db are in milliseconds. – Deepak Goel Sep 04 '21 at 13:54
5

@DeepakGoel - and they have thousands of web servers talking to thousands of Replicas, all behind dozens of load-balancers for redirecting traffic. – Rick James Sep 04 '21 at 14:43
thats where i would like to go eventually...however i need to solve a basic problem of mine which I have posted above! – Deepak Goel Sep 05 '21 at 18:16
1

On a single server with a standard network, "40000+ concurrent users" is not solvable. "1500 req/sec" can probably be achieved. – Rick James Sep 06 '21 at 00:52
@DeepakGoel - And I added more to my Answer. – Rick James Sep 06 '21 at 01:03

Bill Karwin · Answer 2 · 2021-09-06T01:36:07.980

[thousands of database servers is] where i would like to go eventually...however i need to solve a basic problem of mine which I have posted above!

Right, you have to expand the number of database servers now if you are serving 40,000 concurrent queries. Not eventually.

But let's be clear about what comprises concurrent users. Here's an example:

mysql> show global status like 'threads%';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| Threads_connected | 1266  |
| Threads_running   | 9     |
+-------------------+-------+

I've analyzed high-scale production sites for dozens of internet companies. It's typical to see hundreds or thousands of concurrent connections but few of these are executing an SQL query at any given moment. When a given thread is between executing queries, you can view it SHOW PROCESSLIST but it is only doing "Sleep".

This is fine, and it's normal.

I give the analogy to an ssh session: you may be connected to a shell on a linux server, but if you're doing nothing, just sitting at a shell prompt, you aren't taxing the server resources much. You could have hundreds of users connected with ssh to the same server at once. But if they all begin running applications at the same time, you're in trouble. The server might not handle that load well. At least, all of the users will experience slow performance.

It's the same with a MySQL database. If you need a server that can support 40,000 Threads_running, then you need to spread that load over many MySQL servers. There isn't any single server that exists today that can handle that.

But you might mean something different when you say 40,000 concurrent users. It might be that you have 40,000 users who are looking at some page on your website at the same time. But that's not resulting in continuous SQL queries in 40,000 database sessions all at the same time. Each person spends some time reading the web page they just loaded, and scrolling up and down, and perhaps typing into a form. While they are doing that, the website is waiting for their next request, and the web server and database server is not doing any work for that user while it's waiting. It can do some work for other users.

In this way, a single database server can support 40,000 (or more) users who are by some definition using the site, even though only a handful are invoking any code to run SQL queries at any given moment.

This is normal and most websites can handle that traffic with no problems.

If that's the reality of your application, and you still have problems scaling it, then you might have inefficient application code or unoptimized SQL queries. That is, the website could serve the requests easily if you wrote the code to be more efficient.

Inefficient code cannot be fixed by changing your server. The cost of inefficient code scales up faster than you can hope to handle it by upgrading the server. So you must solve performance problems by writing better code.

This is the point of an old tweet of mine:

The subject of scalable internet architecture is very complex. You need to do a lot of study and a lot of testing to grow a website and make it scalable.

You can start by reading. My favorite is Theo Schlossnagle's book Scalable Internet Architectures. Here is a video of Theo speaking about the same subject: https://www.youtube.com/watch?v=2WuT2rdLK5A

The book is from quite a few years ago. Perhaps the scale websites need to support is greater than it was back then, but the methods of achieving scalability are the same today.

Test
Identify bottlenecks
Rearchitect your web app code to relieve those bottlenecks
Test again

So you are saying 'if there are about 4000 queries/sec on the database' and a single-server cannot handle, then it is best to have multiple MySQL servers. Sorry to bug you, but how do I achieve multiple MySQL Servers (Do you mean something like NDB Clusters?) Is there a easy way in GCP? — Deepak Goel, Sep 07 '21 at 06:47
Sharding is not easy. I don't mean like NDB Cluster (as far as I know, GCP does not support that). I mean you launch multiple MySQL instances, and your app has a handle to each instance. You split up your data over these instances, so that approximately an equal share of the queries will run on each instance. This is HARD. — Bill Karwin, Sep 07 '21 at 06:55
This will need a lot of work on the application side, which might not be a good idea. — Deepak Goel, Sep 15 '21 at 05:02

MySQL Scaling on GCP

2 Answers2