0

There is this rather simple query that I have to run on a livesystem, in order to get a count. The problem is that the table and database are rather inefficiently designed and since it is a livesystem altering it is not an option at this point.
So I have to figure out a query that runs fast and won't slow down the system too much, because for the time of the query execution the system basically stops which is not really what I would like a livesystem to do, so I need to streamline my query in order to make it perform in an acceptable time.

SELECT id1, count(id2) AS count FROM table GROUP BY id1 ORDER BY count 
DESC;

So here is the query, unfortunately it is so simple that I am out of ideas on how to further improve it, maybe someone else has an idea ... ?

fortune
  • 3,361
  • 1
  • 20
  • 30
Samarek
  • 444
  • 1
  • 8
  • 22
  • You can add index for id1 column. – Sudhakar May 30 '17 at 08:53
  • > since it is a livesystem altering it is not an option at this point – Samarek May 30 '17 at 08:54
  • Probably even an index (id1, id2) to have a covering index, then MySQL doesn't have to read the actual data. – johannes May 30 '17 at 09:04
  • altering the table (adding an index is altering) is NOT an option – Samarek May 30 '17 at 09:05
  • I'd create a table that gets populated by counts. I'd use triggers to maintain the numbers (before insert, before update, after delete etc). Then, you don't have to run this query at all. However, what I'm suggesting requires altering the *database*. I've no idea if you're allowed to do so. Also, how many records are we talking about here? – N.B. May 30 '17 at 09:07
  • No alterations to the database either and its roughly 20 million records – Samarek May 30 '17 at 09:13
  • what if alter query runs faster then are you ok with alter query? – Ronak Patel May 30 '17 at 09:18
  • You know that there are ways to alter tables without locking them? There's really not much to pimp on this simple query, besides `ORDER BY NULL`, which probably doesn't gain much. – fancyPants May 30 '17 at 09:23
  • NO alterations to the database, altering is NOT an option. Anymore question about whether altering database or table are an option? I mean, seriously, how many times have I said this now. – Samarek May 30 '17 at 09:23
  • @fancyPants Thx for the answer, I know there isn't much one can do, what would ```ORDER BY NULL``` accomplish ? – Samarek May 30 '17 at 09:24
  • Since you don't have a `LIMIT` clause anyway, the order is not important (or you can sort it with PHP or whatever you're using). `GROUP BY` in MySQL implicitly does an `ORDER BY`, even when you don't specify an `ORDER BY` clause. You can prevent sorting with `ORDER BY NULL`. It may help, when a temporary table on disk is used, but in most cases it doesn't make a huge difference. – fancyPants May 30 '17 at 09:27
  • 1
    Have a look at percona tools, for example. The pt-online-schema-change tool doesn't lock tables while altering. Or gh-ost... – fancyPants May 30 '17 at 09:29
  • [How to improve MySQL count performance on very big tables](https://stackoverflow.com/questions/10976328/mysql-count-performance-on-very-big-tables)? – Sandman May 30 '17 at 09:30
  • If making changes in NOT an option, then helping you is NOT an option. – Rick James May 30 '17 at 21:30
  • @Sandman - I added an Answer to that link. – Rick James May 30 '17 at 22:37

2 Answers2

0

Application Get "good enough" results via application changes:

If you have access to the application, but not the database, then there are possibilities:

Periodically run that slow query and capture the results. Then use the cached results.

Do you need all

What is the goal? Find a few of the most common id1's? Rank all of them?

Back to the query

COUNT(id2) checks for id2 being not null; this us usually unnecessary, so COUNT(*) is better. However the speedup is insignificant.

ORDER BY NULL is irrelevant if you are picking off the rows with the highest COUNT -- the sort needs to be done somewhere. Moving it to the application does not help; at least not much.

Adding LIMIT 10 would only help because of cutting down on the time to send the data back to the client.

INDEX(id1) is the best index for the query (after changing to COUNT(*)). But the operation still requires

  • full index scan to do the COUNT and GROUP BY
  • sort the grouped results -- for the ORDER BY

Zero or near-zero downtime

Do you have replication established? Galera Clustering?

Look into pt-online-schema-change and gh-ost.

What is the real goal?

We cannot fix the query as written. What things can we change? Better yet, what is the ultimate goal -- perhaps there is an approach that does not involve any query that looks the least like the one you are trying to speed up.

Rick James
  • 135,179
  • 13
  • 127
  • 222
0

Now I have just dumped the table and imported it into a MySQL-Docker, ran the query there, took ages and I actually had to move my entire Docker because the dump was so huge, but in the end I got my results and now I know how many id2s are associated with specific id1s (apostrophe to form a plural? You may want to double-check that ;) ).
As it was already pointed out, there wasn't much room for improvement on the query anymore.

FYI suddenly the care about stopping the system was gone and now we are indexing the table, so far it took 6 hours, no end in sight :D

Anyways, thanks for the help everyone.

Samarek
  • 444
  • 1
  • 8
  • 22