1

We recently upgraded our MySQL database from 5.7 to 8.0 and changed all the collations from latin to utf8 mb4.

Mostly this went very well and it fixed the problem we were trying to solve. But it created another problem that we just don't understand.

We have a very long query in the database that requires 65 Joins (Don't ask, I know it's horrible, we're in the process of engineering it out, but we're stuck with it for now).

The way we made it work on 5.7 was to create a query with 60 joins and then another with 5 joins and then to join the results from each together.

Since the upgrade, the first query (with 60 joins) no longer runs. Instead we get "error code 2013: lost connection to MySQL server during query". This error happens immediately (0.2s) after running the query. Other queries seem to run fine.

We've tried the usual stuff that's recommended for this error

  • increase connection timeout
  • increase max allowed packet size
  • increase net read timeout
  • increase innodb buffer pool size
  • running mysql upgrade

But no luck.

We do have a work around which is to reduce the number of joins. So instead of having 60 joins and 5 joins, I have 40 joins and 25 joins. In that case I can make both queries run fine. Interestingly a 50/15 split of joins still doesn't work.

But the workaround doesn't help me to understand what's going wrong, because the the 60 join query should still work, even if it's an ugly query. So I'm nervous about carrying out the upgrade on our live server, unless I can be sure that this issue isn't indicative of some bigger problem.

The problem query is here https://www.codepile.net/pile/nN4XG6N4

EDIT I just noticed something else interesting. On the server where we upgraded to MySQL8.0 we have a slightly different configuration

table_open_cache = 2000
open_files_limit = 1048576

But on the MySQL5.7 server (where the query runs without errors) the configuration is

table_open_cache = 3495
open_files_limit = 10000

I picked this up because of some warnings in the error log at run time

2021-12-09T12:35:11.722898Z 0 [Warning] [MY-010140] [Server] Could not increase number of max_open_files to more than 10000 (r$ 2021-12-09T12:35:11.722915Z 0 [Warning] [MY-010142] [Server] Changed limits: table_open_cache: 3495 (requested 4000)

Does that help provide any more info?

ANOTHER EDIT

That wasn't it, we increased those settings to match. We stopped getting the warnings, but it didn't fix the problem.

  • just to be sure you have been here 'The maximum number of tables that can be referenced in a single join is 61. This includes a join handled by merging derived tables and views in the FROM clause into the outer query block' https://dev.mysql.com/doc/refman/8.0/en/join.html – P.Salmon Dec 07 '21 at 12:22
  • Hey, thanks for the suggestion P.Salmon, but I don't think that's the problem. The subquery with only 60 tables in it, and also the one with only 50, doesn't run. Even before I try joining it with the second part of the query to make it up to 65 tables. – Adam Baylis Dec 08 '21 at 13:32
  • you should make the settings for `table_open_cache` the same, or maybe even set `table_open_cache` to 4000 as the log suggests. – Luuk Dec 10 '21 at 10:37
  • Thanks Luuk, sorry I should have mentioned that in my post. We did set it to 4000. It didn't help sadly. – Adam Baylis Dec 11 '21 at 11:35

1 Answers1

0

So we found the solution!

We updated the following, and now it works.

ubuntu: from 16.04 to 20.04.

Mysql: 8.0.25@ubuntu16 to 8.0.27@ubuntu20.

PHP: Php7.0@ubuntu16 to Php@ubuntu20

  • The PHP update should not make a difference – Martin Dec 10 '21 at 16:36
  • Thanks Martin, yes I'm sure you're right. We think that most likely it was the MySQL upgrade that made the difference. Possibly it was a bug in 8.0.25. or perhaps something in the old version of Ubuntu that want compatible with newer versions of MySQL. – Adam Baylis Dec 11 '21 at 11:37