We recently upgraded our MySQL database from 5.7 to 8.0 and changed all the collations from latin to utf8 mb4.
Mostly this went very well and it fixed the problem we were trying to solve. But it created another problem that we just don't understand.
We have a very long query in the database that requires 65 Joins (Don't ask, I know it's horrible, we're in the process of engineering it out, but we're stuck with it for now).
The way we made it work on 5.7 was to create a query with 60 joins and then another with 5 joins and then to join the results from each together.
Since the upgrade, the first query (with 60 joins) no longer runs. Instead we get "error code 2013: lost connection to MySQL server during query". This error happens immediately (0.2s) after running the query. Other queries seem to run fine.
We've tried the usual stuff that's recommended for this error
- increase connection timeout
- increase max allowed packet size
- increase net read timeout
- increase innodb buffer pool size
- running mysql upgrade
But no luck.
We do have a work around which is to reduce the number of joins. So instead of having 60 joins and 5 joins, I have 40 joins and 25 joins. In that case I can make both queries run fine. Interestingly a 50/15 split of joins still doesn't work.
But the workaround doesn't help me to understand what's going wrong, because the the 60 join query should still work, even if it's an ugly query. So I'm nervous about carrying out the upgrade on our live server, unless I can be sure that this issue isn't indicative of some bigger problem.
The problem query is here https://www.codepile.net/pile/nN4XG6N4
EDIT I just noticed something else interesting. On the server where we upgraded to MySQL8.0 we have a slightly different configuration
table_open_cache = 2000
open_files_limit = 1048576
But on the MySQL5.7 server (where the query runs without errors) the configuration is
table_open_cache = 3495
open_files_limit = 10000
I picked this up because of some warnings in the error log at run time
2021-12-09T12:35:11.722898Z 0 [Warning] [MY-010140] [Server] Could not increase number of max_open_files to more than 10000 (r$ 2021-12-09T12:35:11.722915Z 0 [Warning] [MY-010142] [Server] Changed limits: table_open_cache: 3495 (requested 4000)
Does that help provide any more info?
ANOTHER EDIT
That wasn't it, we increased those settings to match. We stopped getting the warnings, but it didn't fix the problem.