I switched an app from Laravel 5.1 to Laravel 5.8 by setting up a fresh 5.8 project and copying over the files, making some adjustments here and there.
The issue is that the queries with whereHas have become extremely slow.
Here is an example code:
Article::whereHas('categories', function ($category) {
$category->where('link', 'foto');
})
->active()
->recent()
->take(3)
->get();
This code generates the following query on Laravel 5.1 and completes in 0.05-0.07 seconds.
SELECT *
FROM `articles`
WHERE `articles`.`deleted_at` IS NULL
AND
(SELECT count(*)
FROM `categories`
INNER JOIN `article_category`
ON `categories`.`id` = `article_category`.`category_id`
WHERE `article_category`.`article_id` = `articles`.`id`
AND `link` = 'foto'
AND `categories`.`deleted_at` IS NULL) >= 1
ORDER BY IFNULL(published_at, created_at) DESC
LIMIT 3
and here's the explain:
+------+--------------------+------------------+------+--------------------------------------------------------------------------+-------------------------------------+---------+-----------------+------+----------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+--------------------+------------------+------+--------------------------------------------------------------------------+-------------------------------------+---------+-----------------+------+----------+------------------------------------+
| 1 | PRIMARY | articles | ALL | NULL | NULL | NULL | NULL | 4846 | 100.00 | Using where; Using filesort |
| 2 | DEPENDENT SUBQUERY | categories | ref | PRIMARY,categories_link_index | categories_link_index | 767 | const | 1 | 100.00 | Using index condition; Using where |
| 2 | DEPENDENT SUBQUERY | article_category | ref | article_category_category_id_foreign,article_category_article_id_foreign | article_category_article_id_foreign | 4 | lcf.articles.id | 1 | 100.00 | Using where |
+------+--------------------+------------------+------+--------------------------------------------------------------------------+-------------------------------------+---------+-----------------+------+----------+------------------------------------+
While on Laravel 5.8 it generates the following query that runs 10-13 seconds.
SELECT *
FROM `articles`
WHERE EXISTS
(SELECT *
FROM `categories`
INNER JOIN `article_category`
ON `categories`.`id` = `article_category`.`category_id`
WHERE `articles`.`id` = `article_category`.`article_id`
AND `link` = 'foto'
AND `categories`.`deleted_at` IS NULL)
AND `articles`.`deleted_at` IS NULL
ORDER BY IFNULL(published_at, created_at) DESC
LIMIT 3
and here's the explain
+------+--------------+------------------+------+--------------------------------------------------------------------------+--------------------------------------+---------+-------------------+------+----------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+--------------+------------------+------+--------------------------------------------------------------------------+--------------------------------------+---------+-------------------+------+----------+------------------------------------+
| 1 | PRIMARY | <subquery2> | ALL | distinct_key | NULL | NULL | NULL | 107 | 100.00 | Using temporary; Using filesort |
| 1 | PRIMARY | articles | ALL | PRIMARY | NULL | NULL | NULL | 4846 | 75.01 | Using where |
| 2 | MATERIALIZED | categories | ref | PRIMARY,categories_link_index | categories_link_index | 767 | const | 1 | 100.00 | Using index condition; Using where |
| 2 | MATERIALIZED | article_category | ref | article_category_category_id_foreign,article_category_article_id_foreign | article_category_category_id_foreign | 4 | lcf.categories.id | 107 | 100.00 | |
+------+--------------+------------------+------+--------------------------------------------------------------------------+--------------------------------------+---------+-------------------+------+----------+------------------------------------+
I ran both codebases on the same server, same MariaDB 10.2.24 database. The dataset size is approximately 6k articles, 80 categories and 10k records in the pivot.
What should I do here? So far I have discovered a bit more than 10 queries suffering from this problem in the codebase. Can I somehow flip a switch in config and make them all check the existence using the old way? Or should I somehow instruct every query to improve their plan?
UPDATE
I just noticed that if I use whereHas(..., '>', 0)
I can get almost the old query (actually WHERE (SELECT COUNT...) > 0
) with the old performance. However, whereHas(..., '>=', 1)
does reduce itself to query with EXISTS
. A question remains whether I could switch this behaviour over whole app without editing each query.
ANSWERS TO COMMENTS
Indexes onr articles
+----------+------------+----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| articles | 0 | PRIMARY | 1 | id | A | 4846 | NULL | NULL | | BTREE | | |
| articles | 1 | articles_author_id_foreign | 1 | author_id | A | 18 | NULL | NULL | YES | BTREE | | |
+----------+------------+----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
Indexes on article_category
+------------------+------------+--------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------------+------------+--------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| article_category | 0 | PRIMARY | 1 | id | A | 9676 | NULL | NULL | | BTREE | | |
| article_category | 1 | article_category_category_id_foreign | 1 | category_id | A | 90 | NULL | NULL | | BTREE | | |
| article_category | 1 | article_category_article_id_foreign | 1 | article_id | A | 9676 | NULL | NULL | | BTREE | | |
+------------------+------------+--------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
The data to run the examples can be found here: https://gist.github.com/tontonsb/b97bc33066a67e9d8bc3654f2c01c103
This runs faster, but it's still 2.8 vs 0.07 seconds so the problem can be clearly seen, at least on MariaDB 10.2.24. Probably the speed improved because I have removed other columns and their indices.