1

I have posts and websites (and connecting post_websites). Each post can be on multiple websites, and some websites share the content, so I am trying to access the posts which are attached to particular website IDs.

Most of the cases WHERE IN works fine, but not for all websites, some of them are laggy, and I can't understand a difference.

SELECT *
FROM `posts`
WHERE `posts`.`id` IN (
    SELECT `post_websites`.`post_id`
    FROM `post_websites`
    WHERE `website_id` IN (
        12054,
        19829,
        2258,
        253
    )
) AND
    `status` = 1 AND
    `posts`.`deleted_at` IS NULL
ORDER BY `post_date` DESC
LIMIT 6

Explain

select_type table type key key_len ref rows Extra
SIMPLE post_websites range post_websites_website_id_index 4 NULL 440 Using index condition; Using temporary; Using filesort; Start temporary
SIMPLE posts eq_ref PRIMARY 4 post_websites.post_id 1 Using where; End temporary

Other version with EXISTS

SELECT *
FROM `posts`
WHERE EXISTS (
    SELECT `post_websites`.`post_id`
    FROM `post_websites`
    WHERE `website_id` IN (
        12054,
        19829,
        2258,
        253
    ) AND
    `posts`.`id` = `post_websites`.`post_id`
) AND
    `status` = 1 AND
    `deleted_at` IS NULL
ORDER BY `post_date` DESC
LIMIT 6

EXPLAIN:

select_type table type key key_len ref rows Extra
PRIMARY posts index post_date_index 5 NULL 12 Using where
DEPENDENT SUBQUERY post_websites ref post_id_website_id_unique 4 post.id 1 Using where; Using index

Long story short: based on different amounts of posts on each site and amount of websites sharing content the results are different from 20ms to 50s!

Based on the EXPLAIN the EXISTS works better, but on practice when the amount of data in subquery is lower, it can be very slow.

Is there a query I am missing that could work like a charm for all cases? Or should I check something before querying and choose the method of doing so dynamically?

migrations:


CREATE TABLE `posts` (
  `id` int(10) UNSIGNED NOT NULL,
  `title` varchar(225) COLLATE utf8_unicode_ci NOT NULL,
  `description` varchar(500) COLLATE utf8_unicode_ci NOT NULL,
  `post_date` timestamp NULL DEFAULT NULL,
  `status` tinyint(4) NOT NULL DEFAULT '1',
  `created_at` timestamp NULL DEFAULT NULL,
  `updated_at` timestamp NULL DEFAULT NULL,
  `deleted_at` timestamp NULL DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

ALTER TABLE `posts`
  ADD PRIMARY KEY (`id`),
  ADD KEY `created_at_index` (`created_at`) USING BTREE,
  ADD KEY `status_deleted_at_index` (`status`,`deleted_at`) USING BTREE,
  ADD KEY `post_date_index` (`post_date`) USING BTREE,
  ADD KEY `id_post_date_status_deleted_at` (`id`,`post_date`,`status`,`deleted_at`) USING BTREE;


CREATE TABLE `post_websites` (
  `post_id` int(10) UNSIGNED NOT NULL,
  `website_id` int(10) UNSIGNED NOT NULL,
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;


ALTER TABLE `post_websites`
  ADD PRIMARY KEY (`website_id`, `post_id`),
  ADD UNIQUE KEY `post_id_website_id_unique` (`post_id`,`website_id`),
  ADD KEY `website_id_index` (`website_id`),
  ADD KEY `post_id_index` (`post_id`);

eloquent:

$news = Post::select(['title', 'description'])
            ->where('status', 1)
            ->whereExists(
                function ($query) use ($sites) {
                    $query->select('post_websites.post_id')
                        ->from('post_websites')
                        ->whereIn('websites_id', $sites)
                        ->whereRaw('post_websites.post_id = posts.id');
            })
            ->orderBy('post_date', 'desc');
            ->limit(6)
            ->get();

or 
$q->whereIn('posts.id',
                function ($query) use ($sites) {
                    $query->select('post_websites.post_id')
                        ->from('post_websites')
                        ->whereIn('website_id', $sites);
                });

Thanks.

Jack Bo
  • 61
  • 1
  • 7
  • try an inner join with big IN or EXISTS – nbk Sep 05 '21 at 20:04
  • is there a universal solution for big or small data in subquery? I don't know how big is the data unless I query that separately first. – Jack Bo Sep 05 '21 at 20:21
  • INNEr JOIN is faster with big data, with smal numbers you get almost te same, but mysqlget diffrent ways. itry it – nbk Sep 05 '21 at 20:35
  • just tried, same story, some websites are flying, for some this query goes for 4 secods. – Jack Bo Sep 05 '21 at 21:28
  • Have you tried the query using equivalent Eloquent relationships and query builder methods to see if you can accomplish this more easily and effectively that way? – aethergy Sep 05 '21 at 21:32
  • yes it's all encoded in eloquent methods, just showing the mysql problem on resulted query. – Jack Bo Sep 05 '21 at 21:43
  • @JackBo can you show us both Eloquent queries instead of SQL queries so it is more readable ? I had to work with reports and before upgrading our eloquent queries we were having like `4s` latency response, and after fixing some performant stuff (like yours) we went down to `100ms` without the need of a special query for a special case or anything like that... Also, can you share your migrations for these tables ? I think you are not `indexing` some IDs. – matiaslauriti Sep 05 '21 at 22:54
  • @matiaslauriti added migrations and Eloquent code. I do see what this answer mean https://stackoverflow.com/questions/20172487/why-sql-with-exists-run-slower-than-in-using-mysql. My EXISTS works when there a lot of overlapping, sites share content and posts share them, for example I have blog posted on site A and B, and at the same time site A and B selected to share content , if there are a lot of these, then WHERE IN choke, but EXISTS performs well. But when there are not a lot of content and it's not overlapping then WHERE IN is much better. EXISTS more than a minute sometimes... – Jack Bo Sep 05 '21 at 23:28
  • Actually not a lot of overlapping also lags for WHERE IN, 5929 results , only 2 duplicate and it process 6 seconds. – Jack Bo Sep 05 '21 at 23:43
  • @JackBo why are you literally using a query instead of relations to accomplish this ? Why are you not using [`whereHas`](https://laravel.com/docs/8.x/eloquent-relationships#querying-relationship-existence) ? Also, could you share the real migration (`PHP`) file ? Plain SQL is it not as readable as a PHP migration. Could you replace that? – matiaslauriti Sep 05 '21 at 23:43
  • @matiaslauriti We found it is faster because post_websites doesn't have model it's a pivot, only POST and WEBSITE has models, when you do whereHas('website'), i believe it also gets info from website table, which we don't need in this case. – Jack Bo Sep 05 '21 at 23:46
  • @JackBo I can't remember 100% right now what will happen, but if you have a relation `many-to-many` it will ask the pivot table (you do not need to have a model for the pivot), so my personal recommendation would be to either try `whereHas` or [`whereRelation`](https://laravel.com/docs/8.x/eloquent-relationships#inline-relationship-existence-queries) (if this last one still works for your conditions) and see if it works better. Also, remember that you can see the final result using `->toSql()` at the end instead of `->get()` so you can see the real SQL run. Can you try that ? – matiaslauriti Sep 05 '21 at 23:50
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/236788/discussion-between-jack-bo-and-matiaslauriti). – Jack Bo Sep 05 '21 at 23:54

1 Answers1

0

Many:many table: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table That says to get rid if id (because it slows things down), promote that UNIQUE to be the PK, and add an INDEX in the opposite direction.

Don't use IN ( SELECT ... ). A simple JOIN is probably the best alternative here.

Did some 3rd party package provide those 3 TIMESTAMPs for each table? Are they ever used? Get rid of them.

KEY `id_post_date_status_deleted_at` (`id`,`post_date`,`status`,`deleted_at`) USING BTREE;

is mostly backward. Some rules:

  • Don't start an index with the PRIMARY KEY column(s).
  • Do start an index with = tests: status,deleted_at
Rick James
  • 135,179
  • 13
  • 127
  • 222