0

I have MySQL database and 5 tables called tribes(groups), posts, posts_to_groups, post_comments and posts_votes.

Relationship between groups and posts is MANY_2_MANY so each post can belong to many groups and each group can contain 0-* posts. This is what table posts_to_groups does.

I'm searching for the 3 most popular posts that were posted into the groups that this user follows(associated via posts_to_tribes - table for MANY_2_MANY relationship) for the last 24 hours from this moment and ordered by sum of (comments_count + votes_count) DESC

This is my current query:

SELECT DISTINCT
    p.post_id,
    p.description,
    p.link,
    p.user_id,
    p.total_comments,
    p.total_votes,
    (SELECT 
            COUNT(*)
        FROM
            comments
        WHERE
            last_edited > DATE_SUB(NOW(), INTERVAL 24 HOUR)
                AND post_id = p.post_id) AS comments_count,
    (SELECT 
            COUNT(*)
        FROM
            posts_votes
        WHERE
            date_voted > DATE_SUB(NOW(), INTERVAL 24 HOUR)
                AND post_id = p.post_id) AS votes_count
FROM
    posts p
        JOIN
    posts_to_tribes pt ON pt.post_id = p.post_id
WHERE
    pt.tribe_id IN (3 , 38, 107)
ORDER BY (comments_count + votes_count) DESC , p.last_edited DESC
LIMIT 3;

This query is extremely slow and took right now ~500ms.

Is any way to rewrite this query in order to improve performance ?

UPDATED:

EXPLAIN output:

enter image description here

Query suggested by Tim3880:

SELECT 
    p.post_id,
    p.description,
    p.link,
    p.user_id,
    p.total_comments,
    p.total_votes,
    t.comments_count,
    t.votes_count
FROM posts p
JOIN (
    SELECT 
        p.post_id,
        (SELECT 
                COUNT(*)
            FROM
                comments
            WHERE
                last_edited > DATE_SUB(NOW(), INTERVAL 24 HOUR)
                    AND post_id = p.post_id) AS comments_count,
        (SELECT 
                COUNT(*)
            FROM
                posts_votes
            WHERE
                date_voted > DATE_SUB(NOW(), INTERVAL 24 HOUR)
                    AND post_id = p.post_id) AS votes_count
    FROM
        posts p
            JOIN
        posts_to_tribes pt ON pt.post_id = p.post_id
    WHERE
        pt.tribe_id IN (3 , 38, 107)
    ORDER BY (comments_count + votes_count) DESC , p.last_edited DESC
    LIMIT 3
) t
ON p.post_id = t.post_id
ORDER BY (t.comments_count + t.votes_count) DESC , p.last_edited DESC

It took now ~280ms.

EXPLAIN output:

enter image description here

Cœur
  • 37,241
  • 25
  • 195
  • 267
alexanoid
  • 24,051
  • 54
  • 210
  • 410
  • 1
    Please add the result of a `EXPLAIN SELECT...` from your query together with your table definition. – Bjoern Jun 11 '15 at 19:06
  • What is the `EXPLAIN` output? Do you have indexes on the columns you are using in the `WHERE` conditions? – jmdeldin Jun 11 '15 at 19:07
  • Thanks for your comments ! Unfortunately I have no access to this db right now, I'll provide this information tomorrow. – alexanoid Jun 11 '15 at 19:19
  • Without execution plan, we can only guess. You can try rewrite your sub queries using group by post id and then join them for the outer query. – Tim3880 Jun 11 '15 at 19:20
  • 500 ms does not seem particularly slow, particularly if the underlying tables have a lot of data. What is your expectation. – Gordon Linoff Jun 11 '15 at 19:52
  • we have a lot of different queries(inside of REST endpoints) and only this one under performance testing shows ~16 sec(in concurrent environment) while the other endpoints show ~2 or 3 sec – alexanoid Jun 11 '15 at 20:56
  • I have added EXPLAIN output – alexanoid Jun 12 '15 at 09:20

2 Answers2

1

If you post_id is a primary key (or unique), try get the 3 post_id first:

SELECT 
    p.post_id,
    p.description,
    p.link,
    p.user_id,
    p.total_comments,
    p.total_votes,
    t.comments_count,
    t.votes_count
FROM posts p 
JOIN (
    SELECT 
        p.post_id,
        (SELECT 
                COUNT(*)
            FROM
                comments
            WHERE
                last_edited > DATE_SUB(NOW(), INTERVAL 24 HOUR)
                    AND post_id = p.post_id) AS comments_count,
        (SELECT 
                COUNT(*)
            FROM
                posts_votes
            WHERE
                date_voted > DATE_SUB(NOW(), INTERVAL 24 HOUR)
                    AND post_id = p.post_id) AS votes_count
    FROM
        posts p
            JOIN
        posts_to_tribes pt ON pt.post_id = p.post_id
    WHERE
        pt.tribe_id IN (3 , 38, 107)
        AND p.last_edited >  DATE_SUB(NOW(), INTERVAL 24 HOUR)
    ORDER BY (comments_count + votes_count) DESC , p.last_edited DESC
    LIMIT 3
) t
ON p.post_id = t.post_id
ORDER BY (t.comments_count + t.votes_count) DESC , p.last_edited DESC

Edit: This is the join version:

SELECT 
    p.post_id,
    p.description,
    p.link,
    p.user_id,
    p.total_comments,
    p.total_votes,
    t.comments_count,
    t.votes_count
FROM posts p 
JOIN (
    SELECT 
        p.post_id,Comments_Count, Votes_Count
    FROM
        posts p
        JOIN
        posts_to_tribes pt ON pt.post_id = p.post_id
        LEFT JOIN (SELECT 
                post_id, COUNT(*) Comments_Count
            FROM
                comments
            WHERE
                last_edited > DATE_SUB(NOW(), INTERVAL 24 HOUR)
            GROUP BY post_id) cc
        ON p.post_id = cc.post_id
        LEFT JOIN 
        ( 
            SELECT 
                post_id, COUNT(*) Votes_Count
            FROM
                posts_votes
            WHERE
                date_voted > DATE_SUB(NOW(), INTERVAL 24 HOUR)
            GROUP BY post_id
        ) vc
        ON p.post_id = vc.post_id
        WHERE pt.tribe_id IN (3 , 38, 107)
    ORDER BY (comments_count + votes_count) DESC , p.last_edited DESC
    LIMIT 3
) t
ON p.post_id = t.post_id
ORDER BY (t.comments_count + t.votes_count) DESC , p.last_edited DESC

If the performance is still not acceptable, you may have to think about updating the total_comments, total_votes directly or using trigger or scheduled job.

Tim3880
  • 2,563
  • 1
  • 11
  • 14
  • Thanks, this is the fastest query.. but it shows the other output results than the original one.. – alexanoid Jun 15 '15 at 08:00
  • It works 2 times faster than the original query.. but the speed is still not perfect. May be correlated subqueries here can also be optimized ? – alexanoid Jun 15 '15 at 10:43
  • Your explain output didn't indicate any sub query issues. Please post your new explain result. – Tim3880 Jun 15 '15 at 12:15
  • Added explain output for your query – alexanoid Jun 15 '15 at 12:24
  • As I said, the plan didn't show any sub query issues. group by may make it worse. – Tim3880 Jun 15 '15 at 16:10
  • Also, I already use total_comments and total_votes but for total comments/votes number for all time.. but inside of this query I need to find posts based on some sliding interval (now - x). This is why I need to calculate this on the fly with subselects or something similar.. – alexanoid Jun 15 '15 at 18:28
  • I don't have the data to play around. If I have to dig further, I would try 1) add the time filter to the posts 2) try different composite indexes. – Tim3880 Jun 15 '15 at 18:49
  • 1
    I added the line " AND p.last_edited > DATE_SUB(NOW(), INTERVAL 24 HOUR)". On my testing environment, it makes huge difference. – Tim3880 Jun 15 '15 at 19:45
  • Thanks Tim3880! Right now this query runs ~156ms. Tomorrow I'll run my performance tests and report a results here. – alexanoid Jun 15 '15 at 19:59
  • The last query shows the best results. Thanks Tim3880! – alexanoid Jun 16 '15 at 09:16
0

You have used 2 correlated subqueries. Each one from the correlated subqueries will be executed once for every row from the outer query. Hence, if you can avoid them it's likely to get faster query.

[..] they are inefficient and likely to be slow. Rewriting the query as a join might improve performance.

You have to avoid them by using join. This may help you: MySQL - can I avoid these correlated / dependant subqueries?

Community
  • 1
  • 1
54l3d
  • 3,913
  • 4
  • 32
  • 58