0

I was testing performance a little bit for such case:

There is one table called categories with columns id and name, with 10000 records. There is table post with columns id, body, category_id with million rows, where each row was randomly assigned to one of those categories. categories_id in posts is a foreign key pointing to categories.id, and columns named id in both tables are primary keys. indexes are created on those columns.

Now, my goal is to get a list of categories with count of posts in each.

Why is a subquery faster than join? I used h2 and mysql although without too much performance tuning, possibly also without understanding of some matters, and both are consistent in that join is more or less slower than subquery.

Those are the queries:

using subquery:

SELECT categories.name, 
(SELECT COUNT(posts.id) FROM posts WHERE posts.categories_id = categories.id) as postCount 
FROM categories

join:

select categories.name, count(posts.id) from categories 
left join posts on posts.categories_id=categories.id 
group by categories.id
Michał Zegan
  • 456
  • 4
  • 12
  • oh, clarifications, my bad. on h2 the categories.name had unique constraint, so had an index. that is why there was not so much difference compared to mysql test that lacked index on name. when index is added or group by is changed to the id field, it however still goes like before, joins slightly but slower than subqueries – Michał Zegan Jan 17 '18 at 22:39
  • Where is the remaining code you're using... If you refer to a table.. add that table with few rows + x columns. Try to implement paragraphs with section break... now its a large block of text and doesn't help the info to land in the grey matter ;-) – ZF007 Jan 17 '18 at 22:46
  • also..if you provide additional info like in your comment... implement that into your question. NP with doing that... In fact.. it deepens the question. Enjoy SO ;-) – ZF007 Jan 17 '18 at 22:47
  • answer edited, i now group by id so that results on mysql shouldn't be so much weird – Michał Zegan Jan 17 '18 at 22:51
  • Lucky for you, I've seen this exact situation. The short version is that sub queries create indexes if needed, and that sometimes causes a sub query to be faster than a join. See the question here https://stackoverflow.com/questions/45374832/why-is-this-simple-join-query-significantly-quicker-with-a-sub-query – Goose Jan 17 '18 at 22:56
  • 1
    Possible duplicate of [Why is this simple join query significantly quicker with a sub-query?](https://stackoverflow.com/questions/45374832/why-is-this-simple-join-query-significantly-quicker-with-a-sub-query) – Goose Jan 17 '18 at 22:56
  • Much better formatting. As Goose just posted before me... check that link for more info on your topic. – ZF007 Jan 17 '18 at 22:57
  • when I have sized up the cache in h2 it suddenly started to be reversed and now join is slightly faster. at least most of the times. interesting – Michał Zegan Jan 17 '18 at 23:06
  • When grouping, I recommend to group on the lowest level possible, so before the join: SELECT categories.name, grp.cnt FROM categories JOIN (SELECT COUNT(*) as cnt, posts.categories_id FROM posts GROUP BY posts.categories_id) as grp ON grp.categories_id = categories.id – Antonín Lejsek Jan 21 '18 at 04:01

0 Answers0