2

I'm creating a custom forum from scratch and I'm attempting to use some LEFT JOIN queries to get information such as total posts, total threads and most recent thread. I've managed to get the data but the recent thread keeps returning a random value rather than the most recent thread.

CREATE TABLE forum_categories
    (`name` varchar(18), `label` varchar(52), `id` int)
;
    
INSERT INTO forum_categories
    (`name`, `label`, `id`)
VALUES
    ('General Discussion', 'Talk about anything and everything Digimon!', 1),
    ('Deck Discussion', 'Talk about Digimon TCG Decks and Strategies!', 2),
    ('Card Discussion', 'Talk about Digimon TCG Cards!', 3),
    ('Website Feedback', 'A place to discuss and offer feedback on the website', 4)
;

CREATE TABLE forum_topics
    (`name` varchar(18), `id` int, `parent_id` int, `author_id` int, date date)
;
    
INSERT INTO forum_topics
    (`name`, `id`, `parent_id`, `author_id`, `date`)
VALUES
    ('My First Topic', 1, 1, 16, '2021-03-29'),
    ('My Second Topic', 2, 1, 16, '2021-03-30')
;

CREATE TABLE forum_topics_content
    (`id` int, `topic_id` int, `author_id` int, date datetime, `content` varchar(300))
;
    
INSERT INTO forum_topics_content
    (`id`, `topic_id`, `author_id`, `date`, `content`)
VALUES
    (1, 1, 16, '2021-03-29 15:46:55', 'Hey guys! This is my first post!'),
    (2, 1, 16, '2021-03-30 08:05:13', 'This is my first topic reply!')
;

My Query:

SELECT forum_categories.name, label, forum_categories.id, COUNT(DISTINCT(forum_topics.id)) as 'topics', COUNT(DISTINCT(forum_topics_content.id)) as 'posts', SUBSTRING(forum_topics.name,1, 32) as 'thread'
FROM forum_categories 
LEFT JOIN forum_topics ON forum_categories.id = forum_topics.parent_id
LEFT JOIN forum_topics_content ON forum_topics.id = forum_topics_content.topic_id
GROUP BY forum_categories.id
ORDER BY forum_categories.id, forum_topics.date DESC

I figured having an ORDER BY of forum_topics.date DESC would work for me and output the most recent thread which is "My Second Topic" but it doesn't. I'm a bit stumped and have tried different variations of ORDER BY to no avail. thread keeps returning a random result from the two possible results.

Full example with data is available on this fiddle: https://www.db-fiddle.com/f/auDzUABaEpYzLKDkRqE7ok/0

Desired result would 'thread' always being the latest thread which in this example is "My Second Topic". However it always seems to randomly pick between "My First Topic" and "My Second Topic".

The output for the first row should always be:

'General Discussion' , 'Talk about anything and everything Digimon!' 1, 2, 2, 'My Second Topic'
Rick James
  • 135,179
  • 13
  • 127
  • 222
GenesisBits
  • 364
  • 2
  • 23

2 Answers2

1

In you fiddle you have:

SET SESSION sql_mode = '';

You should change that to:

SET SESSION sql_mode = 'ONLY_FULL_GROUP_BY';

You will get an error like this:

Query Error: Error: ER_WRONG_FIELD_WITH_GROUP: Expression #1 of 
SELECT list is not in GROUP BY clause and contains nonaggregated 
column 'test.forum_categories.name' which is not functionally 
dependent on columns in GROUP BY clause; this is incompatible with
 sql_mode=only_full_group_by

It is stated in the docs that:

ONLY_FULL_GROUP_BY

Reject queries for which the select list, HAVING condition, or ORDER BY list refer to nonaggregated columns that are neither named in the GROUP BY clause nor are functionally dependent on (uniquely determined by) GROUP BY columns.

As of MySQL 5.7.5, the default SQL mode includes ONLY_FULL_GROUP_BY. (Before 5.7.5, MySQL does not detect functional dependency and ONLY_FULL_GROUP_BY is not enabled by default. For a description of pre-5.7.5 behavior, see the MySQL 5.6 Reference Manual.)

And there is a very good reason why they did that. see: why should not disable only full group by

Luuk
  • 12,245
  • 5
  • 22
  • 33
1

thread keeps returning a random result from the two possible results.

Provided query is simply undeterministic and equivalent to:

SELECT forum_categories.name, 
  forum_categories.label, 
  forum_categories.id,
  COUNT(DISTINCT(forum_topics.id)) as 'topics',
  COUNT(DISTINCT(forum_topics_content.id)) as 'posts',
  SUBSTRING(ANY_VALUE(forum_topics.name),1, 32) as 'thread'
FROM forum_categories 
LEFT JOIN forum_topics ON forum_categories.id = forum_topics.parent_id
LEFT JOIN forum_topics_content ON forum_topics.id = forum_topics_content.topic_id
GROUP BY forum_categories.id,forum_categories.name,forum_categories.label
ORDER BY forum_categories.id, ANY_VALUE(forum_topics.date) DESC;

Assuming that forum_categories.id is PRIMARY KEY, the name/label are functionally dependent but rest of the column is simply ANY_VALUE.

If a column in SELECT list is not functionally dependent or wrapped with aggregate function the query is incorrect. On MySQL 8.0 or when ONLY_FULL_GROUP_BY is enabled the result is error.

Related: Group by clause in mySQL and postgreSQL, why the error in postgreSQL?


There are different ways to achieve desired result(correlated subqueries, windowed functions, limit) and so on.

Here using GROUP_CONCAT:

SELECT forum_categories.name, 
  forum_categories.label, 
  forum_categories.id,
  COUNT(DISTINCT(forum_topics.id)) as `topics`,
  COUNT(DISTINCT(forum_topics_content.id)) as `posts`,
  SUBSTRING_INDEX(GROUP_CONCAT(SUBSTRING(forum_topics.name,1,32)
                 ORDER BY forum_topics.`date` DESC 
                 SEPARATOR '~'),
                 '~',1) AS `thread`
FROM forum_categories 
LEFT JOIN forum_topics ON forum_categories.id = forum_topics.parent_id
LEFT JOIN forum_topics_content ON forum_topics.id = forum_topics_content.topic_id
GROUP BY forum_categories.id,forum_categories.name,forum_categories.label
ORDER BY forum_categories.id;

How it works:

GROUP_CONCAT is aggregate function that allow to concatenate string preserving order.

My Second Topic~My First Topic~My First Topic

Then SUBSTRING_INDEX returns part of string up to first occurence of delimeter ~.

db<>fiddle demo

Lukasz Szozda
  • 162,964
  • 23
  • 234
  • 275