3

Database: version 8.0.26-17 https://www.percona.com/doc/percona-server/8.0/release-notes/Percona-Server-8.0.26-17.html

I have two queries that yield different results. I don't understand why.

1)

select eev_company_id,
count(distinct maj.dsd_prefix) as maj_cnt,
count(distinct min.dsd_prefix) as min_cnt
from ehev_most_recent as eev
inner join ekohubschema as ehs on  ehs.ehs_subcategory = eev.eev_subcategory
left join datasourcedescription as maj on maj.dsd_prefix = eev.eev_prefix and maj.dsd_type_id = 'MAJ'
left join datasourcedescription as min on min.dsd_prefix = eev.eev_prefix and min.dsd_type_id <> 'MAJ'
where ehs.ehs_category <> 'Exclusionary Factors'
group by eev.eev_company_id
having eev.eev_company_id = 'ADD53604';

result is:

+----------------+---------+---------+
| eev_company_id | maj_cnt | min_cnt |
+----------------+---------+---------+
| ADD53604       |       2 |       1 |
+----------------+---------+---------+

The second query is pretty much the same but substituted group by eev_company_id having with AND:

2)

select 
count(distinct maj.dsd_prefix) as maj_cnt,
count(distinct min.dsd_prefix) as min_cnt
from ehev_most_recent as eev
inner join ekohubschema as ehs on ehs.ehs_subcategory = eev.eev_subcategory
left join datasourcedescription as maj on maj.dsd_prefix = eev.eev_prefix and maj.dsd_type_id = 'MAJ'
left join datasourcedescription as min on min.dsd_prefix = eev.eev_prefix and min.dsd_type_id <> 'MAJ'
where ehs.ehs_category <> 'Exclusionary Factors' AND eev.eev_company_id = 'ADD53604';

This query results in:

+---------+---------+
| maj_cnt | min_cnt |
+---------+---------+
|       2 |       0 |
+---------+---------+

As you can see, the min_cnt here is 0 while for the first query it is 1. What is the reason for the difference?

If I remove ekohubschema join I get the same results: 3)

select eev_company_id,
count(distinct maj.dsd_prefix) as maj_cnt,
count(distinct min.dsd_prefix) as min_cnt
from ehev_most_recent as eev
left join datasourcedescription as maj on maj.dsd_prefix = eev.eev_prefix and maj.dsd_type_id = 'MAJ'
left join datasourcedescription as min on min.dsd_prefix = eev.eev_prefix and min.dsd_type_id <> 'MAJ'
group by eev.eev_company_id
having eev.eev_company_id = 'ADD53604'; 

+----------------+---------+---------+
| eev_company_id | maj_cnt | min_cnt |
+----------------+---------+---------+
| ADD53604       |       2 |       0 |
+----------------+---------+---------+

ekohubschema table has the following columns: ehs_category, ehs_subcategory and ehs_long_description, no company ID whatsoever, and yet it interferes with the result.

I don't see any minor datasources, only major. This is why I struggle to find out where the count 1 (for min_cnt ) comes from.

enter image description here

halfer
  • 19,824
  • 17
  • 99
  • 186
Hairi
  • 3,318
  • 2
  • 29
  • 68
  • The second query is invalid and expected to raise an error. Which dbms are you using? – jarlh Feb 22 '22 at 14:15
  • Please give tables an alias, and use them, so we know which column is in which table – HoneyBadger Feb 22 '22 at 14:16
  • 1
    You have already got the answer... The second query is invalid, but instead of raising an error your dbms returns an arbitrary result. – jarlh Feb 22 '22 at 14:21
  • @jarlh Thank you but I don't understand what's wrong with the query :(. Could you please help? Im using MySQL 8.0.26-17 – Hairi Feb 22 '22 at 14:26
  • Since you mix both a (direct) column reference _and_ aggregate functions in your SELECT clause, a GROUP BY is required. – jarlh Feb 22 '22 at 14:28
  • @jarlh I removed `eev_company_id,` from the select statement still the same result – Hairi Feb 22 '22 at 14:31
  • @jarlh Changed the original comment – Hairi Feb 22 '22 at 14:33
  • MySQL, start with `SET sql_mode = 'ONLY_FULL_GROUP_BY';` – jarlh Feb 22 '22 at 14:35
  • @jarlh thank you for the suggestion but how long this takes effect. I need to try on production db and I dont want to change sql_mode for long time – Rikotech Feb 22 '22 at 14:37
  • Sorry, but I'm not that familiar with MySQL. Check the documentation. Or use a test database. – jarlh Feb 22 '22 at 14:38
  • @jarlh I set this option, but the result remains the same: I understand the numbers `maj_cnt=2` , `min_cnt: 0` because I see such data in the tables, but it is really strange to me to this -> `min_cnt: 1`. I don't know where this count of 1 comes if all the `dsd_type_id` for this companies are 'MAJ' – Hairi Feb 22 '22 at 15:01
  • `HAVING` is applied after `GROUP BY`, that's why you see a `1`, see: https://stackoverflow.com/questions/6457523/mysql-group-by-and-having – Luuk Feb 22 '22 at 15:33
  • @Luuk Still I don't understand why. The grouping is performed on companyID and the having clause also selects companyID. – Hairi Feb 22 '22 at 15:39
  • I tried to reproduce this using 8.0.26, but I get counts of 2 and 0. I suggest you try to make a minimal demo in [dbfiddle](https://dbfiddle.uk/). – Bill Karwin Feb 22 '22 at 17:22
  • 2
    I did, and I think it's a bug, see: [DBFIDDLE](https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=75a221b6ea180308aeacae83c107d5cd). I reported it here: http://bugs.mysql.com/106539 – Luuk Feb 22 '22 at 17:56
  • 1
    Thank you @Luuk it seems like a bug to me as well. BillKarwin the problem is that ehev_most_recent table is huge 28M+ records. I tried to reproduce only for one company_id where I spot a problem but when I filter the table to only that company I start to get correct results |2|0| . – Hairi Feb 22 '22 at 18:33

1 Answers1

2

Please check the output of this query:

select 
   eev_company_id, 
   min.dsd_prefix
from ehev_most_recent as eev
left join datasourcedescription as min on min.dsd_prefix = eev.eev_prefix and min.dsd_type_id <> 'MAJ'
where eev.eev_company_id = 'ADD53604'; 

I think it (the output) contains at least 1 time a 1, if not it's a bug.

I did, and I think it's a bug, see: DBFIDDLE. I reported it here: bug 106539

The bug also exists in MariaDB 10.6, see: DBFIDDLE

Luuk
  • 12,245
  • 5
  • 22
  • 33
  • The result is 245 rows - all identical `| ADD53604 | NULL |` – Hairi Feb 22 '22 at 15:44
  • Here is the result https://gist.github.com/RadoslavMarinov/868fe96131ddee4e24d79a3155a0a7cc – Hairi Feb 22 '22 at 15:54
  • I got a response on my bug report, that it is not a bug.. (see the bug report for the details) – Luuk Mar 07 '22 at 10:06
  • A [DBFIDDLE](https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=dccf94d8d2d6c0e1f3c5fa103be1eddf) which (hopefully) explains why my attempt to reproduce is wrong – Luuk Mar 07 '22 at 10:16