3

I getting in doubt with "IS NULL" MySQL check. I have this 2 queries. The first one runs in about 300 seconds. The second one run less then 1 second!

Slow query:

SELECT count(distinct(u.id))
FROM ips_usuario AS u 
JOIN ips_fatura AS f
    ON ((u.id = f.ips_usuario_id) OR
       (u.ips_usuario_id_titular IS NOT NULL AND
        u.ips_usuario_id_titular = f.ips_usuario_id));

enter image description here

Fast query:

SELECT count(distinct(u.id))
FROM ips_usuario AS u 
JOIN ips_fatura AS f
    ON ((u.id = f.ips_usuario_id) OR
       (u.ips_usuario_id_titular = f.ips_usuario_id));

enter image description here

All join conditions use foreign keys indexed columns. The table ips_usuario have about 20.000 records and the table ips_fatura have about 500.000 records.

Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
Arivan Bastos
  • 1,876
  • 1
  • 20
  • 28

1 Answers1

1

I am surprised that either is fast. I would suggest replacing them with exists:

SELECT COUNT(*)
FROM ips_usuario u  
WHERE EXISTS (SELECT 1 FROM ips_fatura f WHERE u.id = f.ips_usuario_id) OR
      EXISTS (SELECT 1 FROM ips_fatura f WHERE u.ips_usuario_id_titular = f.ips_usuario_id);

And for the second:

SELECT COUNT(*)
FROM ips_usuario u  
WHERE EXISTS (SELECT 1 FROM ips_fatura f WHERE u.id = f.ips_usuario_id) OR
      (u.ips_usuario_id_titular IS NOT NULL AND
       EXISTS (SELECT 1 FROM ips_fatura f WHERE u.ips_usuario_id_titular = f.ips_usuario_id)
      )

For both these, you want two indexes: ips_fatura(ips_usuario_id) and ips_fatura(ips_usuario_id_titular). You can check the explain to be sure that EXISTS is using the index. If not, the newer releases of MySQL use indexes for IN:

SELECT COUNT(*)
FROM ips_usuario u  
WHERE u.id IN (SELECT f.ips_usuario_id FROM ips_fatura f) OR
      u.ips_usuario_id_titular IN (SELECT f.ips_usuario_id FROM ips_fatura f);

In either case (EXISTS or IN) the goal is to do a "semi-join". That is, to only fine the first row with a match rather than all matches. This is an important efficiency, because it allows the query to avoid duplication removal.

I would speculate that the issue is the optimization of the or -- usually this results in inefficient JOIN algorithms. However, perhaps MySQL is smart in your first case. But the addition of the IS NULL to the outer table throws it off.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Strange thing (for me at least) is that "fast" query (with "is null") doesn't use any index for second table, but checking same number of rows and work 300 times faster. – Arnial Aug 10 '16 at 01:32
  • Why not use a left join? Exists queries usually perform quite poorly. – TaylorN Aug 10 '16 at 01:39
  • @TaylorN . . . Why would you use a `LEFT JOIN`? First, that would keep everything in the first table. Second, it might generate duplicates, creating the need for duplicate removal. – Gordon Linoff Aug 10 '16 at 01:41
  • I misread the query slightly. However I still think that a LEFT JOIN in combination with DISTINCT would perform better, in my experience anyway. – TaylorN Aug 10 '16 at 01:55