3

I was having a problem with the query the other day. It took about 10 seconds for a large dataset. The query looked something like this:

SELECT a.* from Document as a
LEFT JOIN Waybill as b on a.waybill = b.id
WHERE a.enterpriseGuid = '763a3ac3-a3c7-4379-9735-2a4a96e87e5d'
OR b.enterpriseGuid = '763a3ac3-a3c7-4379-9735-2a4a96e87e5d'

This ran significantly slow. However, then I changed it to this:

SELECT a.* from Document as a
LEFT JOIN Waybill as b on a.waybill = b.id
WHERE a.enterpriseGuid = '763a3ac3-a3c7-4379-9735-2a4a96e87e5d'
UNION ALL
SELECT a.* from Document as a
LEFT JOIN Waybill as b on a.waybill = b.id
WHERE b.enterpriseGuid = '763a3ac3-a3c7-4379-9735-2a4a96e87e5d'

This took about 0.01 second, although the two queries basically produce the same result! I looked for the official MySQL documentation and I found an interesting remark here:

Indices lose their speed advantage when using them in OR-situations (4.1.10):

SELECT * FROM a WHERE index1 = 'foo' UNION SELECT * FROM a WHERE index2 = 'baar';

is much faster than

SELECT * FROM a WHERE index1 = 'foo' OR index2 = 'bar';

So, my question has 3 parts:

  • Is it really bad to use OR clauses in the select queries at production system (that is, where dataset is significantly large)?
  • Can this OR query somehow be tuned up by indexes? Right now both columns in the query that I use for filtering are in fact indexed. Can I create some tricky composite index in order to make OR work as fast as UNION ALL?
  • Is this a vendor-specific issue or will I have the same trouble with Oracle or Postgresql?
SPIRiT_1984
  • 2,717
  • 3
  • 29
  • 46
  • Well for your first example, I would join on enterpriseGuid and have a single condition in the where clause `WHERE a.enterpriseGuid = '763a3ac3-a3c7-4379-9735-2a4a96e87e5d'` – ZeroBased_IX Oct 02 '15 at 11:20
  • I don't see how that would help me. Because I need not just documents, where document himself references to the enterprise, but also documents containing waybills, referring to the same enterprise, and for that second case the document itself may refer to another enterprise, and your query would not include that in list. – SPIRiT_1984 Oct 02 '15 at 11:24

2 Answers2

2

OR is not per se bad. As with almost any other construct in SQL, it might or might not be a good idea.

You have found a problem with the optimizer . . . and one that is common to many databases. When your OR conditions are from different tables, it is very difficult for the optimizer to take advantage of indexes.

Your improved solution works because each subquery can take advantage of indexes.

You might find that the following works better than the first version but worse than the second:

SELECT d.*
FROM Document d
WHERE d.enterpriseGuid = '763a3ac3-a3c7-4379-9735-2a4a96e87e5d' OR
      (EXISTS (SELECT 1
               FROM Waybill b
               WHERE d.waybill = b.id AND
                     b.enterpriseGuid = '763a3ac3-a3c7-4379-9735-2a4a96e87e5d'
              )
      );
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • So, does or work fine for the case, then conditions are on the same table? Because the sample I quoted from the official mysql site uses or for a single table, and still claims it to work slow. – SPIRiT_1984 Oct 02 '15 at 11:26
  • @SPIRiT_1984 . . . I don't understand your comment. – Gordon Linoff Oct 03 '15 at 02:41
0

This is an optimizer-related issue, so it may differ between Engine/Version/Table stats etc.

Actually you can't say that full table scan is always worse than two index scans followed by sorting out the results (i.e. union operator). That depends on index selectivity. Still you have to be very careful with OR, that's true.

Matt
  • 13,674
  • 1
  • 18
  • 27