6

Below query returns the initial result fast and then becomes extremely slow.

SELECT A.Id
, B.Date1
FROM A
LEFT OUTER JOIN B
ON A.Id = B.Id AND A.Flag = 'Y'
AND (B.Date1 IS NOT NULL AND A.Date >= B.Date2 AND A.Date < B.Date1)

Table A has 24 million records and Table B has 500 thousand records.

Index for Table A is on columns: Id and Date

Index for Table B is on columns: Id, Date2, Date1 - Date1 is nullable - index is unique

Frist 11m records are returned quite fast and it then suddenly becomes extremely slow. Execution Plan shows the indexes are used.

However, when I remove condition A.Date < B.Date1, query becomes fast again.

Do you know what should be done to improve the performance? Thanks

UPDATE: I updated the query to show that I need fields of Table B in the result. You might think why I used left join when I have condition "B.Date1 is not null". That's because I have posted the simplified query. My performance issue is even with this simplified version.

Bob
  • 227
  • 1
  • 10

2 Answers2

6

You can maybe try using EXISTS. It should be faster as it stops looking for further rows once a match is found unlike JOIN where all the rows will have to be fetched and joined.

select id
from a
where flag = 'Y'
    and exists (
        select 1
        from b
        where a.id = b.id
            and a.date >= b.date2
            and a.date < b.date1
            and date1 is not null
        );
Gurwinder Singh
  • 38,557
  • 6
  • 51
  • 76
  • for simplicity i eliminated most of my query. based on your approach how can I use fields of table b in main select. – Bob Feb 24 '17 at 04:53
  • @Bob - If that's the case, your own query will do. Regarding performance comparison of with and without filter `A.Date < B.Date1`, you can't really compare the two as they are not equivalent. – Gurwinder Singh Feb 24 '17 at 05:00
0

Generally what I've noticed with queries, and SQL performance is the DATA you are joining, for instance ONE to ONE relationships are much faster than ONE to MANY relationships.

I've noticed ONE to MANY relationship on table 3000 items, joining to a table with 30,000 items can easily take up to 11-15 seconds, with LIMIT. But that same query, redesigned with all ONE TO ONE relationships would take less than 1 second.

So my suggestion to speed up your query. According to Left Outer Join (desc) "LEFT JOIN and LEFT OUTER JOIN are the same" so it doesn't matter which one, you use.

But ideally, should use INNER because in your question you stated B.Date1 IS NOT NULL

Based on this parent columns in join selection (desc), you can use parent column in SELECT within JOIN.

SELECT a.Id FROM A a 
INNER JOIN (SELECT b.Id AS 'Id', COUNT(1) as `TotalLinks` FROM B b WHERE ((b.Date1 IS NOT NULL) AND ((a.Date >= b.Date2) AND (a.Date < b.Date1)) GROUP BY b.Id) AS `ab` ON (a.Id = ab.Id) AND (a.Flag = 'Y')
WHERE a.Flag = 'Y' AND b.totalLinks > 0
LIMIT 0, 500

Try and also, LIMIT the DATA you want; this will reduce the filtering necessary by SQL.

Community
  • 1
  • 1
classicjonesynz
  • 4,012
  • 5
  • 38
  • 78