1

a is a table of 5,000,000 records and a score. b is a table of 100,000 records and a (different) score. I want to output every record from table a as long as the same record is not in table b with a score equal to 1.

The following query I wrote has taken 15 hours to execute and is still running. Is it possible that it could take this long to do such a query or is there perhaps something wrong with my code?

SELECT a.c1, a.c2, a.score
INTO OUTFILE 'file.csv'
FROM a
LEFT OUTER JOIN b
ON a.c1=b.c1
AND a.c2=b.c2
WHERE b.score <> 1
Mitchell
  • 169
  • 1
  • 14

2 Answers2

1

Your query retrieves all records from table a that are also in table b with a score not equal to 1 (and it retrieves those records as many times as they appear in table b with a score other than 1). Which is different from "every record from table a as long as the same record is not in table b with a score equal to 1".

SELECT a.c1, a.c2, a.score
INTO OUTFILE 'file.csv'
FROM a
WHERE NOT EXISTS (
    SELECT 1 FROM b
    WHERE a.c1 = b.c1
    AND a.c2 = b.c2
    AND b.score = 1
)

The query above can take advantage of the following index

create index my_idx on b(c1,c2,score)
FuzzyTree
  • 32,014
  • 3
  • 54
  • 85
1

You can achieve his in different ways.

SELECT a.c1, a.c2, a.score
INTO OUTFILE 'file.csv'
FROM a WHERE NOT EXISTS
(SELECT * FROM b WHERE  a.c1=b.c1
AND a.c2=b.c2 and b.score = 1)

OR

  SELECT a.c1, a.c2, a.score
    INTO OUTFILE 'file.csv'
    FROM a WHERE NOT EXISTS
    (SELECT * FROM a  INNER JOIN b 
ON a.c1=b.c1
AND a.c2=b.c2 and b.score =1 )

ANOTHER WAY

;with t as 
(
SELECT a.c1, a.c2, a.score from a 
),
u as 
(
select * FROM b where score =1 and b.c2=t.c2 and b.c1 = t.c1
)
SELECT t.c1,t.c2,t.score FROm t INTO OUTPUT 'FILe.csv WHERE NOT EXISTS (SELECT * FROM  u)

You can use In as well, but this may have a performance issue. http://www.w3resource.com/sql/special-operators/sql_exists.php

Nuru Salihu
  • 4,756
  • 17
  • 65
  • 116