0

I have a list of hospitals (FacilityID_UUID) that have to submit data every day (date = create_date).

On a given day, I'd like to establish which did NOT report on the previous day.

Started with this:

SELECT `FacilityID_UUID`
FROM `TEST`
where `FacilityID_UUID` not in (
  SELECT distinct `FacilityID_UUID`
  FROM `TEST`
  where `Create_Date` = '2014-09-16'
)

(will replace dates with sysdate, or interval 1 day)

However, I have two problems with this:

1- it is soooo slow! Been running like 30 mins now and I'm not expecting more than 20 results). How can I make it faster?

2- it only provides me with the names of hospitals who submitted at least once in a the past, but not yesterday. If more hospitals begin submission on Wednesday, I'd like Thursday's query to pick up that. Basically, I'd like to say "give me hospitals on list X which did not submit yesterday" and write up the list X myself.

Any ideas

Michael Berkowski
  • 267,341
  • 46
  • 444
  • 390
Farah
  • 75
  • 1
  • 2
  • 6
  • something like a left outer join is probably faster than an in. Then just find all results where some field on the left table is null. Also starting with your smaller table is probably a good way to boost speed, – Marshall Tigerus Sep 16 '14 at 19:56
  • 1
    The "TEST" the *only* relevant table? (It seems odd if it is and likely need to be corrected.) What is *full* schema definition and the multiplicity of FacilityID_UUID? How does a "Create Date" related to a "Submit Date"? – user2864740 Sep 16 '14 at 20:03

2 Answers2

0

You can do a LEFT JOIN to get each test that has been reported or not to this date, and then keep only null values on the second table to see only tests that were not reported at the specified date.

SELECT FacilityID_UUID
FROM TEST
LEFT JOIN TEST AS REPORT_DATE_TEST ON TEST.FacilityID_UUID = REPORT_DATE_TEST.FacilityID_UUID
AND REPORT_DATE_TEST.Create_Date = '2014-09-16
WHERE REPORT_DATE_TEST.FacilityID_UUID IS NULL;

The previous query will give the same result as your query, but faster.

But I suspect you might want to have only one row per UUID, in that case it would be

SELECT DISTINCT FacilityID_UUID
FROM TEST
LEFT JOIN TEST AS REPORT_DATE_TEST ON TEST.FacilityID_UUID = REPORT_DATE_TEST.FacilityID_UUID
AND REPORT_DATE_TEST.Create_Date = '2014-09-16
WHERE REPORT_DATE_TEST.FacilityID_UUID IS NULL;
user327961
  • 2,440
  • 3
  • 22
  • 20
0

Query speed is dependent on a lot of factors, and you will be able to tune this better than anyone on here. Let's look at your data and your query.

SELECT FacilityID_UUID FROM TEST where FacilityID_UUID not in 
         (SELECT distinct FacilityID_UUID FROM TEST where Create_Date = '2014-09-16')

This query is checking each line of TEST and returning those values that are not in the list of facilities that have a Create_Date of . This is definitely going to be slow. For N items you are comparing to some subset of N that is likely fairly large. If TEST is a big table (which if it has a new entry every day for each facility, it will be) then you are going to run into speed issues.

A better query is:

Select FacilityID_UUID, max(Create_Date) from TEST where Create_Date < CURDATE() - 1
                 group by FacilityID_UUID order by FacilityID_UUID;

That should be close to correct for your data (might have to tweek the where clause, I don't do a lot of work with dates in MySQL).

Marshall Tigerus
  • 3,675
  • 10
  • 37
  • 67