0

The application I work on generates an SQL query somewhat like this:

Select 
    VISIT_VIEW.VISIT_ID, VISIT_VIEW.PATIENT_ID, VISIT_VIEW.MRN_ID, 
    VISIT_VIEW.BILL_NO, INSURANCE.INS_PAYOR'
FROM 
    'VISIT_VIEW 
LEFT JOIN 
    INSURANCE ON VISIT_VIEW.visit_id = INSURANCE._fk_visit '
WHERE 
    'VISIT_VIEW.VISIT_ID IN (1002, 1003, 1005, 1006, 1007, 1008, 1010, 1011, <...>, 1193, 1194, 1195, 1196, 1197, 1198, 1199)'

The <...> represents a long list of ids. The size of the list depends on the results of a previous query, and in turn on the parameters selected to generate that query.

The list of IDs can be anywhere from 100 items long to above 2000.

The INSURANCE table is large, over 9 million rows. The visit table is also large, but not quite as large.

As the number of IDs goes up there is a fairly sharp increase from a duration of less than a second to over 15 minutes. The increase starts somewhere around 175 ids.

If the parameters used to generate the query are changed so that the INS_PAYOR column is not selected, and thus there is no left join, the query runs in less than a second, even with over 2000 items in the list of IDs.

The execution plan shows that 97% of the query time is devoted to a clustered seek on the INSURANCE table.

How can I rework this query to get the same results with a less horrific delay?

Do remember that the SQL is being generated by code, not by hand. It is generated from a list of fields (with knowledge of which field belongs to which table) and a list of IDs in the primary table to check. I do have access to the code that does the query generation, and can change it provided that the ultimate results of the query are exactly the same.

Thank you

David Siegel
  • 221
  • 2
  • 19
  • 1
    It's much more likely to be that `IN()` condition that's slow rather than joining the two tables. – Joel Coehoorn Aug 18 '17 at 23:35
  • That may be, but when it is tried with the IN condition but without the join, the result time goes down from over 15 minutes to under 1 second. This is by actual measurement on site. I can't readily try it with the join but without the IN, so i suspect that it is both together that are slowing this down – David Siegel Aug 18 '17 at 23:48

2 Answers2

0

The <...> represents a long list of ids. The size of the list depends on the results of a previous query

Don't do that.

Do this:

SELECT <...>
FROM VISIT_VIEW 
INNER JOIN (
    <previous query goes here>
) t on VISIT_VIEW.VISIT_ID = t.<ID?>
LEFT JOIN INSURANCE ON VISIT_VIEW.visit_id=INSURANCE._fk_visit
Joel Coehoorn
  • 399,467
  • 113
  • 570
  • 794
  • That will be a problem. I won't say that it is impossible, but the previous query result currently gets processed in several ways (mostly to exclude items) before it is used to generate the final query. There is also user confirmation between the two. But thanks for the suggestion. – David Siegel Aug 18 '17 at 23:47
0

See if you see any improvements using the following...

IF OBJECT_ID('tempdb..#VisitList', 'U') IS NOT NULL 
DROP TABLE #VisitList;

CREATE TABLE #VisitList (
    VISIT_ID INT NOT NULL PRIMARY KEY
    );

INSERT #VisitList (VISIT_ID) VALUES (1002),(1003),(1005),(1006),(1007),(1008),(1010),(1011),(<...>),(1193),(1194),(1195),(1196),(1197),(1198),(1199);

SELECT 
    vv.VISIT_ID, 
    vv.PATIENT_ID, 
    vv.MRN_ID, 
    vv.BILL_NO, 
    ix.INS_PAYOR
FROM 
    VISIT_VIEW vv
    JOIN #VisitList vl
        ON vv.VISIT_ID = vl.VISIT_ID
    CROSS APPLY (
                SELECT TOP 1
                     i.INS_PAYOR
                FROM 
                    INSURANCE i
                WHERE 
                    vv.visit_id=i._fk_visit
                ) ix;
Jason A. Long
  • 4,382
  • 1
  • 12
  • 17