2

I am dealing with a SQL query that requires many self joining tables to create aggregate functions such as the following.

SELECT  
    tab1.foo, 
    tab1.bar, 
    tab1.baz, 
    tab1.bam/tab2.bam as bam_ratio, 
    RANK() OVER (PARTITION BY tab1.bar, tab1.baz ORDER BY tab1.foo DESC) RANK
FROM 
    (SELECT 
         foo, 
         bar, 
         baz, 
         bam
     FROM 
         OPENDATASOURCE(server).dbo.table1
     WHERE 
         qux = 1 AND quux = 'A' AND corge = 2) tab1
JOIN
    (SELECT 
         foo, 
         bar, 
         baz, 
         bam
     FROM 
         OPENDATASOURCE(server).dbo.table1
     WHERE 
         qux = 1 AND quux = 'B' AND corge = 2) tab2 ON tab1.bar = tab2.bar
                                                    AND tab1.baz = tab2.baz
                                                    AND tab1.foo = tab2.foo

The issue being each of these tables have few columns but tens of millions of records. This has created a unique case where indexing does not adequately improve performance (The query above uses every row in table1).

What I have noticed is these table calls tend to only change in one or two minor respects such as quux in the example and they do not have primary keys requiring more rows to be used than is necessary which doesn't help the indexes and seems redundant.

Are there ways to improve the performance further than simple indexing? I am willing to alter the underlying tables and create orphan tables.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Alex
  • 140
  • 6
  • 19
  • 1
    An easy first step would be to drop the records from the OPENDATASOURCE queries into temp tables and then join on the TEMP. This would allow you to apply indexes on the temp tables as well (after the insert) but would probably still be faster than your current derrived table even without adding indexes. – S3S Nov 28 '18 at 15:45
  • Can you drop the execution plan here? Also the main gain you can have will be from executing it on the source box and just reading it on the box you are executing your query on. – Bartosz X Nov 28 '18 at 15:49
  • You state these tables have no primary keys. Not a great idea but do they have a clustered index? If not you are dealing with heaps and performance is going to be horrible no matter what you do. – Sean Lange Nov 28 '18 at 15:50
  • @scsimon That's interesting, I didn't realize you could index a temp table. Are you suggesting to remove the OPENDATASOURCE outside of putting the table variations in temp tables? – Alex Nov 28 '18 at 15:51
  • @SeanLange That is what I expected. I do not have primary keys and no, there is no clustered index. I will add one. – Alex Nov 28 '18 at 15:52
  • Well you need it to pull in your data, or created a linked server, or use OPENQUERY or what ever method you want... but I'm suggesting you take the `SELECT FROM OPENDATSOURCE` and put it at the top of the query in `SELECT INTO #TEMP FROM OPENDATASOURCE` and then in your final select, join to the `#TEMP` tables. – S3S Nov 28 '18 at 15:53
  • 1
    @scsimon Yes that is what I figured, I will try this and see how it affects the results. – Alex Nov 28 '18 at 16:09
  • 1
    @scsimon I didn't add indexes to the temp tables but edited the current indexes and made temp tables that would cut down on the OPENDATASOURCE queries. This improved performance over 99%. If you put that into an answer I will mark it as the solution – Alex Nov 30 '18 at 19:30
  • Sure thing @Alex – S3S Nov 30 '18 at 19:33

1 Answers1

1

One performance issue you could improve is joining across servers. This happens with OPENROWSET, linked servers, etc.

The easiest way to mitigate it is to pull your data into temp tables or staging tables from your remote data sources. This will store the data locally in tempdb. Then, join to the temp tables. SQL Server will automatically create statistics for these tables which will greatly improve your performance, plus the data is local, but you can also add indexes to speed up your joins if you’d like.

S3S
  • 24,809
  • 5
  • 26
  • 45
  • 1
    Doing reducing the OPENROWSET improved performance of the query 99.96%. Very good catch! – Alex Nov 30 '18 at 19:40