
Goal:

I want to speed up a SQL query over about a million rows of transaction (order) data. I've been able to reduce the run time from 50 minutes (using temp tables) to 9 minutes using CROSS APPLY (see the query below). Is there a way I can eliminate ROW_NUMBER() for finding the highest dollar amount spent by a customer per year (grouped by customer and year)? ROW_NUMBER() can be computationally expensive. Additionally, there are no indexes on these tables.
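For reference, this is the kind of ROW_NUMBER()-free pattern I'm hoping for: a group-wise max done with a plain aggregate and a join. This is only a sketch against a hypothetical sums CTE or temp table (one row per string_groupby / int_booking_year with its float_sum_total), not my actual query:

    -- "sums" is hypothetical: one row per string_groupby / int_booking_year
    -- with its float_sum_total already computed
    select s.*
    from (
        select string_groupby, max(float_sum_total) as max_total
        from sums
        group by string_groupby
    ) as m
    join sums as s
      on s.string_groupby = m.string_groupby
     and s.float_sum_total = m.max_total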

Code:

select z.string_customer_name, z.string_customer_region, z.string_industry_group, 
  z.string_city, z.string_state, z.string_country, z.string_booking_type, 
  z.string_sales_branch, z.string_sales_region, z.string_sales_area,
  z.int_booking_year, z.float_sum_total, z.string_tpis_concat, z.string_groupby
from (
    select #temp_00.*, ca_01.float_sum_total, ca_00.string_tpis_concat, 
      ROW_NUMBER() over (partition by #temp_00.string_groupby order by #temp_00.string_groupby, 
        ca_01.float_sum_total) as row_num
    from #temp_00
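    -- totals per (string_groupby, int_booking_year)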
    cross apply(
        select string_groupby, int_booking_year, sum(float_total) as float_sum_total
        from #temp_00
        group by string_groupby, int_booking_year
    ) as ca_01
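    -- pipe-delimited list of customer ids (string_customer_tpi) per group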
    cross apply(
        select string_groupby, STRING_AGG(cast(string_customer_tpi 
          as varchar(max)), '|') as string_tpis_concat
        from #temp_00
        group by string_groupby
    ) as ca_00
    where ca_00.string_groupby = #temp_00.string_groupby and 
      ca_01.string_groupby = #temp_00.string_groupby and 
      ca_01.int_booking_year = #temp_00.int_booking_year
) as z
where z.row_num = 1

Temp table columns:

string_customer_name -> 'customer name'
string_customer_tpi -> 'customer id'
string_customer_region -> 'customer region'
string_industry_group -> 'customer industry group'
string_city -> 'customer city'
string_state -> 'customer state'
string_country -> 'customer country'
string_booking_type -> 'order type'
string_sales_branch -> 'sales branch'
string_sales_region -> 'sales region'
string_sales_area -> 'sales area of the world'
int_booking_year -> 'order year'
float_total -> 'order total in dollars'
string_groupby -> 'concatenation of customer name, customer city, customer state, 
  customer country, customer industry group'  
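
For anyone trying to reproduce this, here is a minimal definition of #temp_00 consistent with the list above; the column types and lengths are assumptions on my part:

    -- column types/lengths are assumed, not the actual DDL
    create table #temp_00 (
        string_customer_name   varchar(200),
        string_customer_tpi    varchar(50),
        string_customer_region varchar(100),
        string_industry_group  varchar(100),
        string_city            varchar(100),
        string_state           varchar(100),
        string_country         varchar(100),
        string_booking_type    varchar(50),
        string_sales_branch    varchar(100),
        string_sales_region    varchar(100),
        string_sales_area      varchar(100),
        int_booking_year       int,
        float_total            float,
        string_groupby         varchar(1000)
    )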

Execution Plan for posted query

The XML for the query plan is too large to post here. Although the picture of the execution plan is small, the second screenshot is where I think most of the time goes: the Sort(). About 60% of the cost of both the initial data pull and the posted query is in the Sort() (the posted query is 79% of the batch cost, while the data pull is 21%).
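For what it's worth, I also wondered whether an index on the temp table would help the grouped aggregates feeding that Sort(). Something like the following; untested, and the key choice is a guess on my part:

    -- untested guess: support the GROUP BY in the applies
    create clustered index ix_temp_00_groupby_year
        on #temp_00 (string_groupby, int_booking_year)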

Query Execution Plan Link

[execution plan screenshots]

Brent
  • You should post the XML for the actual execution plan. – Martin Smith Jan 15 '19 at 18:21
  • @MartinSmith I will get it posted shortly. – Brent Jan 15 '19 at 18:27
  • I have a hard time believing that the temp table approach takes 5 times longer than the CROSS APPLY. I've witnessed tremendous gains by re-tooling some queries to consume temp tables. – John Cappelletti Jan 15 '19 at 18:30
  • @JohnCappelletti I did it multiple times. It could be the architecture at my company or a server resource issue. I promise it's the truth. – Brent Jan 15 '19 at 18:34
  • @Brent Fair enough... I promise I wasn't calling you a liar. :) – John Cappelletti Jan 15 '19 at 18:38
  • You can post the plan to https://www.brentozar.com/pastetheplan/ . The actual plan XML contains a lot of useful info missing from your question – Martin Smith Jan 15 '19 at 18:59
  • @MartinSmith posted; see the link in the execution plan section – Brent Jan 15 '19 at 19:05
  • The semantics of the query seem strange. It gets reduced down to 48,881 rows from 1,000,000 rows by just selecting any arbitrary row belonging to the year with the *lowest* `float_sum_total` for that `string_groupby` - are the values in the other selected columns guaranteed to be the same for all rows with the same `string_groupby, int_booking_year`? – Martin Smith Jan 15 '19 at 19:57
  • @MartinSmith ```string_groupby``` is acting as a key for summarization; for this application, yes, that is correct. I have to test this and potentially determine a better key. I'm actually trying to find the *highest* ```float_sum_total``` – Brent Jan 15 '19 at 20:32

1 Answer


I can't be sure, but if I understood what you are doing, the CROSS APPLY may be avoidable, and that should help with performance. As I don't have access to the data, you must test it and see.

So I will start after you have put the data in the temp table. Try the below:

 ;with TempWithSum as (
  -- get the sum partitioned by string_groupby, int_booking_year
  select *,
         sum(float_total) over (partition by string_groupby, int_booking_year) as float_sum_total
  from #temp_00
 ), NamesCat as (
  -- get all customer ids concatenated per string_groupby
  select string_groupby,
         STRING_AGG(cast(string_customer_tpi as varchar(max)), '|') as string_tpis_concat
  from #temp_00
  group by string_groupby
 ), AllData as (
  -- get the row number partitioned by string_groupby and ordered by string_groupby, float_sum_total
  select z.string_customer_name, z.string_customer_region, z.string_industry_group, z.string_city, z.string_state,
         z.string_country, z.string_booking_type, z.string_sales_branch, z.string_sales_region, z.string_sales_area,
         z.int_booking_year, z.float_sum_total, NamesCat.string_tpis_concat, z.string_groupby,
         ROW_NUMBER() over (partition by z.string_groupby order by z.string_groupby, z.float_sum_total) as row_num
  from TempWithSum as z
  inner join NamesCat on NamesCat.string_groupby = z.string_groupby
 )
 select * from AllData where row_num = 1

Hope it works and provides the results you need in the time frame you expect.

Note: I know you wanted to eliminate ROW_NUMBER(), and I'm suggesting eliminating the CROSS APPLY instead, but your real goal is performance.
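If you do also want to drop ROW_NUMBER(), an untested variation of the same idea replaces it with a second windowed MAX and a filter. Two caveats: this keeps the rows with the *highest* float_sum_total (which, per the comments, is what you actually want; an ascending sort inside ROW_NUMBER() picks the lowest), and ties on the maximum will return more than one row per group:

 ;with TempWithSum as (
  select *, sum(float_total) over (partition by string_groupby, int_booking_year) as float_sum_total
  from #temp_00
 ), WithMax as (
  -- tag every row with its group's highest yearly sum instead of numbering rows
  select *, max(float_sum_total) over (partition by string_groupby) as float_max_total
  from TempWithSum
 )
 select w.string_customer_name, w.string_customer_region, w.string_industry_group, w.string_city, w.string_state,
        w.string_country, w.string_booking_type, w.string_sales_branch, w.string_sales_region, w.string_sales_area,
        w.int_booking_year, w.float_sum_total, n.string_tpis_concat, w.string_groupby
 from WithMax as w
 inner join (
   select string_groupby, STRING_AGG(cast(string_customer_tpi as varchar(max)), '|') as string_tpis_concat
   from #temp_00
   group by string_groupby
 ) as n on n.string_groupby = w.string_groupby
 where w.float_sum_total = w.float_max_total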

Ali Al-Mosawi