Goal:
I want to speed up a SQL query over about a million rows of transaction data (order data). I've been able to reduce the run time from 50 minutes (using temp tables) to 9 minutes using CROSS APPLY (see query below). Is there a way to avoid using ROW_NUMBER() to find the highest dollar amount spent by each customer per year (group by customer, year)? ROW_NUMBER() can be computationally expensive. Additionally, there are no indexes on these tables.
Code:
select z.string_customer_name, z.string_customer_region, z.string_industry_group,
       z.string_city, z.string_state, z.string_country, z.string_booking_type,
       z.string_sales_branch, z.string_sales_region, z.string_sales_area,
       z.int_booking_year, z.float_sum_total, z.string_tpis_concat, z.string_groupby
from (
    select #temp_00.*, ca_01.float_sum_total, ca_00.string_tpis_concat,
           -- desc so that row_num = 1 is the highest yearly total per string_groupby
           ROW_NUMBER() over (partition by #temp_00.string_groupby
                              order by ca_01.float_sum_total desc) as row_num
    from #temp_00
    -- total spend per customer group and year
    cross apply(
        select string_groupby, int_booking_year, sum(float_total) as float_sum_total
        from #temp_00
        group by string_groupby, int_booking_year
    ) as ca_01
    -- all customer ids for the group, pipe-delimited
    cross apply(
        select string_groupby, STRING_AGG(cast(string_customer_tpi
            as varchar(max)), '|') as string_tpis_concat
        from #temp_00
        group by string_groupby
    ) as ca_00
    where ca_00.string_groupby = #temp_00.string_groupby and
          ca_01.string_groupby = #temp_00.string_groupby and
          ca_01.int_booking_year = #temp_00.int_booking_year
) as z
where z.row_num = 1
Temp table columns:
string_customer_name -> 'customer name'
string_customer_tpi -> 'customer id'
string_customer_region -> 'customer region'
string_industry_group -> 'customer industry group'
string_city -> 'customer city'
string_state -> 'customer state'
string_country -> 'customer country'
string_booking_type -> 'order type'
string_sales_branch -> 'sales branch'
string_sales_region -> 'sales region'
string_sales_area -> 'sales area of the world'
int_booking_year -> 'order year'
float_total -> 'order total in dollars'
string_groupby -> 'concatenation of customer name, customer city, customer state,
customer country, customer industry group'
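For anyone who wants to reproduce this, a rough sketch of the temp table might look like the following. The column names come from the list above; the data types and lengths are assumptions inferred from the name prefixes (string_, int_, float_), not the actual definitions.

```sql
-- Assumed shape of #temp_00. Types and lengths are guesses based on the
-- column-name prefixes; the real table may differ.
create table #temp_00 (
    string_customer_name   varchar(255),   -- customer name
    string_customer_tpi    varchar(50),    -- customer id
    string_customer_region varchar(100),   -- customer region
    string_industry_group  varchar(100),   -- customer industry group
    string_city            varchar(100),   -- customer city
    string_state           varchar(100),   -- customer state
    string_country         varchar(100),   -- customer country
    string_booking_type    varchar(50),    -- order type
    string_sales_branch    varchar(100),   -- sales branch
    string_sales_region    varchar(100),   -- sales region
    string_sales_area      varchar(100),   -- sales area of the world
    int_booking_year       int,            -- order year
    float_total            float,          -- order total in dollars
    string_groupby         varchar(1000)   -- name + city + state + country + industry group
);
```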
Execution Plan for posted query
The XML for the execution plan is too large to post. Although the picture of the execution plan is small, I think most of the time in the second statement (the posted query) is spent in the Sort. The posted query is 79% of the cost while the initial data pull is 21%, and 60% of the cost of both combined is in the Sort.