I have been tasked with taking a group of customers and splitting them into two equal groups for each store location. The result set requested would have the two groups for each store location within 1% of each other on customer count, within 1% of each other on order count, and within 1% of each other on amount ordered.
Below is the code I came up with and it works fairly well and most of the time it gets the desired result but sometimes(I think due to an outlier in the group) the % will be further off than 1%.
If OBJECT_ID('tempdb.dbo.#Orders') IS NOT NULL DROP TABLE #Orders
Select
StoreID
,CustomerID
,Sum(OrderID) as Orders
,Sum(OrderAmount) as AmountSold
Into #Orders
From CustomerOrders
Group by StoreID,CustomerID
IF OBJECT_ID('tempdb.dbo.#OrderRanking') IS NOT NULL DROP TABLE #OrderRanking
Select
O.*
,ROW_NUMBER() Over(Partition by StoreID Order by AmountSold, Orders) as Ranking
Into #OrderRanking
From #Orders as O
Select
R.StoreID
,Count(CustomerID) as CustomerCount
,Sum(R.Orders) as Orders
,Sum(R.AmountSold) as Amountsold
,Case When Ranking%2 = 0 Then 'A' Else 'B' End as 'Grouping'
From #OrderRanking as R
Group by
R.StoreID
,Case When Ranking%2 = 0 Then 'A' Else 'B' End
Is there a better way to split the groups to ensure the 1% variance? Or maybe a way to loop through several different splits until it finds a 1%? If looping would need a fail safe to prevent infinite loop in case of impossible split something like after x loops just take closest split.
I am using SQL Server 2012 and SSMS 2016. Thanks for any help you can provide.
Edit: I had tried to convert the code into something not company specific I messed up the code. I realized that and adjusted the code to show what is truly being sought after.
Edit2: I made some progress on my own and wanted to update the question.
So I was working on this some more and I was able to get it to sort on a random order each time you run the code and have it display the Variance for each of the groups. Now all I want to add is a way to loop through X number times and keep the one that has lowest overall variance. This weekend I might try a few more things. But for now below is the new code I spoke of.
If OBJECT_ID('tempdb.dbo.#Orders') IS NOT NULL DROP TABLE #Orders
Select
StoreID
,CustomerID
,Sum(OrderID) as Orders
,Sum(OrderAmount) as AmountSold
,Rand() as Random
Into #Orders
From CustomerOrders
Group by StoreID,CustomerID
IF OBJECT_ID('tempdb.dbo.#OrderRanking') IS NOT NULL DROP TABLE #OrderRanking
Select
O.*
,ROW_NUMBER() Over(Partition by StoreID Order by Random) as Ranking
Into #OrderRanking
From #Orders as O
If OBJECT_ID('tempdb.dbo.#Split') IS NOT NULL DROP TABLE #Split
Select
R.StoreID
,Count(CustomerID) as CustomerCount
,Sum(R.Orders) as Orders
,Sum(R.AmountSold) as Amountsold
,Case When Ranking%2 = 0 Then 'A' Else 'B' End as 'Grouping'
Into #Split
From #OrderRanking as R
Group by
R.StoreID
,Case When Ranking%2 = 0 Then 'A' Else 'B' End
Select
S.StoreID
,((Cast(Max(Case When S.[Grouping] = 'A' Then S.CustomerCount Else 0 End) as decimal(18,2))-Cast(Max(Case When S.[Grouping] = 'B' Then S.CustomerCount Else 0 End) as decimal(18,2)))
/ Cast(Max(Case When S.[Grouping] = 'B' Then S.CustomerCount Else 0 End) as decimal(18,2)))*100 as CustomerCountVar
,((Cast(Max(Case When S.[Grouping] = 'A' Then S.Orders Else 0 End) as decimal(18,2))-Cast(Max(Case When S.[Grouping] = 'B' Then S.Orders Else 0 End) as decimal(18,2)))
/ Cast(Max(Case When S.[Grouping] = 'B' Then S.Orders Else 0 End) as decimal(18,2)))*100 as OrderVar
,((Cast(Max(Case When S.[Grouping] = 'A' Then S.Amountsold Else 0 End) as decimal(18,2))-Cast(Max(Case When S.[Grouping] = 'B' Then S.Amountsold Else 0 End) as decimal(18,2)))
/ Cast(Max(Case When S.[Grouping] = 'B' Then S.Amountsold Else 0 End) as decimal(18,2)))*100 as AmountsoldVar
From #Split as S
Group by S.StoreID