4

Consider I have three tables (bet,win,cancel) that will contain the data for rounds of bets. Each round can have 1 or more bets and 1 or more wins. A round may be cancelled before any wins are recorded.

I would like to aggregate the data so that each round is displayed as one row, however my queries are very slow after testing with millions of dummy records each day.

Below are example data

Bet

bookmaker_id    provider_id round_id    transaction_id  game_id user_id bet_amount  balance bet_timestamp
1               1           Round1      bet_tx1         game1   123     10.00       90.00   2022-03-17 01:40:57.400
1               1           Round2      bet_tx2         game1   123     10.00       70.00   2022-03-17 02:40:57.400
1               1           Round2      bet_tx3         game1   123     10.00       80.00   2022-03-17 03:40:57.400
1               1           Round3      bet_tx4         game1   123     10.00       70.00   2022-03-17 04:40:57.400

Win

bookmaker_id    provider_id round_id    transaction_id  user_id win_amount  balance win_timestamp
1               1           Round1      win_tx1         123     0.00        80.00   2022-03-17 01:40:57.400
1               1           Round2      win_tx2         123     10.00       80.00   2022-03-17 02:40:57.400

Cancel

bookmaker_id    provider_id round_id    transaction_id  user_id cancel_timestamp
1               1           Round3      can_tx1         123     2022-03-17 01:40:57.400

Each table has two indexes:
Primary Key: (bookmaker_id , provider_id , transaction_id)
Index: (bookmaker_id, provider_id, round_id)

The following query attempts to aggregate the round information for the first bet transaction. It totals the amount bet, totals the amount won, links the cancellation record if it exists and appends the transaction_id's together if there are multiple. It also returns the opening balance of the first bet and the closing balance of the last win.

Attempt at query

SELECT 
  bet.*, 
  win.*, 
  cancel.transaction_id as cancel_transaction_id, 
  cancel.cancel_timestamp 
FROM 
  (
    SELECT 
      round_id, 
      game_id, 
      provider_id, 
      bookmaker_id, 
      user_id, 
      STRING_AGG(bet.transaction_id, ', ') as bet_transaction_ids, 
      SUM(bet_amount) as total_bet, 
      MIN(bet_timestamp) as bet_timestamp, 
      opening_balance 
    FROM 
      (
        SELECT 
          FIRST_VALUE(balance) OVER (
            PARTITION BY round_id 
            order by 
              bet_timestamp asc
          ) as opening_balance, 
          * 
        FROM 
          bet 
        WHERE 
          bookmaker_id = 1 
          AND provider_id = 1 
          AND transaction_id = 'bet_tx1'
      ) AS bet 
    GROUP BY 
      round_id, 
      game_id, 
      provider_id, 
      bookmaker_id, 
      user_id, 
      opening_balance
  ) AS bet 
  LEFT JOIN (
    SELECT 
      round_id, 
      STRING_AGG(win.transaction_id, ', ') as win_transaction_ids, 
      SUM(win_amount) as total_won, 
      MAX(win_timestamp) as win_timestamp, 
      closing_balance 
    FROM 
      (
        SELECT 
          FIRST_VALUE(balance) OVER (
            PARTITION BY round_id 
            ORDER BY 
              win_timestamp desc
          ) AS closing_balance, 
          * 
        FROM 
          win
      ) AS win 
    GROUP BY 
      win.round_id, 
      closing_balance
  ) AS win ON bet.round_id = win.round_id 
  LEFT JOIN (
    SELECT 
      [round_id], 
      [transaction_id], 
      [cancel_timestamp] 
    FROM 
      cancel
  ) AS cancel ON bet.round_id = cancel.round_id 
ORDER BY 
  total_won;

Result

round_id    game_id provider_id bookmaker_id    user_id bet_transaction_ids total_bet   bet_timestamp             opening_balance   round_id    win_transaction_ids total_won   win_timestamp             closing_balance   cancel_transaction_id   cancel_timestamp
Round1      game1   1           1               123     bet_tx1             10.00       2022-03-17 01:40:57.400   90.00             Round1      win_tx1             0.00        2022-03-17 01:40:57.400   80.00             null                    null

I assume this is not the most efficient way to link these tables if all I have is the bet transaction id considering the indexes and millions of potential records. I would also like to select multiple rounds filtered by bet_timestamp but I hope I can apply anything learned here to that query too.

How should I query these three tables more efficiently?

Any assistance would be most appreciated.

DB Fiddle Here

Dale K
  • 25,246
  • 15
  • 42
  • 71
Fuzz
  • 906
  • 1
  • 12
  • 24

4 Answers4

1

I do not know if it works, but could you try this one:

First add an index to your bet table like this(keep the current one too):

Index: (bet_timestamp, round_id, bookmaker_id, provider_id)

Also you have primary key like below:

Primary Key: (bookmaker_id , provider_id , transaction_id)

I do not know histogram of your data in columns "bookmaker_id , provider_id" and in this example you are using "transaction_id" as one of key column of the primary key, but it does not provide any relationship between other tables. After checking historgram of the data refreshing architecture could be good.

Please try this(before executing please create index):

WITH openingtab
     AS (SELECT round_id,
                game_id,
                provider_id,
                bookmaker_id,
                user_id,
                String_agg(bet.transaction_id, ', ') AS bet_transaction_ids,
                Sum(bet_amount)                      AS total_bet,
                Min(bet_timestamp)                   openingbet_time
         FROM   bet
         WHERE  bookmaker_id = 1
                AND provider_id = 1
         GROUP  BY round_id,
                   game_id,
                   provider_id,
                   bookmaker_id,
                   user_id),
     bettab
     AS (SELECT b.round_id,
                b.user_id,
                b.game_id,
                b.balance opening_bet,
                b.provider_id,
                b. bookmaker_id,
                o.bet_transaction_ids,
                o.total_bet,
                o.openingbet_time
         FROM   openingtab o
                JOIN bet b
                  ON o.openingbet_time = b.bet_timestamp
                     AND o.round_id = b.round_id
                     AND o.provider_id = b.provider_id
                     AND o.bookmaker_id = b.bookmaker_id),
     betwin
     AS (SELECT b.*,
                w.transaction_id wtid,
                w.win_amount,
                w.balance,
                w.win_timestamp,
                c.transaction_id ctid,
                c.cancel_timestamp
         FROM   bettab b
                LEFT OUTER JOIN win w
                             ON b.round_id = w.round_id
                                AND b.provider_id = w.provider_id
                                AND b.bookmaker_id = w.bookmaker_id
                LEFT OUTER JOIN cancel c
                             ON b.round_id = c.round_id
                                AND b.provider_id = c.provider_id
                                AND b.bookmaker_id = c.bookmaker_id)
SELECT *
FROM   betwin; 
Umut TEKİN
  • 856
  • 1
  • 9
  • 19
1

First, the subquery of bet might rewrite as below query, because there is a Primary Key: (bookmaker_id , provider_id , transaction_id) which all hit your condition, so that might be no sense to use aggregate & window function because Primary Key will not be duplicated.

SELECT 
  round_id, 
  game_id, 
  provider_id, 
  bookmaker_id, 
  user_id, 
  transaction_id as bet_transaction_ids, 
  bet_amount as total_bet, 
  bet_timestamp as bet_timestamp, 
  balance as opening_balance 
FROM bet
WHERE 
    bookmaker_id = 1 
    AND provider_id = 1 
    AND transaction_id = 'bet_tx1'

The second thing, try to use indexes & filter your result set as possible.

In the below query, I had added conditions bookmaker_id & provider_id on the cancel table when doing OUTER JOIN which might hit your indexes as possible.

because the first column of index is very important that will let QO determine whether use indexes, so try to use first column as well

SELECT 
  bet.*, 
  win.*, 
  cancel.transaction_id as cancel_transaction_id, 
  cancel.cancel_timestamp 
FROM 
  (
    SELECT 
      round_id, 
      game_id, 
      provider_id, 
      bookmaker_id, 
      user_id, 
      transaction_id as bet_transaction_ids, 
      bet_amount as total_bet, 
      bet_timestamp as bet_timestamp, 
      balance as opening_balance 
    FROM bet
    WHERE 
        bookmaker_id = 1 
        AND provider_id = 1 
        AND transaction_id = 'bet_tx1'
  ) AS bet 
  LEFT JOIN (
    SELECT 
      round_id, 
      STRING_AGG(win.transaction_id, ', ') as win_transaction_ids, 
      SUM(win_amount) as total_won, 
      MAX(win_timestamp) as win_timestamp, 
      closing_balance
    FROM 
      (
        SELECT 
          FIRST_VALUE(balance) OVER (
            PARTITION BY round_id 
            ORDER BY 
              win_timestamp desc
          ) AS closing_balance, 
          * 
        FROM 
          win
      ) AS win  
    GROUP BY 
      win.round_id, 
      closing_balance
  ) AS win ON bet.round_id = win.round_id 
  LEFT JOIN cancel ON bet.round_id = cancel.round_id 
  -- add filter condition
  AND bet.bookmaker_id = cancel.bookmaker_id
  AND bet.provider_id = cancel.provider_id
ORDER BY 
  total_won;

Final need to review all of the columns whether you indeed need to try to avoid select * (unless you are using cluster index and really need all of them)

The rewritten query of the execution plan might do cluster seek on bet & cancel table and reduce more cost the original one.

sqlfiddle

D-Shih
  • 44,943
  • 6
  • 31
  • 51
1

Try with the following index :

CREATE INDEX X001 ON win (balance, win_timestamp, round_id) INCLUDE (transaction_id, win_amount)

I have rewrite your query with CTE :

WITH
WIN0 AS
(
select FIRST_VALUE(balance) OVER (PARTITION BY round_id order by win_timestamp desc) as closing_balance, 
       round_id, transaction_id, win_amount, win_timestamp
from   win
),
WIN1 AS
(
SELECT round_id, 
       STRING_AGG(transaction_id, ', ') as win_transaction_ids, 
       SUM(win_amount) as total_won, 
       MAX(win_timestamp) as win_timestamp, 
       closing_balance 
FROM   WIN0
group  by round_id, 
       closing_balance
),
BET0 AS
(
select FIRST_VALUE(balance) OVER (PARTITION BY round_id order by bet_timestamp asc) as opening_balance, 
       * 
from   bet 
where  bookmaker_id = 1 
       and provider_id = 1 
       and transaction_id = 'bet_tx1'
),
BET1 AS
(
SELECT round_id, 
       game_id, 
       provider_id, 
       bookmaker_id, 
       user_id, 
       STRING_AGG(bet.transaction_id, ', ') as bet_transaction_ids, 
       SUM(bet_amount) as total_bet, 
       MIN(bet_timestamp) as bet_timestamp, 
       opening_balance 
FROM   BET0 as bet 
group  by round_id, 
       game_id, 
       provider_id, 
       bookmaker_id, 
       user_id, 
       opening_balance
),
CCL0 AS
(
SELECT round_id, 
       transaction_id, 
       cancel_timestamp 
FROM   cancel
)
select betx.*, 
       winx.*, 
       cancelx.transaction_id as cancel_transaction_id, 
       cancelx.cancel_timestamp 
from   BET1 as betx 
       LEFT OUTER JOIN WIN1 as winx 
          ON betx.round_id = winx.round_id 
       LEFT OUTER JOIN CCL0 as cancelx 
          ON betx.round_id = cancelx.round_id 
order  by total_won;
SQLpro
  • 3,994
  • 1
  • 6
  • 14
1

IMO, when you are asking about performance of query then following info is missing

i) Explain table structure.Like how transaction_id,round_id,bookmaker_id,provider_id are form and combination of what make it unique in Table.

ii)DataType :round_id,transaction_id seem to be very important column and they cannot be nvarchar or varchar

iii) What are your existing index on table ?I cannot suggest index now because table structure is not clear to me.

There is two very visible fault in your query

i) Using subquery and nested subquery.

ii) In inner "Win query" there is no filter,it will select all unecessary data

iii) Partition function can "Under perform" specially if it select million of record.I have no idea how many record it will select and requirement is not 100% clear,so I have not corrected it.I may use Group By over Partition function for this even if i hv to use multiple Select.

iv) if bookmaker_id, provider_id,transaction_id select one record then why aggregate it or put Partition function.

Try my sample query

declare  @Bet table(bookmaker_id int,  provider_id int, round_id varchar(50), transaction_id varchar(50),  game_id varchar(50), user_id int, bet_amount decimal(5,2),  balance decimal(5,2), bet_timestamp datetime)

insert into @Bet (bookmaker_id, provider_id, round_id,transaction_id,game_id, user_id ,bet_amount,  balance, bet_timestamp)
values
 (1 ,1,'Round1','bet_tx1','game1',123, 10.00,90.00,'2022-03-17 01:40:57.400')
,(1 ,1,'Round2','bet_tx2','game1',123, 10.00,70.00,'2022-03-17 02:40:57.400')
,(1 ,1,'Round2','bet_tx3','game1',123, 10.00,80.00,'2022-03-17 03:40:57.400')
,(1 ,1,'Round3','bet_tx4','game1',123, 10.00,70.00,'2022-03-17 04:40:57.400')

declare  @Win table(bookmaker_id int,  provider_id int, round_id varchar(50), transaction_id varchar(50), user_id int, win_amount decimal(5,2),  balance decimal(5,2), win_timestamp datetime)
insert into @Win (bookmaker_id, provider_id, round_id,transaction_id, user_id ,win_amount,  balance, win_timestamp)
values
(1,1, 'Round1','win_tx1',123, 0.00   ,80.00,'2022-03-17 01:40:57.400')
,(1,1, 'Round2','win_tx2',123, 10.00  ,80.00,'2022-03-17 02:40:57.400')

declare  @Cancel table(bookmaker_id int,  provider_id int, round_id varchar(50), transaction_id varchar(50), user_id int,    cancel_timestamp datetime)
insert into @Cancel (bookmaker_id,provider_id, round_id,    transaction_id,  user_id, cancel_timestamp)
values

(1 ,1,'Round3','can_tx1',123,'2022-03-17 01:40:57.400')
--select * from @Cancel
create table  #Bet (bookmaker_id int,provider_id int, round_id varchar(50), bet_transaction_ids varchar(50),  game_id varchar(50), user_id int, total_bet decimal(5,2),  opening_balance decimal(5,2), bet_timestamp datetime)
-- you can bet the dynamic filter here
insert into #Bet (round_id,game_id,provider_id,bookmaker_id,user_id,bet_transaction_ids,total_bet,bet_timestamp,opening_balance)
 SELECT 
      round_id, 
      game_id, 
      provider_id, 
      bookmaker_id, 
      user_id, 
      STRING_AGG(bet.transaction_id, ', ') as bet_transaction_ids, 
      SUM(bet_amount) as total_bet, 
      MIN(bet_timestamp) as bet_timestamp, 
      opening_balance 
    FROM 
      (
        SELECT 
          FIRST_VALUE(balance) OVER (
            PARTITION BY round_id 
            order by 
              bet_timestamp asc
          ) as opening_balance, 
          * 
        FROM 
          @Bet 
        WHERE 
          bookmaker_id = 1 
          AND provider_id = 1 
          AND transaction_id = 'bet_tx1'
      ) AS bet 
    GROUP BY 
      round_id, 
      game_id, 
      provider_id, 
      bookmaker_id, 
      user_id, 
      opening_balance

      create table  #Win (bookmaker_id int,  provider_id int, round_id varchar(50), win_transaction_ids varchar(50), user_id int, total_won decimal(5,2),  closing_balance decimal(5,2), win_timestamp datetime)
      insert into #Win (round_id,win_transaction_ids,total_won,win_timestamp,closing_balance)
      SELECT 
      round_id, 
      STRING_AGG(transaction_id, ', ') as win_transaction_ids, 
      SUM(win_amount) as total_won, 
      MAX(win_timestamp) as win_timestamp, 
      closing_balance 
    FROM 
    (
      SELECT 
          FIRST_VALUE(balance) OVER (
            PARTITION BY round_id 
            ORDER BY 
              win_timestamp desc
          ) AS closing_balance, 
          * 
        FROM  @Win win
        where exists( select * from #Bet bet where bet.round_id = win.round_id )
        )t

      select bet.round_id, 
      bet.game_id, 
      bet.provider_id, 
      bet.bookmaker_id, 
      bet.user_id, 
       bet_transaction_ids, 
       total_bet, 
       bet_timestamp, 
      opening_balance ,
      win_transaction_ids, 
       total_won, 
       win_timestamp, 
      closing_balance ,
       cancel.[transaction_id] as cancel_transaction_id, 
      cancel.[cancel_timestamp] 
      from #Bet bet
      left join #Win win on bet.round_id=win.round_id
      left join @Cancel cancel on bet.round_id = cancel.round_id

          drop table #Bet,#Win

If you provide more detail then I can further optimize my query

KumarHarsh
  • 5,046
  • 1
  • 18
  • 22