Consider I have three tables (bet,win,cancel) that will contain the data for rounds of bets. Each round can have 1 or more bets and 1 or more wins. A round may be cancelled before any wins are recorded.
I would like to aggregate the data so that each round is displayed as one row, however my queries are very slow after testing with millions of dummy records each day.
Below are example data
Bet
bookmaker_id provider_id round_id transaction_id game_id user_id bet_amount balance bet_timestamp
1 1 Round1 bet_tx1 game1 123 10.00 90.00 2022-03-17 01:40:57.400
1 1 Round2 bet_tx2 game1 123 10.00 70.00 2022-03-17 02:40:57.400
1 1 Round2 bet_tx3 game1 123 10.00 80.00 2022-03-17 03:40:57.400
1 1 Round3 bet_tx4 game1 123 10.00 70.00 2022-03-17 04:40:57.400
Win
bookmaker_id provider_id round_id transaction_id user_id win_amount balance win_timestamp
1 1 Round1 win_tx1 123 0.00 80.00 2022-03-17 01:40:57.400
1 1 Round2 win_tx2 123 10.00 80.00 2022-03-17 02:40:57.400
Cancel
bookmaker_id provider_id round_id transaction_id user_id cancel_timestamp
1 1 Round3 can_tx1 123 2022-03-17 01:40:57.400
Each table has two indexes:
Primary Key: (bookmaker_id , provider_id , transaction_id)
Index: (bookmaker_id, provider_id, round_id)
The following query attempts to aggregate the round information for the first bet transaction. It totals the amount bet, totals the amount won, links the cancellation record if it exists and appends the transaction_id's together if there are multiple. It also returns the opening balance of the first bet and the closing balance of the last win.
Attempt at query
SELECT
bet.*,
win.*,
cancel.transaction_id as cancel_transaction_id,
cancel.cancel_timestamp
FROM
(
SELECT
round_id,
game_id,
provider_id,
bookmaker_id,
user_id,
STRING_AGG(bet.transaction_id, ', ') as bet_transaction_ids,
SUM(bet_amount) as total_bet,
MIN(bet_timestamp) as bet_timestamp,
opening_balance
FROM
(
SELECT
FIRST_VALUE(balance) OVER (
PARTITION BY round_id
order by
bet_timestamp asc
) as opening_balance,
*
FROM
bet
WHERE
bookmaker_id = 1
AND provider_id = 1
AND transaction_id = 'bet_tx1'
) AS bet
GROUP BY
round_id,
game_id,
provider_id,
bookmaker_id,
user_id,
opening_balance
) AS bet
LEFT JOIN (
SELECT
round_id,
STRING_AGG(win.transaction_id, ', ') as win_transaction_ids,
SUM(win_amount) as total_won,
MAX(win_timestamp) as win_timestamp,
closing_balance
FROM
(
SELECT
FIRST_VALUE(balance) OVER (
PARTITION BY round_id
ORDER BY
win_timestamp desc
) AS closing_balance,
*
FROM
win
) AS win
GROUP BY
win.round_id,
closing_balance
) AS win ON bet.round_id = win.round_id
LEFT JOIN (
SELECT
[round_id],
[transaction_id],
[cancel_timestamp]
FROM
cancel
) AS cancel ON bet.round_id = cancel.round_id
ORDER BY
total_won;
Result
round_id game_id provider_id bookmaker_id user_id bet_transaction_ids total_bet bet_timestamp opening_balance round_id win_transaction_ids total_won win_timestamp closing_balance cancel_transaction_id cancel_timestamp
Round1 game1 1 1 123 bet_tx1 10.00 2022-03-17 01:40:57.400 90.00 Round1 win_tx1 0.00 2022-03-17 01:40:57.400 80.00 null null
I assume this is not the most efficient way to link these tables if all I have is the bet transaction id considering the indexes and millions of potential records. I would also like to select multiple rounds filtered by bet_timestamp but I hope I can apply anything learned here to that query too.
How should I query these three tables more efficiently?
Any assistance would be most appreciated.