I'm trying to formulate a query that will find all users who have not had a transaction for a specified period of time. My problem is that my query is acting as if it gets caught in a nested loop. I'm trying to figure out where my logic is flawed.
I can't give the actual query because it's for work, but here is a sample using similar structure. (Yes, the Balance/Transaction data is spread over two pair of tables... it's what I have to work with)
Given the schema:
Users Balances_A Transactions_A
user_id account_id <-\ transaction_id
ssn <------+-- ssn \-- account_id
occupation | balance amount
name | type trdate
address | department
|
|
| Balances_B Transactions_B
| account_id <-\ transaction_id
+-- ssn \-- account_id
balance amount
code (*) trdate
department
* same as type, just different field name.
Note: each "<---" indicates a 1 to many relationship
Task: find all users who have had an account of type='A', department='1' whose current balance is 0.00 and have had a type 'A' transaction within the last year. Also need to know what their balance is for all other types of transactions, but excluding for types 'X' and 'Y'.
Parameters: department='1', type='A', transaction date < one year ago, balance=0.00
Here's what I tried:
SELECT
u.user_id, u.name, u.address, u.ssn,
account_balances_a.other_balance,
account_balances_b.other_balance,
last_transaction_a.last_transaction_date,
last_transaction_b.last_transaction_date
FROM users AS u
-- attach other balance total from A
LEFT JOIN ( SELECT SUM(balance) as other_balance
FROM balances_a as bal_a
WHERE bal_a.type NOT IN ('A','X','Y') AND bal_a.department='1'
GROUP BY bal_a.ssn
) AS account_balances_a
ON u.ssn = account_balances_a.ssn
-- attach other balance total from B
LEFT JOIN ( SELECT SUM(balance) as other_balance
FROM balances_b as bal_b
WHERE bal_b.code NOT IN ('A','X','Y') AND bal_b.department='1'
GROUP BY bal_b.ssn
) AS account_balances_b
ON u.ssn = account_balances_b.ssn
-- regular join balance A table
, balances_a AS ba
-- attach last transaction date ( transactions A )
LEFT JOIN ( SELECT MAX(temp1.trdate) as last_transaction_date
FROM transactions_a as temp1
GROUP BY temp1.account_id
) AS last_transaction_a
ON temp1.account_id = ba.account_id
-- regular join balance B table
, balances_b AS bb
-- attach last transaction date ( transactions B )
LEFT JOIN ( SELECT MAX(temp2.trdate) as last_transaction_date
FROM transactions_b as temp2
GROUP BY temp2.account_id
) AS last_transaction_b
ON temp2.account_id = bb.account_id
WHERE
u.occupation='ditch digger'
-- user has an account type 'A' with department '1' in the specified time frame:
AND (
-- either in Balance A table,
( u.ssn=ba.ssn AND ba.balance=0.00 AND ba.type='A' AND ba.department='1' and last_transaction_a.last_transaction_date>'$one_year-ago' )
OR
-- or in Balance B table
( u.ssn=bb.ssn AND bb.balance=0.00 AND bb.code='A' AND bb.department='1' and last_transaction_b.last_transaction_date>'$one_year-ago' )
)
ORDER BY last_transaction_a.last_transaction_date
The problem appears to be in the WHERE clause; if I comment out the "...in Balance A table" or the "...in Balance B table", the query works. But with both, it's trying to order millions of records.
After putting it out here, I think I see why it fails; but if you take the time to think through this with me and can explain reasonably well why it fails, I would be grateful.