0

I'm trying to formulate a query that will find all users who have not had a transaction for a specified period of time. My problem is that my query is acting as if it gets caught in a nested loop. I'm trying to figure out where my logic is flawed.

I can't give the actual query because it's for work, but here is a sample using similar structure. (Yes, the Balance/Transaction data is spread over two pair of tables... it's what I have to work with)

Given the schema:

Users          Balances_A       Transactions_A
user_id        account_id <-\    transaction_id
ssn <------+-- ssn           \-- account_id
occupation |   balance           amount
name       |   type              trdate
address    |   department                  
           |                     
           |
           |   Balances_B       Transactions_B
           |   account_id <-\    transaction_id
           +-- ssn           \-- account_id
               balance           amount
               code (*)          trdate
               department        


* same as type, just different field name.
Note: each "<---" indicates a 1 to many relationship

Task: find all users who have had an account of type='A', department='1' whose current balance is 0.00 and have had a type 'A' transaction within the last year. Also need to know what their balance is for all other types of transactions, but excluding for types 'X' and 'Y'.

Parameters: department='1', type='A', transaction date < one year ago, balance=0.00

Here's what I tried:

SELECT 
    u.user_id, u.name, u.address, u.ssn, 
    account_balances_a.other_balance, 
    account_balances_b.other_balance, 
    last_transaction_a.last_transaction_date,
    last_transaction_b.last_transaction_date

FROM users AS u

-- attach other balance total from A
LEFT JOIN ( SELECT SUM(balance) as other_balance
            FROM balances_a as bal_a
            WHERE bal_a.type NOT IN ('A','X','Y') AND bal_a.department='1'
            GROUP BY bal_a.ssn
          ) AS account_balances_a
            ON u.ssn = account_balances_a.ssn

-- attach other balance total from B
LEFT JOIN ( SELECT SUM(balance) as other_balance
            FROM balances_b as bal_b
            WHERE bal_b.code NOT IN ('A','X','Y') AND bal_b.department='1'
            GROUP BY bal_b.ssn
          ) AS account_balances_b
            ON u.ssn = account_balances_b.ssn


-- regular join balance A table

, balances_a AS ba

-- attach last transaction date ( transactions A )
LEFT JOIN ( SELECT MAX(temp1.trdate) as last_transaction_date
            FROM transactions_a as temp1
            GROUP BY temp1.account_id
          ) AS last_transaction_a
            ON temp1.account_id = ba.account_id

-- regular join balance B table

, balances_b AS bb

-- attach last transaction date ( transactions B )
LEFT JOIN ( SELECT MAX(temp2.trdate) as last_transaction_date
            FROM transactions_b as temp2
            GROUP BY temp2.account_id
          ) AS last_transaction_b
            ON temp2.account_id = bb.account_id

WHERE

    u.occupation='ditch digger'

    -- user has an account type 'A' with department '1' in the specified time frame:
    AND (
            -- either in Balance A table, 
            ( u.ssn=ba.ssn AND ba.balance=0.00 AND ba.type='A' AND ba.department='1' and last_transaction_a.last_transaction_date>'$one_year-ago' )
            OR
            -- or in Balance B table
            ( u.ssn=bb.ssn AND bb.balance=0.00 AND bb.code='A' AND bb.department='1' and last_transaction_b.last_transaction_date>'$one_year-ago' )
        )

ORDER BY last_transaction_a.last_transaction_date

The problem appears to be in the WHERE clause; if I comment out the "...in Balance A table" or the "...in Balance B table", the query works. But with both, it's trying to order millions of records.

After putting it out here, I think I see why it fails; but if you take the time to think through this with me and can explain reasonably well why it fails, I would be grateful.

Tim Morton
  • 2,614
  • 1
  • 15
  • 23
  • Add the indexes available and the explain plan for this query? – dilsingi Jul 30 '17 at 06:52
  • That's a huge part of the problem I'm having... If I were on mysql that's the first place I would look. But with this database at work, apparently `explain` doesn't work. I'll check it again, in case I'm operating on an incorrect notion. The other issue is indexes. My understanding is that this database will analyse the queries sent to it and create indexes as needed. i had never heard of such a thing, but that's what my manager tells me. – Tim Morton Jul 30 '17 at 19:49
  • Tried to check explain and got this error: `[EXPLAIN - 0 rows, 0.033 secs] [Code: -104, SQL State: 42601] [SQL0104] Token EXPLAIN was not valid. Valid tokens: ( CL END GET SET CALL DROP FREE HOLD LOCK OPEN WITH ALTER. – Tim Morton Jul 30 '17 at 20:03
  • AHA. I'm being sabotaged by the tools I've been given. DB Visualizer Free does not support explain. And apparently the command is a little different from mysql, too. No wonder it didn't work. I'll do a work-around tomorrow at work. – Tim Morton Jul 30 '17 at 20:20
  • Post also the DB2 version+fixpack and platform (Z/OS, OS/400, AIX/Linux/Windows etc). I dont know what you mean "..will create indexes as needed". – mao Jul 31 '17 at 08:22

2 Answers2

1

Because you must join balancea and transactiona first before joint to user. Otherwise you do 2 cross join (really low performance because scan all table a multiply by all line => you use an OR into your where clause)

try to modify your query like this

SELECT 
    u.user_id, u.name, u.address, u.ssn, 
    account_balances_a.other_balance, 
    account_balances_b.other_balance, 
    last_transaction_a.last_transaction_date,
    last_transaction_b.last_transaction_date

FROM users AS u

LEFT OUTER JOIN LATERAL
 ( 
  SELECT SUM(bal_a.balance) as other_balance FROM balances_a as bal_a
  WHERE bal_a.department='1' and u.ssn = bal_a_a.ssn and bal_a.type NOT IN ('A','X','Y')
 ) account_balances_a on 1=1

LEFT OUTER JOIN LATERAL
 (
  SELECT SUM(bal_b.balance) as other_balance FROM balances_b as bal_b
  WHERE bal_b.department='1' and u.ssn = bal_b.ssn and bal_b.type NOT IN ('A','X','Y')
 ) account_balances_b on 1=1

LEFT OUTER JOIN LATERAL
 (
  SELECT MAX(temp1.trdate) as last_transaction_date
  FROM transactions_a as temp1 inner join balances_a ba on temp1.account_id = ba.account_id
  WHERE u.ssn = ba.ssn and ba.type='A' and ba.balance=0.00 and ba.department='1'  
 ) last_transaction_a on last_transaction_date>current date - 1 year

LEFT OUTER JOIN LATERAL
 ( 
  SELECT MAX(temp2.trdate) as last_transaction_date
  FROM transactions_b as temp2 inner join balances_b bb on temp2.account_id = bb.account_id
  where u.ssn=bb.ssn AND bb.code='A' AND bb.balance=0.00 AND bb.department='1' 
 ) last_transaction_b on last_transaction_date>current date - 1 year

WHERE u.occupation='ditch digger' 
AND (last_transaction_a.last_transaction_date is not null or last_transaction_b.last_transaction_date is not null)

ORDER BY last_transaction_a.last_transaction_date
Esperento57
  • 16,521
  • 3
  • 39
  • 45
  • I think that's where my intuition was headed. It looks like I'm making some incorrect assumptions about how tables are joined. I'll give this a try and give feedback a bit later. Thanks. – Tim Morton Jul 31 '17 at 13:31
  • What I really needed was the term "cross-join". I knew what was happening but I didn't know what to call it. I found a different culprit (balances/users vs balances/transactions) to blame for it, but I think I discovered a way to spot unintentional cross joins. If you're inclined to look over my answer, I'd welcome any comments. Thanks for the nudge in the right direction, and I learned a new trick from your code! – Tim Morton Aug 01 '17 at 04:02
0

Thanks to the nudge in Esperento57's answer, I re-researched "cross joins". I had forgotten that queries with multiple (comma separated) tables start out cross-joined; I essentially have 3 tables cross-joined in the from clause. (at least, that was my intention.) So it's up to the where clause to join them properly.

...which it was painfully obvious that it did not do.

In my mind, all the tables were tied together with users.ssn. So it iterates through users (with the various filters tied to it in balances a & b) and everything should work fine.

[eureka moment]

...and then it iterates through balances_a, and everything goes horribly awry. The where clause does not even come close to joining the tables the way I thought they were. The OR causes a cross join between balances_a and users.

And if that weren't bad enough, then it starts the whole thing over again with balances_b.

Which leads me to the troubleshooting concept I was looking for. Whether this is how the database actually works or not, it seems that you can think of each comma-separated table as iterating through all of its rows. (ie, a cross join) The where clause must work for iterations of EACH comma-separated table.

Since this query was an abysmal failure, I started over and discovered it was far better to do a union on the (filtered) balances and then left join users and the sum(balances) and max(dates).

Tim Morton
  • 2,614
  • 1
  • 15
  • 23