0

I have a situation when I need to show the balance for each user, in relation to other users.

Table structure & dummy data script:

CREATE TABLE transactions (
    id INT(6) UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    user1 INT NOT NULL,
    user2 INT NOT NULL,
    amount INT NOT NULL
);
INSERT INTO transactions VALUES(1, 1, 2, 10);
INSERT INTO transactions VALUES(2, 1, 3, 15);
INSERT INTO transactions VALUES(3, 4, 1, 25);
INSERT INTO transactions VALUES(4, 1, 5, 20);
INSERT INTO transactions VALUES(5, 5, 1, 18);
INSERT INTO transactions VALUES(6, 5, 1, 2);

Result:

enter image description here

Now I want to sum-up information (balances) for user = 1. The result that I want to see is this:

user    balance
2   10
3   15
4   -25
5   0

Now, I am using the latest stable MySQL version 5.7.17-0ubuntu0.16.04.1. And I have 2 problems:

  • MySQL does not support FULL OUTER JOIN clause
  • MySQL does not support WITH clause

My hands are tied at this point. I want to write a fast and efficient query for above situation. Here are my two attempts (none is working):

This one is not working because I can not use FULL OUTER JOIN clause

SELECT IFNULL(t3.user, t4.user), IFNULL(t3.amount, 0) - IFNULL(t4.amount, 0)
FROM (
    select t1.user2 user, sum(t1.amount) amount
    from transactions t1
    where 1=1
        and t1.user1 = 1
    group by t1.user2
) t3
FULL OUTER JOIN (
    select t2.user1 user, sum(t2.amount) amount
    from transactions t2
    where 1=1
        and t2.user2 = 1
    group by t2.user1
) t4 ON t3.user = t4.user

This one is not working because I can not use WITH clause

WITH t3 AS
 (
    select t1.user2 user, sum(t1.amount) amount
    from transactions t1
    where 1=1
        and t1.user1 = 1
    group by t1.user2
),
t4 AS
(
    select t2.user1 user, sum(t2.amount) amount
    from transactions t2
    where 1=1
        and t2.user2 = 1
    group by t2.user1
)
SELECT
    t1.user,
    IFNULL(t3.amount, 0) - IFNULL(t4.amount, 0) balance
FROM t1
LEFT JOIN t3 ON t1.user = t2.user
UNION
SELECT t2.user FROM t1
RIGHT JOIN t3 ON t1.user = t2.user

Update

Using the solution provided by Gurwinder Singh I was able to test the performance for both queries on around 5 millions of rows of test data (although number of data where either user1 = 1 or user2 = 1 - is far less than that).

enter image description here

and (with union)

enter image description here

accordingly. Query 1 is 34% faster ((3.4-2.24)/3.4*100 = 34).

Note that there are no indexes on this table. I will later try to do the same kind of testing using MariaDB and compare the results.

Update 2

After indexing columns: user1, user2, amount the situation has changed.

Query 1 run time:

Showing rows 0 - 2 (3 total, Query took 1.9857 seconds.)

Query 2 run time:

Showing rows 0 - 2 (3 total, Query took 1.5641 seconds.)

But I still think that this is quite bad result. Maybe I will put some triggers to update the balance into a dedicated table. But at this point the answer is answered.

halfer
  • 19,824
  • 17
  • 99
  • 186
Alex
  • 4,607
  • 9
  • 61
  • 99

1 Answers1

1

You can use CASE based conditional aggregation:

Try this:

select case 
        when user1 = 1
            then user2
        else user1
        end as user,
    sum(case 
            when user1 = 1
                then amount
            else - amount
            end) as amount
from transactions
where 1 in (user1, user2)
group by case 
        when user1 = 1
            then user2
        else user1
        end;

Demo

Or a two step aggregation:

select user, sum(amount) as amount
from (
    select user2 as user, sum(amount) as amount
    from transactions
    where user1 = 1
    group by user2

    union all

    select user1 as user, -sum(amount) as amount
    from transactions
    where user2 = 1
    group by user1
) t
group by user;

Demo

Gurwinder Singh
  • 38,557
  • 6
  • 51
  • 76
  • this solution indeed works.. I am wondering if it is going to be faster or not if I just switched to `MariaDB` instead and used `full outer join` capabilities. Seems like I need a bit of performance testing. Nevertheless your answer is exactly what I asked for. Thanks! – Alex May 07 '17 at 18:24
  • You'll still need to do aggregation. I wonder if two step aggregation will be faster. I'll add that to the answer. You can test both to see if one is better than the other. – Gurwinder Singh May 07 '17 at 18:27
  • 1
    I have done some testing on your solution. Query which uses `cases` is a bit faster. See update. – Alex May 07 '17 at 19:51
  • @Alex - Hmm. I think all you need now are appropriate indexes to speed up the things if needed. – Gurwinder Singh May 07 '17 at 19:59
  • This is planned! I will do the comparison when there are indexes on these 2 columns. Probably on Tuesday. I will also update the `results` section with that :) Stay tuned – Alex May 07 '17 at 20:06
  • I seems to be doing something incorrectly. After creating indexes - the second query hands forever, with 100% CPU usage. I was able to finish running it successfully a few times, the result was 100+ seconds!! i.e. decrease of performance x30. Tried running `optimization` and `defragmentation` commands in PhpMyAdmin and also restarting `MySQL` service. No success. First query executes successfully. Second - hangs forever. At first I have created 2 indexes on `user1` and `user2` (separate). Afterwards I ended up creating one more index for both columns (together). Any advices? – Alex May 07 '17 at 21:00
  • Alright, after trial and error - I seems to have found out that `SUM` function is killing the query. Example: `select user2, sum(amount) from transactions where user1 = 1 group by user2`. Without the SUM it finished in a split-second. Otherwise hangs forever. Is it a good practice to index the `amount` column? – Alex May 07 '17 at 21:17
  • there was an increase in performance, but not as good as I expected. See update – Alex May 09 '17 at 09:12