SQL query mixing aggregated results and single values

Question

I have a table with transactions. Each transaction has a transaction ID, and accounting period (AP), and a posting value (PV), as well as other fields. Some of the IDs are duplicated, usually because the transaction was done in error. To give an example, part of the table might look like:

ID    PV    AP  
123   100   2  
123   -100  5

In this case the transaction was added in AP2 then removed in AP5.

Another example would be:

ID    PV    AP  
456   100   2  
456   -100  5  
456   100   8

In the first example, the problem is that if I am analyzing what was spent in AP2, there is a transaction in there which actually shouldn't be taken into account because it was taken out again in AP5. In the second example, the second two transactions shouldn't be taken into account because they cancel each other out.

I want to label as many transactions as possible which shouldn't be taken into account as erroneous. To identify these transactions, I want to find the ones with duplicate IDs whose PVs sum to zero (like ID 123 above) or transactions where the PV of the earliest one is equal to sum(PV), as in the second example. This second condition is what is causing me grief.

So far I have

SELECT *
FROM table
WHERE table.ID IN (SELECT table.ID
                    FROM table
                    GROUP BY table.ID
                    HAVING COUNT(*) > 1
                    AND (SUM(table.PV) = 0
                    OR SUM(table.PV) = <PV of first transaction in each group>))
ORDER BY table.ID;

The bit in chevrons is what I'm trying to do and I'm stuck. Can I do it like this or is there some other method I can use in SQL to do this?

Edit 1: Btw I forgot to say that I'm using SQL Compact 3.5, in case it matters.

Edit 2: I think the code snippet above is a bit misleading. I still want to mark out transactions with duplicate IDs where sum(PV) = 0, as in the first example. But where the PV of the earliest transaction = sum(PV), as in the second example, what I actually want is to keep the earliest transaction and mark out all the others with the same ID. Sorry if that caused confusion.

Edit 3: I've been playing with Clodoaldo's solution and have made some progress, but still can't get quite what I want. I'm trying to get the transactions I know for certain to be erroneous. Suppose the following transactions are also in the table:

ID     PV    AP  
789    100   2  
789    200   5  
789   -100   8

In this example sum(PV) <> 0 and the earliest PV <> sum(PV) so I don't want to mark any of these out.

If I modify Clodoaldo's query as follows:

    select t.*
    from 
    t
    left join (
        select id, min(ap) as ap, sum(pv) as sum_pv
        from t
        group by id
        having sum(pv) <> 0
    ) s on t.id = s.id and t.ap = s.ap and t.pv = s.sum_pv
     where s.id is null

This gives the result

 ID      PV     AP
123      100    2
123     -100    5
456     -100    5
456      100    8
789      100    3
789      200    5
789     -100    8

Whilst the first 4 transactions are ok (they would be marked out), the 789 transactions are also there, and I don't want them. But I can't figure out how to modify the query so that they're not included. Any ideas?

What about +6, +4, -10, +3? Should the first three be marked out? Or should transactions be marked out only if they have an exact opposite? And what about +10, +20, -10? Do the +/- 10 get marked out? Or do they have to be sequential? — MatBailie, Oct 09 '12 at 15:38
What field identifies "earliest" do you have a datetime field or a sequential number that I'm not seeing? — RThomas, Oct 09 '12 at 15:39
How are you defining 'group'? Do we have to worry about looping years? We're assuming that 'earliest' means the (lowest) AP value. Do we only have to worry about single records for comparisons, or could it be that the 'sum' you mention may be made up of any number of 'transactions'? If so, how are we supposed to tell where the breaks are (are transactions only supposed to be recorded once)? — Clockwork-Muse, Oct 09 '12 at 15:41
@Dems: I know it's more complicated than the two examples I've given but as I said, I want to mark out as many transactions as possible and then maybe leave the user to identify the rest. — Paul Flowerdew, Oct 09 '12 at 15:54
@ RThomas: By earliest I meant based on the AP, but now that you've mentioned it I hadn't taken into account the real possibility of two transactions in one AP. However I think the first transaction is usually alone, then corrections are added in later APs — Paul Flowerdew, Oct 09 '12 at 15:58
@PaulFlowerdew I edited my answer to include a version that shows the invalid transactions in instead of the valid ones. — Clodoaldo Neto, Oct 09 '12 at 16:00
@ Clockwork-Mus: Good points. By group I meant the group of transactions with the same ID. I did mean lowest AP as earliest. The sum is the sum of the posting values of transactions with the same ID. The IDs are not unique because corrections are made, like the first example. I don't think I have to worry about years rolling over because all the transactions should be corrected by FY-end. — Paul Flowerdew, Oct 09 '12 at 16:08

Clodoaldo Neto · Answer 1 · 2012-10-09T22:17:57.013

2

SQL Fiddle

select t.* 
from 
    t
    inner join (
        select id, min(ap) as ap
        from t
        group by id
        having sum(pv) <> 0
    ) s on t.id = s.id and t.ap = s.ap

The above gets the valid transactions. If you want the invalid ones use this:

select t.*
from 
    t
    left join (
        select id, min(ap) as ap
        from t
        group by id
        having sum(pv) <> 0
    ) s on t.id = s.id and t.ap = s.ap
where s.id is null

SQL Fiddle

edited Oct 09 '12 at 22:17

answered Oct 09 '12 at 15:49

Clodoaldo Neto

118,695
26
233
260

Thanks for this. I tried the second one but got a message saying "Column names must be specified for constants, expressions or aggregate functions when they occur in a FROM subquery." Not sure whether this is some SQL Compact Edition quirk or what. – Paul Flowerdew Oct 09 '12 at 21:46
@PaulFlowerdew Changed the syntax slightly adding the `as` to set the alias for the `min(ap)` column. – Clodoaldo Neto Oct 09 '12 at 22:00
@PaulFlowerdew According to the [manual](http://msdn.microsoft.com/en-us/library/ms173451%28v=sql.105%29.aspx) the `as` is optional. Did you forget to set the alias when translating to your schema? – Clodoaldo Neto Oct 09 '12 at 22:09
Thanks Clodoaldo, it was the alias that did it. However, I don't think this is doing quite what I wanted. I've added in a "AND COUNT(*) > 0" after "HAVING sum(pv) <> 0" because I'm not interested in IDs that occur only once, but it still doesn't do what I want. Have a look at my edit 2 above; I've hopefully made what I want to do a bit clearer. – Paul Flowerdew Oct 10 '12 at 08:07

score 0 · Answer 2 · answered Oct 09 '12 at 15:43

0

Try something like this:

UPDATE
    Transactions
SET
    IsError = true
WHERE
    EXISTS
    (SELECT
        NULL
    FROM 
        Transactions SubsequentTransactions
    WHERE
        Transactions.ID = SubsequentTransactions.ID
    AND Transactions.AP < SubsequentTransactions.AP
    AND Transactions.PV = -1 * SubsequentTransactions.PV)

I think that will work. I haven't tested it at all so I'd suggest that you use the WHERE clause in a select statement first to ensure it will only affect the rows you want.

This won't flag negative transactions as errors (you may or may not need to), except for in your second example. In your second example there is a third record which cancels the second one if they are taken in isolation. You may find you need to expand the logic to fully get what you need but it should get you started.

answered Oct 09 '12 at 15:43

Tobsey

3,390
14
24

Thanks, I'll have a look at this tomorrow. – Paul Flowerdew Oct 09 '12 at 21:47
Tried the select statement on its own but it's coming up with an error on the SubsequentTransactions.ID. Not sure exactly what it's supposed to do - sorry I'm quite new to SQL. – Paul Flowerdew Oct 10 '12 at 08:57
See here: http://www.sqlfiddle.com/#!6/3c283/2 it should work. What RDBMS are you using? – Tobsey Oct 10 '12 at 10:10
See my edit 3 above; I'm sorry I didn't make it clearer what I was trying to do originally. – Paul Flowerdew Oct 10 '12 at 15:38

SQL query mixing aggregated results and single values

2 Answers2