Differentiate transactions by customer, given sales table (customer_id, article_id)

Question

This is hard to explain but I'll try my best.

There is a sales table, that has a row for every article that a customer buy (simplified). so, for example, if a customer go to the store and buy 3 things, and then another customer buy 2 things, the sales table would looks like this:

ID  |  CUSTOMER_ID | ARTICLE_ID 
1          21            42
2          21            32
3          21            34
4          22            42
5          22            33

The problem is that I need to add two new columns.

TRX_ID (transaction_id): it's when a customer visit the store, buy articles and leave.
TRX_COR (transaction_correlative): it's the position of the article by a transaction.

The result should show this:

ID  |  CUSTOMER_ID | ARTICLE_ID | TRX_ID | TRX_COR
1          21            42         1        1
2          21            32         1        2
3          21            34         1        3
4          22            42         2        1
5          22            33         2        2

With this information I might know how many transaction really are(2 in this case), and which is the first item to be sold, or passed through the paydesk (42 in this case), among other things.

Considerations:

I also stored the day (but not the exact time of the transaction)
There are a lot of customer by day
One customer can return during the day and that is a totally new transaction.
There is no way that two customers to buy at the same time.

How do you want the corner case of a customer making a transaction then immediately making another transaction handled? If, in your above example, the customer ID was the same for all 5 rows, is it okay if all of them were marked with `TRX_ID 1`? — Ben Gribaudo, May 19 '15 at 16:03
Which SQL DBMS are you using? Microsoft SQL Server 2008/2008 R2/2012/2014? MySQL? — Ben Gribaudo, May 19 '15 at 16:04
@BenGribaudo it's perfectly ok if all of them were marked with TRX_ID 1 — Marcelo, May 19 '15 at 17:28

Ben Gribaudo · Accepted Answer · 2015-05-20T19:02:38.983

1

Something like the below, which uses window aggregates, should (I think) calculate the IDs the way you requested.

SELECT ID, Customer_ID, Article_ID,
    DENSE_RANK() OVER (ORDER BY Customer_ID, Gr) AS TRX_ID,
    ROW_NUMBER() OVER (PARTITION BY Customer_ID, Gr ORDER BY ID) AS TRX_COR 
FROM (
    SELECT *, 
        ROW_NUMBER() OVER (ORDER BY ID)-ROW_NUMBER() OVER (ORDER BY Customer_ID, ID) AS Gr
    FROM [Test].[dbo].[Netezza]
) data

In the inner query, row-number-by-id is subtracted from row-number-by-customer-than-id to come up with a per-customer-per-transaction grouping number. The outer query then uses this value to assign a transaction ID (via DENSE_RANK()) and transaction correlative (via ROW_NUMBER())

edited May 20 '15 at 19:02

answered May 19 '15 at 22:51

Ben Gribaudo

5,057
1
40
75

@Marcelo, did the above work? If so, I'll remove my text about T-SQL. – Ben Gribaudo May 20 '15 at 15:42
The above work great! I just edited some little details, (it is not so different from oracle or mysql) Thank you very much, you understand perfect what I need. – Marcelo May 20 '15 at 16:17

Differentiate transactions by customer, given sales table (customer_id, article_id)

1 Answers1