Segregate multi-line transactions based on row values

Question

I have a retail transaction data set that looks like:

TRANSID GROSS AMNT  TRANSROWTYPE  
123         50          Z  
123         20          A  
123         30          A  
126         90          Z  
126         20          A  
126         30          A  
126         20          A  
…           ..          .

Where,
• TRANDIS is a transaction identifier
• GROSS AMNT is the gross amount for the transaction row
• TRANSROWTYPE is the row type for a transaction.

All transactions have 2 types of Rows

Type A: Detailed row(s) of the transaction. A transaction can have 1 or multiple Type A rows.
Type Z: Header row for the transaction. A transaction can have only 1 Type Z row.

Ideally for a TRANSID, the sum of GROSS AMNTs in Type A rows should equal the GROSS AMNT in a Type Z row.

In the sample, this is true for TRANSID=123, but not for TRANSID=126.

My query is how do I segregate the TRANSIDs that have
GROSS AMNTs in Type A rows= GROSS AMNTs in Type Z row from those where the two are not equal?

I am particularly interested to find a solution using dplyr

Many thanks in advance.

score 0 · Accepted Answer · answered Mar 19 '20 at 22:31

Yes, you can do it in a single shot with dplyr. First you summarise in order to condense multiple rows of same type. It will only affect A type rows, since Z rows only happen once anyway for each transation id.

Right after summarising you do a pivot_wider() so the values for A and Z can be side by side on the same row, making it very easy to compare and filter only those that suit your criteria.

retail %>%
  group_by(TRANSID, TRANSROWTYPE) %>%
  summarise(TOTAL.AMT = sum(`GROSS AMNT`)) %>%
  ungroup() %>%
  pivot_wider(names_from = "TRANSROWTYPE", values_from = "TOTAL.AMT") %>%
  subset(A != Z)

score 0 · Answer 2 · answered Mar 19 '20 at 22:49

There are many ways of getting the desired result using the tidyverse package, so here is just another option.

library(tidyverse)

df %>%
  #Group by TRANS and TRANSROWTYPE
  group_by(TRANSID, TRANSROWTYPE) %>%
  #Get the sum of GROSS AMT by ID and row type
  summarise(sum_amt = sum(`GROSS AMNT`)) %>%
  #Ungroup
  ungroup() %>%
  #Group by ID
  group_by(TRANSID) %>%
  #Get distinct values
  distinct(sum_amt) %>%
  #Remove rows where sum_amt is the same for A and Z per ID (i.e., n = 1)
  #and stay only with rows where there are 2 distinct values
  filter(n() >= 2)

Thank you Jonathan, The solution works. Also, appreciate the comments to help explain. — Nibbles, Mar 20 '20 at 09:00

Segregate multi-line transactions based on row values

2 Answers2