0

I have a retail transaction data set that looks like:

TRANSID GROSS AMNT  TRANSROWTYPE  
123         50          Z  
123         20          A  
123         30          A  
126         90          Z  
126         20          A  
126         30          A  
126         20          A  
…           ..          .  

Where,
TRANDIS is a transaction identifier
GROSS AMNT is the gross amount for the transaction row
TRANSROWTYPE is the row type for a transaction.

All transactions have 2 types of Rows

  1. Type A: Detailed row(s) of the transaction. A transaction can have 1 or multiple Type A rows.
  2. Type Z: Header row for the transaction. A transaction can have only 1 Type Z row.

Ideally for a TRANSID, the sum of GROSS AMNTs in Type A rows should equal the GROSS AMNT in a Type Z row.

In the sample, this is true for TRANSID=123, but not for TRANSID=126.

My query is how do I segregate the TRANSIDs that have
GROSS AMNTs in Type A rows= GROSS AMNTs in Type Z row from those where the two are not equal?

I am particularly interested to find a solution using dplyr

Many thanks in advance.

Cristian E. Nuno
  • 2,822
  • 2
  • 19
  • 33
Nibbles
  • 19
  • 6

2 Answers2

0

Yes, you can do it in a single shot with dplyr. First you summarise in order to condense multiple rows of same type. It will only affect A type rows, since Z rows only happen once anyway for each transation id.

Right after summarising you do a pivot_wider() so the values for A and Z can be side by side on the same row, making it very easy to compare and filter only those that suit your criteria.

retail %>%
  group_by(TRANSID, TRANSROWTYPE) %>%
  summarise(TOTAL.AMT = sum(`GROSS AMNT`)) %>%
  ungroup() %>%
  pivot_wider(names_from = "TRANSROWTYPE", values_from = "TOTAL.AMT") %>%
  subset(A != Z)
user2332849
  • 1,421
  • 1
  • 9
  • 12
0

There are many ways of getting the desired result using the tidyverse package, so here is just another option.

library(tidyverse)

df %>%
  #Group by TRANS and TRANSROWTYPE
  group_by(TRANSID, TRANSROWTYPE) %>%
  #Get the sum of GROSS AMT by ID and row type
  summarise(sum_amt = sum(`GROSS AMNT`)) %>%
  #Ungroup
  ungroup() %>%
  #Group by ID
  group_by(TRANSID) %>%
  #Get distinct values
  distinct(sum_amt) %>%
  #Remove rows where sum_amt is the same for A and Z per ID (i.e., n = 1)
  #and stay only with rows where there are 2 distinct values
  filter(n() >= 2)