I have a retail transaction data set that looks like:
TRANSID GROSS AMNT TRANSROWTYPE
123 50 Z
123 20 A
123 30 A
126 90 Z
126 20 A
126 30 A
126 20 A
… .. .
Where,
• TRANDIS
is a transaction identifier
• GROSS AMNT
is the gross amount for the transaction row
• TRANSROWTYPE
is the row type for a transaction.
All transactions have 2 types of Rows
- Type
A
: Detailed row(s) of the transaction. A transaction can have 1 or multiple TypeA
rows. - Type
Z
: Header row for the transaction. A transaction can have only 1 TypeZ
row.
Ideally for a TRANSID
, the sum of GROSS AMNT
s in Type A
rows should equal the GROSS AMNT
in a Type Z
row.
In the sample, this is true for TRANSID=123
, but not for TRANSID=126
.
My query is how do I segregate the TRANSIDs
that have
GROSS AMNTs
in Type A
rows= GROSS AMNT
s in Type Z
row from those where the two are not equal?
I am particularly interested to find a solution using dplyr
Many thanks in advance.