How can I find the mismatch between two similar PROD vs SIT datasets on a summary level and dive deeper into the row for inconsistencies? Alteryx?

Question

I would like to automate this processes for multiple datasets. I am not sure if there exists a tool like Alteryx or any software where I can do the following.

Compare 2 similar datasets on a block aggregate level.
If mismatch exists, zoom into that block to find where the inconsistencies occur.
Get rows of data with ID where inconsistency happens. The goal is to explore on a high level because of the huge amount of data and open a more comprenhesive view of the inconsistency to point out the error easily.

Here is the data that exists in both SIT and PROD:

ID	Block	Amt A	Amt B	Amt C	....
1	X	1000	400	1000	....
2	Y	2000	500	4000	....
3	Z	3000	500	3000	....
2	Y	2000	3000	2000	....
3	Z	3000	4500	3000	....

This is my PROD data after aggregating on 'block'

Block	Counts	Sum Amt A	Sum Amt B
X	1000	40000	10000
Y	2000	45000	20000
Z	3000	50000	30000

This is my SIT data after aggregating on 'block'

Block	Counts	Sum Amt A	Sum Amt B
X	1000	40000	15000
Y	2000	47000	20000
Z	3000	50000	30000

FYI - There are multiple columns like block that could be used on aggregation. One could choose any column, get an aggregation to find inconsistencies and then explore into that particular block.

score 1 · Answer 1 · answered Jul 26 '23 at 03:23

It's a typical full join, then filter process, you can do it in SQL, Python dataframe or Alteryx, pretty much anything

`
WITH
sum_prod AS
(
 SELECT
  SUM(AMT_A) SUM_AMT_A,
  COUNT(1),..
 FROM prod
 GROUP BY ...
)
,
sum_SIT AS
(
 SELECT
  SUM(AMT_A) SUM_AMT_A,
  COUNT(1),..
 FROM sit
 GROUP BY ...
)

SELECT 
*
FROM sum_prod
FULL JOIN
sum_sit
ON
sum_prod.block = sum_sit.block
WHERE
 sum_prod.Sum_Amt_A <> sum_sit.Sum_Amt_A
 OR sum_prod.Sum_Amt_A IS NULL
 OR sum_sit.Sum_Amt_A IS NULL  -- always take care of one side missing
 --add however_your_condition_to_tell_diff

`

Same principle in Alteryx, just a join and a selection filter will do

Thanks! How can I check for non-numeric values? I want to compare everything. TIA — sais, Aug 09 '23 at 17:12

How can I find the mismatch between two similar PROD vs SIT datasets on a summary level and dive deeper into the row for inconsistencies? Alteryx?

1 Answers1