Compare data from different data sets SAS

Question

I have two tables table_A and table_B.

table_A

Queue_nm	Cust_nm
ABCD	VERI
PQRS	RSTQ

table_B

Queue_nm	Act_Cust_nm
PQRS	TMOB

I want to compare the two datasets and if they don't match, then replace the data in table_B in table_A using SAS

data compare_test_2;
set table_A;
set table_B;

if table_A.Queue_nm = table_B.Queue_nm
and tableA.Cust_nm != table_B.Act_Cust_name
then do;
tableA.Cust_nm = table_B.Act_Cust_name
where table_A.Queue_nm = table_B.Queue_nm;
run;

I want the following output after the comparison and the data step

table_A

Queue_nm	Cust_nm
ABCD	VERI
PQRS	TMOB

I get the following error

ERROR: DATA STEP Component Object failure.  Aborted during the COMPILATION phase.
ERROR 557-185: Variable table_A is not an object.

Variable names cannot have periods in them. What do you mean by TABLE_A.QUEUE_NM? If there is a variable named QUEUE_NM in TABLE_A then that is the name the variable will have in the data step. If there is also a variable named QUEUE_NM in TABLE_B then the value read from TABLE_B will have already overwritten the value read from TABLE_A by the time control gets to the IF statement. — Tom, Jun 16 '22 at 02:37
Does QUEUE_NM uniquely identify the observations in both datasets? — Tom, Jun 16 '22 at 02:50
It really looks like the goal is not to compare values but to update or replace values. — Tom, Jun 16 '22 at 15:11

Stu Sztukowski · Answer 1 · 2022-06-15T22:36:45.203

You have not written valid SAS data step code. Your code is more SQL-like. The corrected code is below, but it can get even simpler than this.

data want;
    merge table_a(in=a)
          table_b(in=b)
    ;
    by queue_nm;

    if(a = b AND cust_nm NE act_cust_nm) then cust_nm = act_cust_nm;

    drop act_cust_nm;
run;

In the data step, the in= option is equivalent to a.queue_nm in SQL since we are merging by queue_nm. Think of it like an implied version of that.

We can simplify this even further by renaming act_cust_nm to cust_nm in table_b and overwriting any instance of cust_nm in table_a with the value from table_b. Since table_b is second in the merge statement, it will overwrite matching values of queue_nm in table_a.

data want;
    merge table_a
          table_b(rename=(act_cust_nm = cust_nm) )
    ;
    by queue_nm;
run;

The rules of how this works gets a little more complex when you have a one-to-many or many-to-many merge. I highly recommend reading more about how that works:

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lestmtsref/n1i8w2bwu1fn5kn1gpxj18xttbb0.htm

score 1 · Answer 2 · answered Jun 16 '22 at 02:49

1

Just merge. Make sure that both variables use the same name in both datasets. If you need to you can use dataset option to rename variables before they make it into the data step.

 data want;
   merge tablea tableb(rename=(Act_Cust_nm=Cust_nm));
   by queue_nm;
 run;

If you do not want to keep observations that only came from TABLEB then use the IN= dataset option to create a variable that will indicate if TABLEA is contributing to the observation.

 data want;
   merge tablea(in=ina) tableb;
   by queue_nm;
   if ina;
 run;

If you have multiple observations per QUEUE_NM in either dataset then using MERGE will not work.

answered Jun 16 '22 at 02:49

Tom

47,574
2
16
29

Yes, table_B could have multiple entries per queue_nm. For e.g. in future, PQRS queue_nm could have observations like 'TMOB' and 'CRIC', both associated with it. I would want the latest observation of PQRS queue_nm associated with it. How can I get the PQRS value in table_A to replace with latest observation in Table_B. – Rogue258 Jun 16 '22 at 16:27
So you want to apply TRANSACTIONS instead of a simple MERGE. Use the UPDATE statement instead of the MERGE statement. – Tom Jun 16 '22 at 18:08

score 1 · Accepted Answer · answered Jun 16 '22 at 04:10

You can use SQL to update a table with values from a second table via a correlated subquery using a key variable.

Example:

proc sql;
  update table_a as outer
  set cust_nm = (select act_cust_nm from table_b as inner where inner.Queue_nm = outer.Queue_nm)
  where exists (select * from table_b as inner where inner.Queue_nm = outer.Queue_nm)
  ;

Another way to update a table in place is the MODIFY statement.

proc datasets nolist lib=work;
  modify table_a;
  create index Queue_nm;
quit;

data table_a;
  set table_b;
  modify table_a key=Queue_nm;
  cust_num = act_cust_num;
run;

Compare data from different data sets SAS

3 Answers3