1

I have two tables table_A and table_B.

table_A

Queue_nm Cust_nm
ABCD VERI
PQRS RSTQ

table_B

Queue_nm Act_Cust_nm
PQRS TMOB

I want to compare the two datasets and if they don't match, then replace the data in table_B in table_A using SAS

data compare_test_2;
set table_A;
set table_B;

if table_A.Queue_nm = table_B.Queue_nm
and tableA.Cust_nm != table_B.Act_Cust_name
then do;
tableA.Cust_nm = table_B.Act_Cust_name
where table_A.Queue_nm = table_B.Queue_nm;
run;

I want the following output after the comparison and the data step

table_A

Queue_nm Cust_nm
ABCD VERI
PQRS TMOB

I get the following error

ERROR: DATA STEP Component Object failure.  Aborted during the COMPILATION phase.
ERROR 557-185: Variable table_A is not an object.
Rogue258
  • 109
  • 1
  • 9
  • Variable names cannot have periods in them. What do you mean by TABLE_A.QUEUE_NM? If there is a variable named QUEUE_NM in TABLE_A then that is the name the variable will have in the data step. If there is also a variable named QUEUE_NM in TABLE_B then the value read from TABLE_B will have already overwritten the value read from TABLE_A by the time control gets to the IF statement. – Tom Jun 16 '22 at 02:37
  • Does QUEUE_NM uniquely identify the observations in both datasets? – Tom Jun 16 '22 at 02:50
  • It really looks like the goal is not to compare values but to update or replace values. – Tom Jun 16 '22 at 15:11

3 Answers3

1

You have not written valid SAS data step code. Your code is more SQL-like. The corrected code is below, but it can get even simpler than this.

data want;
    merge table_a(in=a)
          table_b(in=b)
    ;
    by queue_nm;

    if(a = b AND cust_nm NE act_cust_nm) then cust_nm = act_cust_nm;

    drop act_cust_nm;
run;

In the data step, the in= option is equivalent to a.queue_nm in SQL since we are merging by queue_nm. Think of it like an implied version of that.

We can simplify this even further by renaming act_cust_nm to cust_nm in table_b and overwriting any instance of cust_nm in table_a with the value from table_b. Since table_b is second in the merge statement, it will overwrite matching values of queue_nm in table_a.

data want;
    merge table_a
          table_b(rename=(act_cust_nm = cust_nm) )
    ;
    by queue_nm;
run;

The rules of how this works gets a little more complex when you have a one-to-many or many-to-many merge. I highly recommend reading more about how that works:

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lestmtsref/n1i8w2bwu1fn5kn1gpxj18xttbb0.htm

Stu Sztukowski
  • 10,597
  • 1
  • 12
  • 21
1

Just merge. Make sure that both variables use the same name in both datasets. If you need to you can use dataset option to rename variables before they make it into the data step.

 data want;
   merge tablea tableb(rename=(Act_Cust_nm=Cust_nm));
   by queue_nm;
 run;

If you do not want to keep observations that only came from TABLEB then use the IN= dataset option to create a variable that will indicate if TABLEA is contributing to the observation.

 data want;
   merge tablea(in=ina) tableb;
   by queue_nm;
   if ina;
 run;

If you have multiple observations per QUEUE_NM in either dataset then using MERGE will not work.

Tom
  • 47,574
  • 2
  • 16
  • 29
  • Yes, table_B could have multiple entries per queue_nm. For e.g. in future, PQRS queue_nm could have observations like 'TMOB' and 'CRIC', both associated with it. I would want the latest observation of PQRS queue_nm associated with it. How can I get the PQRS value in table_A to replace with latest observation in Table_B. – Rogue258 Jun 16 '22 at 16:27
  • So you want to apply TRANSACTIONS instead of a simple MERGE. Use the UPDATE statement instead of the MERGE statement. – Tom Jun 16 '22 at 18:08
1

You can use SQL to update a table with values from a second table via a correlated subquery using a key variable.

Example:

proc sql;
  update table_a as outer
  set cust_nm = (select act_cust_nm from table_b as inner where inner.Queue_nm = outer.Queue_nm)
  where exists (select * from table_b as inner where inner.Queue_nm = outer.Queue_nm)
  ;

Another way to update a table in place is the MODIFY statement.

proc datasets nolist lib=work;
  modify table_a;
  create index Queue_nm;
quit;

data table_a;
  set table_b;
  modify table_a key=Queue_nm;
  cust_num = act_cust_num;
run;

Richard
  • 25,390
  • 3
  • 25
  • 38