Remove duplicate in SAS similar like ROW_NUMBER () function in Oracle

Question

I am using SCOTT schema in oracle and want to delete the duplicate values like this...

SELECT   EMPNO, JOB, SAL
  FROM   (  SELECT   EMPNO,
                     JOB,
                     SAL,
                     ROW_NUMBER () OVER (PARTITION BY EMPNO ORDER BY SAL) R_ID
              FROM   EMP_TEST
          ORDER BY   EMPNO)
 WHERE   R_ID = 1

Right now I want to perform this in SAS ... and following code performs fine....

PROC SQL;
connect to oracle (PATH="database_name" USER=SCOTT PASSWORD=tiger); 
CREATE TABLE WORK.EMP_DEDUPL AS SELECT * from connection to oracle(
SELECT EMPNO, JOB, SAL
      FROM   (  SELECT   EMPNO,
                         JOB,
                         SAL,
                         ROW_NUMBER () OVER (PARTITION BY EMPNO ORDER BY SAL) R_ID
                  FROM   SCOTT.EMP_TEST
              ORDER BY   EMPNO)
     WHERE   R_ID = 1
);
DISCONNECT FROM oracle;
QUIT;

But I got error when I have to perform this type of query in dataset level.

Suppose my EMP_TEST is not oracle table, its actually SAS dataset in a particular path and I want to create another dataset named EMP_DEDUP from EMP_TEST dataset after remove the duplicate values. This is my approach.

PROC SQL; 
CREATE TABLE WORK.EMP_DEDUPL AS SELECT * from connection to oracle(
SELECT EMPNO, JOB, SAL
      FROM   (  SELECT   EMPNO,
                         JOB,
                         SAL,
                         ROW_NUMBER () OVER (PARTITION BY EMPNO ORDER BY SAL) R_ID
                  FROM   path.EMP_TEST
              ORDER BY   EMPNO)
     WHERE   R_ID = 1
;
QUIT;

Here I found syntax error..... might be SAS does not support ROW_NUMBER () function. Any possible solution.. can anybody suggest...

I could see opening bracket ...`connection to oracle(` but its closing bracket is missing. Is is correct syntax? — San, Jan 24 '14 at 06:06
If path.EMP_TEST is a SAS Dataset then you cannot apply `ROW_NUMBER()` function on it. Why bother with `PROC SQL` to deduplicate your SAS dataset when there is a perfectly fine procedure already available to do exactly this. See `PROC SORT with NODUPKEY option` — , Jan 24 '14 at 09:20
You can't actually do it in one step with NODUPKEY, unfortunately. — Joe, Jan 24 '14 at 15:50
SAS has an undocumented function called monotonic() that returns the (output) observation number. This may be slightly different to the row_number() function in oracle though. — Robert Penridge, Jan 28 '14 at 17:38

score 3 · Accepted Answer · answered Jan 24 '14 at 15:48

3

Sounds like you want the lowest SAL for each EMPNO, is that correct?

proc sort data=path.emp_test;
by empno sal;
run;

data work.emp_dedupl;
set path.emp_test;
by empno sal;
if first.empno;
run;

SAS doesn't support PARTITION OVER, so it wouldn't really allow you to do this in SQL.

answered Jan 24 '14 at 15:48

Joe

62,789
6
49
67

i have a question... is this code will give the same output which I have done in oracle??? tomorrow I'll test it. – goldenbutter Jan 29 '14 at 19:12

score 1 · Answer 2 · answered Jan 24 '14 at 06:08

If analytical functions are not supported then you can write something like this

PROC SQL; 
CREATE TABLE WORK.EMP_DEDUPL AS SELECT * from connection to oracle(
SELECT EMPNO, JOB, SAL
      FROM  path.EMP_TEST
     WHERE rowid in (select min(rowid) 
                      from path.EMP_TEST
                     group by empno)
);
QUIT;

Remove duplicate in SAS similar like ROW_NUMBER () function in Oracle

2 Answers2