3

I am using SCOTT schema in oracle and want to delete the duplicate values like this...

SELECT   EMPNO, JOB, SAL
  FROM   (  SELECT   EMPNO,
                     JOB,
                     SAL,
                     ROW_NUMBER () OVER (PARTITION BY EMPNO ORDER BY SAL) R_ID
              FROM   EMP_TEST
          ORDER BY   EMPNO)
 WHERE   R_ID = 1

Right now I want to perform this in SAS ... and following code performs fine....

PROC SQL;
connect to oracle (PATH="database_name" USER=SCOTT PASSWORD=tiger); 
CREATE TABLE WORK.EMP_DEDUPL AS SELECT * from connection to oracle(
SELECT EMPNO, JOB, SAL
      FROM   (  SELECT   EMPNO,
                         JOB,
                         SAL,
                         ROW_NUMBER () OVER (PARTITION BY EMPNO ORDER BY SAL) R_ID
                  FROM   SCOTT.EMP_TEST
              ORDER BY   EMPNO)
     WHERE   R_ID = 1
);
DISCONNECT FROM oracle;
QUIT;

But I got error when I have to perform this type of query in dataset level.

Suppose my EMP_TEST is not oracle table, its actually SAS dataset in a particular path and I want to create another dataset named EMP_DEDUP from EMP_TEST dataset after remove the duplicate values. This is my approach.

PROC SQL; 
CREATE TABLE WORK.EMP_DEDUPL AS SELECT * from connection to oracle(
SELECT EMPNO, JOB, SAL
      FROM   (  SELECT   EMPNO,
                         JOB,
                         SAL,
                         ROW_NUMBER () OVER (PARTITION BY EMPNO ORDER BY SAL) R_ID
                  FROM   path.EMP_TEST
              ORDER BY   EMPNO)
     WHERE   R_ID = 1
;
QUIT;

Here I found syntax error..... might be SAS does not support ROW_NUMBER () function. Any possible solution.. can anybody suggest...

goldenbutter
  • 575
  • 2
  • 12
  • 25
  • I could see opening bracket ...`connection to oracle(` but its closing bracket is missing. Is is correct syntax? – San Jan 24 '14 at 06:06
  • 1
    If path.EMP_TEST is a SAS Dataset then you cannot apply `ROW_NUMBER()` function on it. Why bother with `PROC SQL` to deduplicate your SAS dataset when there is a perfectly fine procedure already available to do exactly this. See `PROC SORT with NODUPKEY option` –  Jan 24 '14 at 09:20
  • You can't actually do it in one step with NODUPKEY, unfortunately. – Joe Jan 24 '14 at 15:50
  • SAS has an undocumented function called monotonic() that returns the (output) observation number. This may be slightly different to the row_number() function in oracle though. – Robert Penridge Jan 28 '14 at 17:38
  • @San .. sorry .. it was a typo... ) braces added ... – goldenbutter Jan 29 '14 at 19:14

2 Answers2

3

Sounds like you want the lowest SAL for each EMPNO, is that correct?

proc sort data=path.emp_test;
by empno sal;
run;

data work.emp_dedupl;
set path.emp_test;
by empno sal;
if first.empno;
run;

SAS doesn't support PARTITION OVER, so it wouldn't really allow you to do this in SQL.

Joe
  • 62,789
  • 6
  • 49
  • 67
1

If analytical functions are not supported then you can write something like this

PROC SQL; 
CREATE TABLE WORK.EMP_DEDUPL AS SELECT * from connection to oracle(
SELECT EMPNO, JOB, SAL
      FROM  path.EMP_TEST
     WHERE rowid in (select min(rowid) 
                      from path.EMP_TEST
                     group by empno)
);
QUIT;   
San
  • 4,508
  • 1
  • 13
  • 19