0

I have the following table:

COMPANY_NAME | GROUP | COUNTRY | STATUS  
COM1         |   1   |    DE   | DELETED   
COM2         |   1   |    DE   | REMAINING  
COM3         |   1   |    UK   | DELETED  
COM4         |   2   |    ES   | DELETED  
COM5         |   2   |    FR   | DELETED  
COM6         |   3   |    RO   | DELETED  
COM7         |   3   |    BG   | DELETED  
COM8         |   3   |    ES   | REMAINING  
COM9         |   3   |    ES   | DELETED 

I need to get:

COMPANY_NAME | GROUP | COUNTRY | STATUS  
COM3         |   1   |    UK   | DELETED  
COM4         |   2   |    ES   | DELETED  
COM5         |   2   |    FR   | DELETED  
COM6         |   3   |    RO   | DELETED  
COM7         |   3   |    BG   | DELETED

So, I need all entries where the status is DELETED and within each GROUP there is no COMPANY_NAME that has a status of REMAINING for the same country as the DELETED status. I can use a PROC SQL or a DATA step.

What I have tried so far is:

PROC SQL;
CREATE TABLE WORK.OUTPUT AS
SELECT *
FROM WORK.INPUT
WHERE STATUS = 'DELETED' AND COUNTRY NOT IN (SELECT COUNTRY FROM WORK.INPUT WHERE STATUS = 'REMAINING');
QUIT;

but this obviously excludes all REMAINING countries from all GROUPs.

I also tried a data step:

DATA WORK.OUTPUT;
SET WORK.INPUT;
BY GROUP COUNTRY;

IF NOT (STATUS = 'DELETED' AND COUNTRY NOT IN (COUNTRY WHERE STATUS = 'REMAINING')) THEN DELETE; 

RUN;

but the syntax is incorrect because I don't know the proper way of writing this.

sonarclick
  • 73
  • 8
  • should the first record in the target be `COM3` and not `COM1`? – DomPazz May 29 '18 at 16:18
  • Spot on DomPazz. I have changed it. Thanks. – sonarclick May 29 '18 at 16:23
  • 1
    "I can use a PROC SQL or a DATA step" Which one did you try? Please share the code and any error messages. – Amir May 29 '18 at 16:26
  • ^ To add, StackOverflow is not a code-writing service. – Parfait May 29 '18 at 16:29
  • If you don't know where to start, change the Status Codes to numeric codes that can be sorted so the minimum or maximum and only one will be what you want. Then it becomes very easy to solve. – Reeza May 29 '18 at 16:47
  • @Amir and Parfait. You are right. Please excuse my laziness! I have edited the question to include what I have tried. – sonarclick May 30 '18 at 10:57
  • @sonarclick your code shows you're thinking along the right lines, e.g., 'by' processing. Both SQL and data step solutions have been suggested. – Amir May 30 '18 at 11:20

2 Answers2

3

Try this out:

proc sql;
select * from your_table
where status = 'deleted' and 
      catx("_",country,group) not in 
         (select catx("_",country,group) from your_table where status='remaining');
quit;  

Output:

company_name | group | country | status
com3         |   1   |    UK   | deleted
com4         |   2   |    ES   | deleted
com5         |   2   |    FR   | deleted
com6         |   3   |    RO   | deleted
com7         |   3   |    BG   | deleted
G.Arima
  • 1,171
  • 1
  • 6
  • 13
1

Your solutions show you are thinking along the right lines.

One data step solution is:

data want(drop = remain_list);
   length remain_list $ 20;

   do until(last.group);
      set have;
      by group;

      if status = 'REMAINING' and not find(remain_list, country) then
         remain_list = catx(' ', remain_list, country);
   end;

   do until(last.group);
      set have;
      by group;

      if status = 'DELETED' and not find(remain_list, strip(country)) then
         output;
   end;
run;
halfer
  • 19,824
  • 17
  • 99
  • 186
Amir
  • 880
  • 1
  • 6
  • 15
  • Thank you. I've gone with the PROC SQL solution because it is less code. But it's great to see the data step execution as well. – sonarclick May 30 '18 at 11:47