2

In SAS, suppose I have a dataset named "person_groups". It has two variables, named "person" and "group". This dataset simply assigns each person to a group.

How can I remove from this dataset all persons who have no one else in their group? In other words, how can I remove all singleton groups?

[I'd be happy with a proc sql solution or a data step solution--either is fine.]

Side note: I'm new to SAS. I have been using C++ and MATLAB for many years. I feel like I can't understand how to do anything with the SAS DATA step. It seems extremely clunky, bizarre, and inelegant. Frankly, I'm growing very frustrated. Anyone out there have hope for the weary? :)

synaptik
  • 8,971
  • 16
  • 71
  • 98

2 Answers2

4

Here's a way that uses a data step. This method requires a sort.

data person_groups;
 input person $ group $;
 datalines;
John Grp1
Mary Grp3
Joe Grp2
Jane Grp3
Frank Grp1
;

Proc Sort data=person_groups;
 by group;
run;

Data person_groups;
 set person_groups;
 by group;
 if first.group and last.group then delete;
run;
Jay Corbett
  • 28,091
  • 21
  • 57
  • 74
  • Thanks. Using "by" statements and "delete" statements in data steps seems very useful. Knowing them, I think I can do much more now. – synaptik Nov 04 '12 at 14:54
2

Here is a PROC SQL solution:

proc sql;
   delete from person_groups
   where group in (
      select group
      from person_groups
      group by 1
      having count(*) = 1
      );
quit;

As you can see, PROC SQL mostly follows standard ANSI syntax, so your existing SQL skills should be quite portable. And hang in there with respect to SAS Data Step programming; the language is extremely rich, full featured, and by no means "inelegant". You may have seen inelegant code, but that's more likely the fault of the programmer. Take a few minutes and read the SAS Concepts manual.

BellevueBob
  • 9,498
  • 5
  • 29
  • 56
  • Thanks a lot. Maybe I should try to rely on proc sql more than SAS data step, at least until I learn more about data step programming. Can proc sql do a very large percentage of the things that data step can do? Thanks, I will definitely read that manual. – synaptik Nov 04 '12 at 16:49
  • One handy aspect of the data step is that when you are doing a MERGE and you identify your data sets with an IN= option, you are able to determine the merge status of each record. You get this with one pass in a data step, but I believe it would take multiple passes with PROC SQL. Although, each data set must be sorted prior to the data step, so it may end up being a wash efficiency wise. – Jay Corbett Nov 04 '12 at 20:02
  • I definitely recommend learning data step techniques if you're going to use SAS. I imagine most things can be done in either, but there are definitely things that you'd rather do in the data step - like this one, presumably; I would expect the data step solution to be faster. – Joe Nov 04 '12 at 20:13