I have an imported Excel file; the dataset looks like:

    Family       Weight
    1             150
    1             210   
    1             99
    2             230
    2             100
    2             172

I need to find the sum of ranks for each family.
I know that I can do this easily using PROC RANK, but this is a homework problem and the only PROC I can use is PROC MEANS. I cannot even use PROC SORT.

The ranking would be as follows (lowest weight receives rank = 1, etc)

99 - Rank = 1
100 - Rank = 2
150 - Rank = 3
172 - Rank = 4
210 - Rank = 5
230 - Rank = 6

Resulting Dataset:

    Family       Sum_Ranking
    1             9
    2             12

Family 1 Sum_Ranking was calculated by (3+5+1)
Family 2 Sum_Ranking was calculated by (6+2+4)

Thank you for assistance.

  • And what have you tried so far? I don't think you even need PROC MEANS to do this problem. You can do it in the DATA step alone, unless your homework requires that you use PROC MEANS at least once. – Yick Leung Dec 12 '13 at 21:37
  • Actually, doing this via PROC MEANS is a pretty interesting problem. Using the data step exclusively without PROC SORT is fairly hard - you'll have to do some array manipulation and such that is probably outside the bounds of a presumably lower level class like this. – Joe Dec 13 '13 at 15:34

1 Answer

I'm not going to give you code, but some tips.

Specifically, the most interesting part about the instructions is the explicit "not even PROC SORT".

PROC MEANS has a useful side effect, in that it sorts data by the class variables (in the class variable order). So,

    PROC SORT data=blah out=blah_w;
      by x y;
    run;

and

    PROC MEANS data=blah;
      class x y;
      var y;
      output out=blah_w n=;
    run;

produce almost identical results. Both yield a dataset sorted by x and y, even though the PROC MEANS version never required an explicit sort.

So in this case, you can use the CLASS statement in PROC MEANS to produce a dataset that is sorted by weight and then family (you must carry family along here even though you don't need it yet). Then use a data step to create a RANK variable holding the rank of the current row; use the _FREQ_ column to handle the case where more than one observation in the same family has the same weight, and think about what to do in case of ties. Finally, run another PROC MEANS, this time summarizing by family.
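For later readers, the steps above could be sketched roughly as follows. This is only one possible reading of the tips, not a definitive solution: the dataset names (families, by_weight, ranked, family_ranks) and the variables next_rank and rank_sum are made up for illustration, and tied weights are simply given consecutive ranks in output order, which is an assumption you may need to change.

```sas
/* Sample data from the question; the dataset name "families" is hypothetical. */
data families;
    input Family Weight;
    datalines;
1 150
1 210
1 99
2 230
2 100
2 172
;
run;

/* Step 1: use CLASS to get one row per (Weight, Family) combination,
   ordered by Weight and then Family. NWAY keeps only the full
   Weight*Family rows; _FREQ_ counts the observations in each row. */
proc means data=families noprint nway;
    class Weight Family;
    output out=by_weight;
run;

/* Step 2: walk the ordered rows and hand out ranks.
   Assumption: ties get consecutive ranks (no averaging). A row with
   _FREQ_ = k occupies ranks next_rank .. next_rank + k - 1, so we store
   the sum of those k ranks. */
data ranked;
    set by_weight;
    retain next_rank 1;
    rank_sum = _FREQ_ * (2 * next_rank + _FREQ_ - 1) / 2;
    next_rank = next_rank + _FREQ_;
    keep Family Weight rank_sum;
run;

/* Step 3: add up the rank sums within each family. */
proc means data=ranked noprint nway;
    class Family;
    var rank_sum;
    output out=family_ranks (drop=_TYPE_ _FREQ_) sum=Sum_Ranking;
run;
```

For the six rows in the question, this should yield a Sum_Ranking of 9 for family 1 and 12 for family 2. If you need average ranks for ties (the PROC RANK default), the data step would have to know how many rows share each weight across all families, which this sketch does not attempt.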

Joe