1

To calculate the mean of variable "value" with weight, we can simply do the following:

proc sql;
select sum(Value*Weight)/sum(Weight) as WeightedAverage
from table ;
quit; 

How can we calculate median of the variable "value" with weight? I know we can use proc means, but I would prefer a solution in proc sql.

Example:

data table;
input value weight;
datalines;
1 1
2 1
3 2
;
run;

Using proc means, we can get the mean (2.25) and median (2.5) easily:

proc means data=table mean median; 
var value;
weight weight;
run;

Using above proc sql, I can get the weighted average: 2.25.

How to get the median (which is 2.5)?

user2759430
  • 61
  • 2
  • 3
  • 9
  • Is your SAS version 9.4 and above? Any reason to not use `PROC SURVEYMEANS` or `MEANS`? – Reeza May 02 '18 at 20:30
  • Thanks. Yes I am running SAS EG. Proc means works, but I would like to know if there is also a simple way to do it in proc sql.... – user2759430 May 02 '18 at 20:36
  • Can you clarify your SAS version, as Reeza asked originally? Not your EG/SAS/etc. but the version number (9.3, 9.4, etc.)? – Joe May 02 '18 at 21:09
  • The SAS version is very important to this question because before SAS 9.4 PROC SQL could not calculate a median. You can use the following to get your version: `proc product_status;run;` – Reeza May 02 '18 at 21:24
  • I have SAS 9.4 and can do median() in proc sql. However I am not sure how to take into account weight. The weight variable is just the frequency. – user2759430 May 03 '18 at 01:20

1 Answers1

3

There's also a median() function in proc sql (available in SAS 9.4).

A normal median would be like this:

proc sql; select median(Value) as Median from table ; quit;

A weighted Median would be something like this depending on how you want to use the weights:

proc sql; select median(Value*Weight) as WeightedMedian from table ; quit;

Example: The Median will be 2.5

data have;
input value;
datalines;
1 
2 
3 
3
;
run;
proc sql;
create table want as 
select Median(value) as med
from have;
quit;

Example: Segregate the data, so we have 4 row 1,2,3,3 instead of 3

data have;
input value weight;
datalines;
1 1
2 1
3 2
;
run;

data have_seg;
set have;
retain freq;
freq= weight;
if(freq=1) then do; output; end;
else if freq > 1 then do; 
do i=1 to freq; weight=1; output; end;
end;
keep value weight;
run;
momo1644
  • 1,769
  • 9
  • 25
  • Thanks. However the weighted median you suggested does not seem to be what I want. The weight variable is in fact the frequency of the of "value". – user2759430 May 03 '18 at 01:23
  • You are welcome. we will do out best to help. Please add to your question the calculation that you want to do so we can give you the working code for it. – momo1644 May 03 '18 at 09:02
  • Just a thought, you can use `median(value) * median(weight)` – momo1644 May 03 '18 at 09:04
  • Really appreciate your help. Suppose I have three values 1, 2, 3. If I have corresponding weights as 1, 1, 1. The median should be 2 (which is the number in the middle). However, if I have weights 1, 1, 2, the median should be 2.5 (the median of 1, 2, 3, 3). – user2759430 May 03 '18 at 13:38
  • In that case you don't need the weight, just calculate the median on the segregated data. I will add an example of what I mean in the answer. – momo1644 May 03 '18 at 13:47
  • Is there a way to calculate the median without changing the original data? If I use proc means, I can simple do this proc means data=table mean median; var value; weight weight; run; – user2759430 May 03 '18 at 13:55
  • Yes, In the code above your source table (Have) is untouched; a new table (want) is created with 1 row - Median value – momo1644 May 03 '18 at 13:57
  • just run this and add your source table name: `proc sql; create table want as select Median(value) as med from your_table ; quit;` – momo1644 May 03 '18 at 13:58
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/170295/discussion-between-user2759430-and-momo1644). – user2759430 May 03 '18 at 14:02