-1

I have a dataset where observations is student and then I have a variable for their test score. I need to standardize these scores like this :

newscore = (oldscore - mean of all scores) / std of all scores

So that I am thinking is using a Data Step where I create a new dataset with the 'newscore' added to each student. But I don't know how to calculate the mean and std of the entire dataset IN in the Data Step. I know I can just calculate it using proc means, and then manually type it it. But I need to do I a lot of times and maybe drop variables and other stuff. So I would like to be able to just calculate it in the same step.

Data example:

__VAR testscore newscore
Student1 5 x
Student2 8 x
Student3 5 x

Code I tried:

data new;
set old;
newscore=(oldscore-(mean of testscore))/(std of testscore)
run;

(Can't post any of the real data, can't remove it from the server)

How do I do this?

SecretAgentMan
  • 2,856
  • 7
  • 21
  • 41
Storm
  • 1
  • 1
  • 4

3 Answers3

3

Method1: Efficient way of solving this problem is by using proc stdize . It will do the trick and you dont need to calculate mean and standard deviation for this.

data have;
input var $ testscore;
cards;
student1 5
student2 8
student3 5
;
run;

data have;
set have;
newscore = testscore;
run;

proc stdize data=have out=want;
   var newscore;
run;   

Method2: As you suggested taking out means and standard deviation from proc means, storing their value in a macro and using them in our calculation.

proc means data=have;
var testscore;
output out=have1 mean = m stddev=s;
run;

data _null_;
set have1;
call symputx("mean",m);
call symputx("std",s);
run;

data want;
set have;
newscore=(testscore-&mean.)/&std.;
run;

My output:

var           testscore  newscore
student1      5          -0.577350269   
student2      8          1.1547005384   
student3      5          -0.577350269

Let me know in case of any queries.

G.Arima
  • 1,171
  • 1
  • 6
  • 13
  • Hey, thansk a lot ! Proc stdize did the job reallly easy. And thanks for the other method aswell, i dident know you could use proc means to store output and then use it again. Though i just made a table. Thansk a lot ! – Storm May 04 '17 at 21:41
  • True. But you also need to know the way to store the values and further use them without manually entering them (in this case means and standard deviation), therefore i also told the other way. You are welcome. Happy to help :) – G.Arima May 04 '17 at 21:43
  • Arh ofcourse, done, im very new here on stackoverflow :) – Storm May 04 '17 at 21:48
  • Why would you use macro variables to do the transfer from PROC MEANS to data step, instead of how I show it (directly with the datasets)? – Joe May 04 '17 at 21:59
  • I didn't know that. I have never used a set in if statement Thanks for the info though. Appreciated. – G.Arima May 04 '17 at 22:03
1

You should not try to do this in the data step. Do it with proc means. You don't need to type anything in, just grab the value in a dataset.

You don't provide enough to give complete code in the answer, but the basic idea.

proc means data=sashelp.class;
var height weight;
output out=class_stats mean= std= /autoname;
run;

data class;
  if _n_=1 then set class_Stats;  *copy in the values from class_Stats;
  set sashelp.class;
  height_norm = (height-height_mean)/(height_stddev);
  weight_norm = (weight-weight_mean)/(weight_stddev);

run;

Alternately, just use PROC STDIZE which will do this for you.

proc stdize data=sashelp.class out=class_Std;
   var height weight;
run;
Joe
  • 62,789
  • 6
  • 49
  • 67
  • Thanks a lot for the answer! I did not know proc stdize, it did the job like a charm! Thanks again – Storm May 04 '17 at 21:31
0

If you want to achieve this via proc sql:

proc sql; 
create table want as
select *, mean(oldscore) as mean ,std(oldscore) as sd
from have; 
quit; 

For other statistical functions in proc sql, see here: https://support.sas.com/kb/25/279.html

A.Fischer
  • 596
  • 5
  • 11