I have around 300 variables and I am calculating their skewness and kurtosis. Now I want to create a new variable that lists all the variables whose skewness and kurtosis fall within a certain range. The idea is to select only the variables that satisfy a condition and to normalize all the others.

To calculate skewness I am using:

Descriptives A TO Z
/Statistics Skewness.
Execute.

I know this is not valid syntax, but I need something like this:

Compute x= if(Skewness(A TO Z)>1)

Please help me out with SPSS syntax for this.

eli-k
user10579790
  • Are you sure you need this list in a new variable? What do you do with it once it's there? I'd think the code you are looking for should make a list of the variables that need normalization and then do it. Or go through the variables and for each one decide if normalization is needed and (if yes) do it . – eli-k Jan 15 '19 at 14:13
  • Yes, exactly. The code should be able to find which variables need normalization and perform it on those. – user10579790 Jan 15 '19 at 15:18

1 Answer

There are multiple ways to approach this, so there might be an easier way.

You just need to change 'var1 TO varN' to your list of variables and set whatever criteria you want for skewness and kurtosis on the two COMPUTE lines that create the flags; the syntax then lists the flagged variables for you.

If I were doing this I would go a step further and build the normalization into the syntax, using WRITE OUT = ".sps" /CMD to write the generated commands to a syntax file and INSERT FILE = ".sps" to run it, but that isn't what you asked for.

DATASET DECLARE DistributionSyntax.
OMS
  /SELECT TABLES
  /IF SUBTYPES=["Descriptives"] INSTANCES=[1]
  /DESTINATION FORMAT=SAV OUTFILE = 'DistributionSyntax'.
EXAMINE VARIABLES=var1 TO varN
  /PLOT NONE
  /STATISTICS DESCRIPTIVES
  /CINTERVAL 95
  /MISSING PAIRWISE
  /NOTOTAL.
OMSEND.
DATASET ACTIVATE DistributionSyntax.

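* Keep only the skewness and kurtosis rows of the OMS table.
USE ALL.
FILTER OFF.
SELECT IF ANY(Var2,'Skewness','Kurtosis').
EXECUTE.
* Flag rows whose statistic exceeds the chosen cutoff (2 here) and extract the variable name.
STRING VarName (A64).
COMPUTE SkewnessFlag = (Var2 = 'Skewness' AND ABS(Statistic) > 2).
COMPUTE KurtosisFlag = (Var2 = 'Kurtosis' AND ABS(Statistic) > 2).
COMPUTE VarName = CHAR.SUBSTR(Var1,1,CHAR.INDEX(Var1,' ')-1).
EXECUTE.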

* List the variables flagged for skewness.
USE ALL.
COMPUTE filter_$=(SkewnessFlag = 1).
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMATS filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.
FREQUENCIES VARIABLES=VarName.

* List the variables flagged for kurtosis.
USE ALL.
COMPUTE filter_$=(KurtosisFlag = 1).
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMATS filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.
FREQUENCIES VARIABLES=VarName.

USE ALL. 
FILTER OFF. 
EXECUTE.

If you omit the filter blocks that follow the flag computation and replace them with the syntax below, it will compute normalized (natural-log) versions of the variables that meet your criteria. This creates new variables with a .Norm suffix rather than overwriting the originals. You will need to supply a file location for the generated syntax file (replace the "~\" in the WRITE and INSERT commands) and change the dataset name 'RAWDATA' to whatever your dataset is called:

USE ALL.
FILTER OFF.
SELECT IF ANY(1,SkewnessFlag,KurtosisFlag).
EXECUTE.

STRING CMD (A250).
COMPUTE CMD = CONCAT("COMPUTE ",RTRIM(VarName),".Norm = ln(",RTRIM(VarName),").").
EXECUTE.

DATA LIST /CMD 1-250 (A).
BEGIN DATA
EXECUTE.
END DATA.
DATASET NAME EXE WINDOW = FRONT.

DATASET ACTIVATE DistributionSyntax.
ADD FILES /FILE = *
/FILE = 'EXE'.
EXECUTE.
DATASET CLOSE EXE.
DATASET ACTIVATE DistributionSyntax.

WRITE OUT="~\Normalize Variables.sps" /CMD.
EXECUTE.
DATASET CLOSE DistributionSyntax.
DATASET ACTIVATE RAWDATA.
INSERT FILE="~\Normalize Variables.sps".
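
If skewed and heavy-tailed variables should get different transformations, the COMPUTE CMD line above can be wrapped in a DO IF block that checks which flag is set. The following is only a sketch meant to replace that single STRING/COMPUTE CMD step; the SQRT transformation and the .Ln and .Sqrt suffixes are illustrative assumptions, so substitute whatever transformations suit your data.

```
* Sketch: build a different generated command depending on the flag.
* The LN and SQRT choices and the name suffixes are assumptions for illustration.
STRING CMD (A250).
DO IF (SkewnessFlag = 1).
  COMPUTE CMD = CONCAT("COMPUTE ",RTRIM(VarName),".Ln = LN(",RTRIM(VarName),").").
ELSE IF (KurtosisFlag = 1).
  COMPUTE CMD = CONCAT("COMPUTE ",RTRIM(VarName),".Sqrt = SQRT(",RTRIM(VarName),").").
END IF.
EXECUTE.
```

Because the OMS table holds one row per statistic, a variable flagged on both its skewness row and its kurtosis row would produce two generated commands, one per transformation.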
JYurkovich
  • Thank you so much for this code. It was really helpful. I am also trying to add the normalization syntax as you suggested :) – user10579790 Jan 16 '19 at 10:08
  • Not a problem. I went ahead and added a block that computes the normalized variables. If you want to apply a different procedure based on skewness or kurtosis, then you would use a 'DO IF' block that checks which criteria it's flagged for around the COMPUTE CMD command. – JYurkovich Jan 16 '19 at 16:14
  • Hi, I need a clarification. My base data file has around 300 variables, arranged in columns. The DistributionSyntax file lists each variable twice in rows, once with skewness and once with kurtosis (one under the other). (The variables are Statistics, Std. Err, Skewness, Kurtosis, etc.) As the final output after normalization, I would like the data in the base data format (each variable in a column, with the original data replaced by the normalized data where a variable has been normalized). Should we use the OMS command for this, or is there an easier way? – user10579790 Jan 22 '19 at 13:16
  • So you want a dataset with the variable names of only those that need to be transformed? What values would those variables contain? – JYurkovich Jan 23 '19 at 22:46
  • Yes, the dataset must have the variable names from the DistributionSyntax file and import the cases from the base dataset. The whole objective is to select variables in the base dataset based on some condition and normalize the data for further processing. Also, if it helps, all the variables are arranged in columns and are time series. – user10579790 Jan 24 '19 at 08:23
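
One possible way to keep the result in the base-data layout, offered only as a sketch, is to have the generated commands overwrite each flagged variable instead of creating a .Norm copy. The lines below would stand in for the STRING/COMPUTE CMD step of the answer's normalization block. Since every variable has both a skewness row and a kurtosis row, the rows are first reduced to one per variable so that no variable is log-transformed twice. FirstRow is a hypothetical helper name, and LN is simply carried over from the answer as the transformation.

```
* Sketch: keep one row per flagged variable, then overwrite it in place.
* FirstRow is a hypothetical helper variable introduced for illustration.
SORT CASES BY VarName.
MATCH FILES /FILE=* /BY VarName /FIRST=FirstRow.
SELECT IF (FirstRow = 1).
STRING CMD (A250).
COMPUTE CMD = CONCAT("COMPUTE ",RTRIM(VarName)," = LN(",RTRIM(VarName),").").
EXECUTE.
```

Overwriting is destructive, and LN returns system-missing for zero or negative values, so it is worth running this on a saved copy of the base dataset first.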