I am having trouble understanding how to perform LOOCV in SPSS. I need to evaluate a simple linear regression $Y=aX+b$. Thanks.
Asked
Active
Viewed 6,602 times
3
-
1This question appears to be off topic because it's simply about how to perform a statistical procedure in software. – AdamO Jun 13 '14 at 21:40
-
2This question appears to be off-topic because it is about how to perform a statistical procedure in software. – gung - Reinstate Monica Jun 13 '14 at 22:04
1 Answers
1
For linear regression it is pretty easy, and SPSS allows you to save the statistics right within the REGRESSION
command. See here for another example.
REGRESSION
/NOORIGIN
/DEPENDENT Y
/METHOD=ENTER X
/SAVE PRED (PredAll) DFIT (CVFit).
Then the leave one out prediction can be calculated as COMPUTE LeaveOneOut = PredAll - CVFit.
But for non-linear models that SPSS does not provide convenient SAVE
values for one can build the repeated dataset with the missing values, then use SPLIT FILE
, and then obtain the leave one out statistics for whatever statistical procedure you want. If your id variable is simply the row number for the dataset, you simply need two loops of the maximum case number, and then match the needed info into the new file.
Here is an example of this procedure.
*Making some fake data to work with.
INPUT PROGRAM.
LOOP Id = 1 TO 10.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
DATASET NAME Sim.
COMPUTE X = RV.NORMAL(10,5).
COMPUTE Y = 3 + 0.2*(X) + RV.NORMAL(0,0.2).
FORMATS Id (F2.0) X Y (F4.2).
EXECUTE.
*Original regression model with the leave one.
*out fits.
REGRESSION
/NOORIGIN
/DEPENDENT Y
/METHOD=ENTER X
/SAVE PRED (PredAll) DFIT (CVFit).
*Manual way to create stacked dataset
*can use with other non-linear models.
INPUT PROGRAM.
COMPUTE #Cases = 10.
LOOP #Id = 1 TO #Cases.
LOOP #Iter = 1 TO #Cases.
COMPUTE L1O = #Iter.
COMPUTE Id = #Id.
END CASE.
END LOOP.
END LOOP.
END FILE.
END INPUT PROGRAM.
DATASET NAME LeaveOneOut.
*Merging in original data.
MATCH FILES FILE = *
/TABLE = 'Sim'
/BY Id.
*Set missing to
IF L1O = Id Y = $SYSMIS.
SORT CASES BY L1O.
SPLIT FILE BY L1O.
*You can replace regression with whatever procedure you are.
*interested in.
REGRESSION
/NOORIGIN
/DEPENDENT Y
/METHOD=ENTER X
/SAVE PRED (CVFit2).
SPLIT FILE OFF.
*This shows the original leave one out stats.
*And new stats are the same besides some floating.
*point differences.
COMPUTE Test = (CVFit2 - (PredAll-CVFit)).
TEMPORARY.
SELECT IF (L1O = Id).
FREQ VAR Test.
EXECUTE.