I am using the gstat package for ordinary kriging on the Walker Lake data (470 data points). In each trial I randomly take 20 points from the data as a test set and calculate the RMSE for a randomly chosen training set, with training set sizes ranging from 50 to 450 points. I then average the RMSE over the trials for each training set size. The results are as follows:
Trial index   Training points   Avg. RMSE
-----------------------------------------
1              50               43.5936
2             100               40.3413
3             150               34.8842
4             200               28.1230
5             250               28.3111
6             300               30.9915
7             350               30.8903
8             400               28.3148
9             450               28.9578
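
The evaluation procedure described above can be sketched roughly as follows. This is an illustrative Python/NumPy sketch, not the gstat code actually used. Several things in it are assumptions: the data are synthetic placeholders rather than the Walker Lake values, the predictor `idw_predict` is inverse distance weighting standing in for ordinary kriging, the number of trials (`n_trials=30`) is made up because it is not stated above, and the test points are assumed to be excluded from the training set. The sketch also reports the standard deviation of the RMSE across trials, which is an addition meant to show how much the average can move from trial to trial with only 20 test points.

```python
import numpy as np

def idw_predict(train_xy, train_z, test_xy, power=2.0):
    """Stand-in predictor (inverse distance weighting), NOT ordinary kriging."""
    # Pairwise distances between test points (rows) and training points (columns)
    d = np.linalg.norm(test_xy[:, None, :] - train_xy[None, :, :], axis=2)
    w = 1.0 / np.maximum(d, 1e-12) ** power  # guard against zero distance
    return (w @ train_z) / w.sum(axis=1)

def average_rmse(xy, z, train_size, n_test=20, n_trials=30, seed=0):
    """Mean and standard deviation of the hold-out RMSE over repeated random splits."""
    rng = np.random.default_rng(seed)
    rmses = []
    for _ in range(n_trials):
        perm = rng.permutation(len(z))
        test_idx = perm[:n_test]                       # 20 random test points
        train_idx = perm[n_test:n_test + train_size]   # disjoint random training set
        pred = idw_predict(xy[train_idx], z[train_idx], xy[test_idx])
        rmses.append(np.sqrt(np.mean((pred - z[test_idx]) ** 2)))
    rmses = np.asarray(rmses)
    return rmses.mean(), rmses.std(ddof=1)

# Synthetic placeholder data with the same size as the real data set (470 points)
rng = np.random.default_rng(42)
xy = rng.uniform(0, 100, size=(470, 2))
z = np.sin(xy[:, 0] / 15.0) * 50 + rng.normal(0, 10, size=470)

# Training set sizes 50, 100, ..., 450, as in the table above
results = {n: average_rmse(xy, z, n) for n in range(50, 451, 50)}
for n, (mean_rmse, sd_rmse) in results.items():
    print(f"{n:4d}  mean RMSE = {mean_rmse:7.3f}  sd across trials = {sd_rmse:6.3f}")
```

Comparing the differences between rows with the spread across trials gives a rough sense of whether a bump such as the one between 250 and 350 training points is larger than the noise in the averages.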
My questions are:
1) Why is the RMSE wavy? Why doesn't it always decrease as the training set grows?
2) Does this mean we don't need a large dataset for kriging, since the RMSE is lowest when the training set has 200 points?
Waiting for the reply.