
I am using the gstat package for ordinary kriging on the Walker Lake data (470 points). In each trial I randomly hold out 20 points as a test set, randomly choose a training set of 50 to 450 points, and compute the RMSE on the 20 test points. I then averaged the RMSE over the trials for each training-set size. The results are as follows:
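A minimal sketch of that experimental design, written in plain Python rather than R/gstat, may make the procedure easier to reason about. It assumes a fixed exponential variogram with hypothetical parameters (`nugget`, `psill`, `range_`), whereas the real experiment presumably fits a variogram with gstat for each training set; `exp_variogram`, `ok_predict` and `holdout_rmse` are illustrative helpers, not gstat functions, and the synthetic field is a stand-in for the Walker Lake data.

```python
import math
import random


def exp_variogram(h, nugget=0.0, psill=1.0, range_=10.0):
    """Exponential variogram; parameters are assumed here, not fitted."""
    if h == 0.0:
        return 0.0  # semivariance is zero at zero lag
    return nugget + psill * (1.0 - math.exp(-h / range_))


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def ok_predict(train, x0, y0, **vgm):
    """Ordinary kriging prediction at (x0, y0) from (x, y, z) training tuples."""
    n = len(train)
    a = [[0.0] * (n + 1) for _ in range(n + 1)]
    b = [0.0] * (n + 1)
    for i, (xi, yi, _) in enumerate(train):
        for j, (xj, yj, _) in enumerate(train):
            a[i][j] = exp_variogram(math.hypot(xi - xj, yi - yj), **vgm)
        a[i][n] = a[n][i] = 1.0  # unbiasedness constraint: weights sum to 1
        b[i] = exp_variogram(math.hypot(xi - x0, yi - y0), **vgm)
    b[n] = 1.0
    w = solve(a, b)  # last entry is the Lagrange multiplier
    return sum(w[i] * train[i][2] for i in range(n))


def holdout_rmse(data, n_train, n_test=20, trials=10, seed=0, **vgm):
    """Average per-trial RMSE over repeated random hold-out splits."""
    if n_train + n_test > len(data):
        raise ValueError("n_train + n_test exceeds the number of points")
    rng = random.Random(seed)
    rmses = []
    for _ in range(trials):
        shuffled = rng.sample(data, len(data))
        test = shuffled[:n_test]
        train = shuffled[n_test:n_test + n_train]
        sq = [(ok_predict(train, x, y, **vgm) - z) ** 2 for x, y, z in test]
        rmses.append(math.sqrt(sum(sq) / len(sq)))
    return sum(rmses) / len(rmses), rmses


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic smooth field plus noise: a stand-in, NOT the Walker Lake data.
    data = []
    for _ in range(60):
        x, y = rng.uniform(0, 100), rng.uniform(0, 100)
        data.append((x, y, math.sin(x / 15) + math.cos(y / 20) + rng.gauss(0, 0.1)))
    for n in (10, 20, 40):
        mean_rmse, _ = holdout_rmse(data, n, trials=3, range_=30.0)
        print(n, round(mean_rmse, 4))
```

Because each trial draws a fresh 20-point test set, the per-trial RMSE depends heavily on which test points happen to be chosen, so the averages carry Monte Carlo noise of their own.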

Index              Training points        Avg. RMSE
--------------------------------------------------------
1                  50                     43.5936
2                  100                    40.3413
3                  150                    34.8842
4                  200                    28.1230
5                  250                    28.3111
6                  300                    30.9915
7                  350                    30.8903
8                  400                    28.3148
9                  450                    28.9578

My questions are:

1) Why is the RMSE curve wavy? Why doesn't it always decrease as the training set grows?

2) Does that mean we don't need a large dataset for kriging, since the RMSE is lowest with 200 training points?
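One way to judge whether the ups and downs in the table are larger than Monte Carlo noise is to attach a standard error to each averaged RMSE and compare two training sizes with Welch's t statistic. The sketch below assumes you kept the per-trial RMSE values for each training size; the helper names are hypothetical.

```python
import math
import statistics


def mean_and_se(values):
    """Mean of per-trial RMSEs and the standard error of that mean."""
    m = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return m, se


def welch_t(a, b):
    """Welch's t statistic for the difference in mean RMSE between two sizes."""
    ma, sea = mean_and_se(a)
    mb, seb = mean_and_se(b)
    return (ma - mb) / math.sqrt(sea ** 2 + seb ** 2)
```

With 100 trials per size, a rough rule of thumb is that |t| well below about 2 means the difference (for example, between 200 and 450 training points) is not distinguishable from sampling noise. If the same test sets were reused across training sizes, a paired comparison of the per-trial differences would be more sensitive.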

Waiting for the reply.

Chandan
    This might be sampling error; did you try repeating the experiment with new random selections? – Edzer Pebesma Oct 26 '16 at 06:47
  • I ran the experiment for 100 trials and then averaged all the SEs. I also compared the sampled and exhaustive data with a QQ plot; their distributions are the same. – Chandan Oct 26 '16 at 07:14
  • Did you check for "outliers" in the dataset? Perhaps there are bad points that are influencing the predictions and the variogram fit. – Jared Smith Nov 09 '16 at 16:12
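A quick first check for the outliers mentioned in the last comment is Tukey's fence rule on the measured values. This is a minimal sketch using Python's `statistics.quantiles`; `iqr_outliers` is a hypothetical helper, and the rule is global, so it will not catch spatial outliers (values that are ordinary overall but very different from their neighbours). For a strongly skewed variable it also flags many legitimate high values, so flagged points are best inspected rather than dropped automatically.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Indices of values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < lo or v > hi]
```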
