
I have a dataset of:

x=c(1600L, 1650L, 1675L, 1700L, 1725L, 1775L, 1800L, 1825L, 1850L, 
1875L, 1880L, 1885L, 1900L, 1920L, 1925L, 1930L, 1935L, 1940L, 
1945L, 1950L, 1955L, 1960L, 1965L, 1975L, 1980L, 1985L, 1990L, 
1995L, 2000L, 2005L, 2010L, 2015L, 2020L, 2025L, 2030L, 2035L, 
2040L, 2045L, 2050L, 2055L, 2060L, 2065L, 2070L, 2075L, 2080L, 
2085L, 2090L, 2095L, 2100L, 2105L, 2110L, 2115L, 2120L, 2125L, 
2130L, 2135L, 2140L, 2145L, 2150L, 2155L, 2160L, 2165L, 2170L, 
2175L, 2180L, 2185L, 2190L, 2195L, 2200L, 2225L, 2250L, 2275L, 
2300L, 2325L, 2350L, 2400L)

y= c(0.294529, 0.285516, 0.240616, 0.275107, 0.275033, 0.236293, 
0.240515, 0.229588, 0.20417, 0.20361, 0.203624, 0.204582, 0.195379, 
0.187396, 0.185315, 0.182648, 0.18076, 0.178717, 0.176931, 0.173805, 
0.171352, 0.169856, 0.170566, 0.166413, 0.164074, 0.162457, 0.160333, 
0.158291, 0.156577, 0.154371, 0.152205, 0.150303, 0.148391, 0.146455, 
0.144258, 0.142454, 0.139729, 0.137987, 0.135529, 0.133566, 0.131664, 
0.129607, 0.127761, 0.125352, 0.123669, 0.121388, 0.119598, 0.117541, 
0.11575, 0.113464, 0.111405, 0.109566, 0.107747, 0.105732, 0.104137, 
0.102337, 0.100538, 0.099007, 0.097542, 0.096187, 0.095008, 0.094473, 
0.094044, 0.093378, 0.093201, 0.093218, 0.093572, 0.094112, 0.094962, 
0.102078, 0.111409, 0.120824, 0.128211, 0.137644, 0.144049, 0.16133
)

I am trying to use a spline in R to interpolate y as a function of x, and then evaluate it at equally spaced points over a range that extends beyond the boundaries of x on both sides. So I write:

fineX <- seq(min(x)-500, max(x)+500 , 1)
interp <- spline(x,y , xout= fineX , method = c("natural"))

The interpolation looks fine, as the image below shows:

plot(x,y)
lines(interp)

[Image: natural interpolation]

But the extrapolation from this method is poor, as you can see in the picture below:

plot(fineX, interp$y)

[Image: extrapolation]

In the interpolation, the function is clearly decreasing up to roughly x = 2000, but the extrapolation below x = 1600 becomes increasing.

The smooth.spline function gives a better result, but it does not let me choose the xout points I want (or I don't know how to choose them!).
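For reference, a fitted smooth.spline object can be evaluated at arbitrary points through predict(), whose x argument plays the same role as xout in spline(). A minimal sketch using the data above; the smoothing parameter is left at smooth.spline's default (chosen by generalized cross-validation), which is an assumption rather than a tuned choice. Note that predict() on a smoothing spline also extrapolates linearly outside range(x).

```r
# Data from the question
x <- c(1600L, 1650L, 1675L, 1700L, 1725L, 1775L, 1800L, 1825L, 1850L,
       1875L, 1880L, 1885L, 1900L, 1920L, 1925L, 1930L, 1935L, 1940L,
       1945L, 1950L, 1955L, 1960L, 1965L, 1975L, 1980L, 1985L, 1990L,
       1995L, 2000L, 2005L, 2010L, 2015L, 2020L, 2025L, 2030L, 2035L,
       2040L, 2045L, 2050L, 2055L, 2060L, 2065L, 2070L, 2075L, 2080L,
       2085L, 2090L, 2095L, 2100L, 2105L, 2110L, 2115L, 2120L, 2125L,
       2130L, 2135L, 2140L, 2145L, 2150L, 2155L, 2160L, 2165L, 2170L,
       2175L, 2180L, 2185L, 2190L, 2195L, 2200L, 2225L, 2250L, 2275L,
       2300L, 2325L, 2350L, 2400L)
y <- c(0.294529, 0.285516, 0.240616, 0.275107, 0.275033, 0.236293,
       0.240515, 0.229588, 0.20417, 0.20361, 0.203624, 0.204582, 0.195379,
       0.187396, 0.185315, 0.182648, 0.18076, 0.178717, 0.176931, 0.173805,
       0.171352, 0.169856, 0.170566, 0.166413, 0.164074, 0.162457, 0.160333,
       0.158291, 0.156577, 0.154371, 0.152205, 0.150303, 0.148391, 0.146455,
       0.144258, 0.142454, 0.139729, 0.137987, 0.135529, 0.133566, 0.131664,
       0.129607, 0.127761, 0.125352, 0.123669, 0.121388, 0.119598, 0.117541,
       0.11575, 0.113464, 0.111405, 0.109566, 0.107747, 0.105732, 0.104137,
       0.102337, 0.100538, 0.099007, 0.097542, 0.096187, 0.095008, 0.094473,
       0.094044, 0.093378, 0.093201, 0.093218, 0.093572, 0.094112, 0.094962,
       0.102078, 0.111409, 0.120824, 0.128211, 0.137644, 0.144049, 0.16133)

# Fit a smoothing spline; the smoothing parameter comes from
# smooth.spline's default GCV criterion, not from tuning on this data
fit <- smooth.spline(x, y)

# Same equally spaced grid as in the question
fineX <- seq(min(x) - 500, max(x) + 500, 1)

# predict() evaluates the fit at whatever points are passed as `x`
pred <- predict(fit, x = fineX)

plot(x, y, xlim = range(fineX), ylim = range(pred$y))
lines(pred$x, pred$y, col = "blue")
abline(v = range(x), col = "red")  # data range; the fit is linear outside it
```

Because the extrapolation is still linear, this changes how the end slopes are estimated (smoothed rather than interpolated) but not the shape of the extrapolation itself.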

How can I get a good (non-linear) extrapolation beyond the boundaries of x while still evaluating the function at the xout points I need?

Novic
  • Can you provide some toy data? Paste e.g. `dput(x)` and `dput(y)`. – Anders Ellern Bilgrau Apr 03 '18 at 16:00
  • Have you tried using [`akima`](https://cran.r-project.org/web/packages/akima/index.html)'s `interp` function with `linear = FALSE` and `extrap = TRUE`? – Dan Apr 03 '18 at 16:06
  • @AndersEllernBilgrau I'm sorry, but I am a beginner and don't understand what dput() is. Do you mean I should share the data file? Assume the data contain a variable x with 100 observations and a variable y with 100 observations. – Novic Apr 03 '18 at 16:12
  • Yes, if you run dput() on your x and y variables, you can easily copy-paste the output into your post (so we can copy it into our R session). Or share in some other way. – Anders Ellern Bilgrau Apr 03 '18 at 16:15
  • @AndersEllernBilgrau I edited the post and added the data points. – Novic Apr 03 '18 at 16:37
  • @Lyngbakr I didn't know anything about 'akima' and tried it just now. 'aspline' did not give a different answer, and 'interp' does not let me use lists as my x and y variables, so I don't know how to use it. – Novic Apr 03 '18 at 16:40
  • @AliEbadi Sorry, I misread your post: `akima` is for functions with two variables. – Dan Apr 03 '18 at 16:52

1 Answer


This is as much a statistical question (if not more) as a programming one.

First, by what criteria are you judging what is a good extrapolation?

Second, if you believe your natural interpolation in the x = 1600 to x = 1700 range, I do not see why the extrapolation is obviously bad. Plotted as below, it does not look extremely crazy if you are fairly certain that the data contain little noise, or if the underlying data-generating process has "inertia" in some sense (you provide no context about what the data actually are).

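# Grid extending 50 units beyond the data on each side
fineX <- seq(min(x) - 50, max(x) + 50, 1)
interp <- spline(x, y, xout = fineX, method = "natural")
plot(x, y, xlim = range(fineX), ylim = range(interp$y))
lines(interp)
# Mark the extrapolated points (outside the data range) with red crosses
s <- fineX > max(x) | fineX < min(x)
points(fineX[s], interp$y[s], pch = 3, cex = .7, col = "red")
abline(v = range(x), col = "red")  # boundaries of the observed x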


With method = "natural", spline uses natural (cubic) splines, so it will always give you a linear extrapolation outside your data interval; that is the definition of natural splines: the second derivative is zero at the boundary knots, and beyond them the curve continues as a straight line with the slope at the nearest data point. Using method = "fmm" (the Forsythe, Malcolm and Moler end conditions, where an exact cubic is fitted through the four points at each end), it looks much worse by my own eye-balling, admittedly a heuristic and idiosyncratic measure. Of the standard interpolation methods available in R via spline, natural splines give the best "fit" as I see it.
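The linear-extrapolation property can be checked numerically. On a unit-spaced grid a straight line has second differences of zero, so the extrapolated natural-spline values should show second differences only at floating-point noise level, whereas "fmm" (for which ?splinefun notes that extrapolation makes little sense) keeps curving. A small sketch; the variable names nat, fmm, left and right are just illustrative.

```r
x <- c(1600L, 1650L, 1675L, 1700L, 1725L, 1775L, 1800L, 1825L, 1850L,
       1875L, 1880L, 1885L, 1900L, 1920L, 1925L, 1930L, 1935L, 1940L,
       1945L, 1950L, 1955L, 1960L, 1965L, 1975L, 1980L, 1985L, 1990L,
       1995L, 2000L, 2005L, 2010L, 2015L, 2020L, 2025L, 2030L, 2035L,
       2040L, 2045L, 2050L, 2055L, 2060L, 2065L, 2070L, 2075L, 2080L,
       2085L, 2090L, 2095L, 2100L, 2105L, 2110L, 2115L, 2120L, 2125L,
       2130L, 2135L, 2140L, 2145L, 2150L, 2155L, 2160L, 2165L, 2170L,
       2175L, 2180L, 2185L, 2190L, 2195L, 2200L, 2225L, 2250L, 2275L,
       2300L, 2325L, 2350L, 2400L)
y <- c(0.294529, 0.285516, 0.240616, 0.275107, 0.275033, 0.236293,
       0.240515, 0.229588, 0.20417, 0.20361, 0.203624, 0.204582, 0.195379,
       0.187396, 0.185315, 0.182648, 0.18076, 0.178717, 0.176931, 0.173805,
       0.171352, 0.169856, 0.170566, 0.166413, 0.164074, 0.162457, 0.160333,
       0.158291, 0.156577, 0.154371, 0.152205, 0.150303, 0.148391, 0.146455,
       0.144258, 0.142454, 0.139729, 0.137987, 0.135529, 0.133566, 0.131664,
       0.129607, 0.127761, 0.125352, 0.123669, 0.121388, 0.119598, 0.117541,
       0.11575, 0.113464, 0.111405, 0.109566, 0.107747, 0.105732, 0.104137,
       0.102337, 0.100538, 0.099007, 0.097542, 0.096187, 0.095008, 0.094473,
       0.094044, 0.093378, 0.093201, 0.093218, 0.093572, 0.094112, 0.094962,
       0.102078, 0.111409, 0.120824, 0.128211, 0.137644, 0.144049, 0.16133)

fineX <- seq(min(x) - 500, max(x) + 500, 1)
nat <- spline(x, y, xout = fineX, method = "natural")
fmm <- spline(x, y, xout = fineX, method = "fmm")

left  <- fineX < min(x)  # extrapolated region below the data
right <- fineX > max(x)  # extrapolated region above the data

# Natural splines: second differences are ~0, i.e. straight lines
print(max(abs(diff(nat$y[left],  differences = 2))))
print(max(abs(diff(nat$y[right], differences = 2))))
# fmm: the end cubic keeps bending, so second differences are not ~0
print(max(abs(diff(fmm$y[left],  differences = 2))))

plot(fineX, nat$y, type = "l", ylim = range(nat$y, fmm$y))
lines(fineX, fmm$y, col = "red")
abline(v = range(x), lty = 2)
```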

Thirdly, why does it have to be interpolation? I think a local regression (such as loess) could provide a well-fitting model, which would probably extrapolate much better. Below I try to do just that, whilst eye-balling to set the span parameter.

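# surface = "direct" computes the fit exactly at each point, which also
# lets predict() return values outside the range of x
low <- loess(y ~ x, span = 0.2, control = loess.control(surface = "direct"))
res <- predict(low, newdata = fineX)
lines(fineX, res, col = "blue", lwd = 3)
# Mark the extrapolated loess values in green
points(fineX[s], res[s], col = "green", cex = .6, pch = 3)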


As for choosing the span more objectively, I guess you could cross-validate and select the span with the best out-of-sample fit.
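One way to do that, as a sketch: leave-one-out cross-validation over a grid of candidate span values, keeping the span with the smallest mean squared prediction error. The span grid and the helper name loo_mse are my own choices for illustration, not part of loess. Because the held-out end points are predicted by extrapolation here, this criterion partly rewards good behaviour at the boundaries, which seems relevant to the question.

```r
x <- c(1600L, 1650L, 1675L, 1700L, 1725L, 1775L, 1800L, 1825L, 1850L,
       1875L, 1880L, 1885L, 1900L, 1920L, 1925L, 1930L, 1935L, 1940L,
       1945L, 1950L, 1955L, 1960L, 1965L, 1975L, 1980L, 1985L, 1990L,
       1995L, 2000L, 2005L, 2010L, 2015L, 2020L, 2025L, 2030L, 2035L,
       2040L, 2045L, 2050L, 2055L, 2060L, 2065L, 2070L, 2075L, 2080L,
       2085L, 2090L, 2095L, 2100L, 2105L, 2110L, 2115L, 2120L, 2125L,
       2130L, 2135L, 2140L, 2145L, 2150L, 2155L, 2160L, 2165L, 2170L,
       2175L, 2180L, 2185L, 2190L, 2195L, 2200L, 2225L, 2250L, 2275L,
       2300L, 2325L, 2350L, 2400L)
y <- c(0.294529, 0.285516, 0.240616, 0.275107, 0.275033, 0.236293,
       0.240515, 0.229588, 0.20417, 0.20361, 0.203624, 0.204582, 0.195379,
       0.187396, 0.185315, 0.182648, 0.18076, 0.178717, 0.176931, 0.173805,
       0.171352, 0.169856, 0.170566, 0.166413, 0.164074, 0.162457, 0.160333,
       0.158291, 0.156577, 0.154371, 0.152205, 0.150303, 0.148391, 0.146455,
       0.144258, 0.142454, 0.139729, 0.137987, 0.135529, 0.133566, 0.131664,
       0.129607, 0.127761, 0.125352, 0.123669, 0.121388, 0.119598, 0.117541,
       0.11575, 0.113464, 0.111405, 0.109566, 0.107747, 0.105732, 0.104137,
       0.102337, 0.100538, 0.099007, 0.097542, 0.096187, 0.095008, 0.094473,
       0.094044, 0.093378, 0.093201, 0.093218, 0.093572, 0.094112, 0.094962,
       0.102078, 0.111409, 0.120824, 0.128211, 0.137644, 0.144049, 0.16133)

# Hypothetical helper: leave-one-out mean squared error of loess for one span
loo_mse <- function(span, x, y) {
  errs <- vapply(seq_along(x), function(i) {
    # Fit without point i; surface = "direct" allows prediction at the ends
    fit <- loess(y ~ x, data = data.frame(x = x[-i], y = y[-i]), span = span,
                 control = loess.control(surface = "direct"))
    pred <- as.numeric(predict(fit, newdata = data.frame(x = x[i])))
    (y[i] - pred)^2
  }, numeric(1))
  mean(errs)
}

# Candidate spans (an assumed grid; adjust as needed)
spans <- seq(0.15, 0.8, by = 0.05)
cv <- vapply(spans, loo_mse, numeric(1), x = x, y = y)
best_span <- spans[which.min(cv)]
print(best_span)

# Refit on all data with the selected span and predict on the grid
fineX <- seq(min(x) - 50, max(x) + 50, 1)
low <- loess(y ~ x, span = best_span,
             control = loess.control(surface = "direct"))
res <- predict(low, newdata = data.frame(x = fineX))
plot(fineX, res, type = "l", col = "blue")
points(x, y)
```

Leave-one-out is cheap here (76 points times 14 spans); with more data, k-fold cross-validation would do the same job with fewer fits.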

Anders Ellern Bilgrau
  • Thank you for providing this great explanation. The data come from the options market: I am trying to extrapolate implied volatility across strike prices. The data are highly noisy, and a negative extrapolation causes my risk-neutral distribution to become negative, which is wrong. This answer improved the results a lot, but for some maturities I still get a negative response. – Novic Apr 03 '18 at 23:22
  • I'm not familiar with that kind of data, so I'm afraid I cannot help you much more. Perhaps you could provide an example where it fails. – Anders Ellern Bilgrau Apr 04 '18 at 06:19