Why do I get disjoint data when I try to extrapolate after using polynomial regression?

Question

I wanted to extrapolate some of the data I had, as shown in the plot below. The blue line is the original data and the red line is the extrapolation that I wanted.

For the regression, I used the function `polyfit`:

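```matlab
% i_C and i_D hold the original data (row vectors)
sizespecial = size(i_C);
endgoal = sizespecial(2);        % number of samples
plothelp = 1:endgoal;            % x values 1..endgoal

% fit a 2nd-degree (quadratic) polynomial to each series
reg1 = polyfit(plothelp,i_C,2);
reg2 = polyfit(plothelp,i_D,2);
```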

Here, `i_C` and `i_D` are the vectors holding the original data. I extended the data with this code:

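```matlab
plothelp = 1:endgoal+11;         % extend the x range by 11 points

% evaluate the fitted quadratics at the new x values
for in = endgoal+1:endgoal+11
    i_C(in) = (reg1(1)*(in^2))+(reg1(2)*in)+reg1(3);
    i_D(in) = (reg2(1)*(in^2))+(reg2(2)*in)+reg2(3);
end
```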
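As the comments below suggest, `polyval` can replace the loop. A minimal sketch, assuming a made-up stand-in for `i_C` since the real data is not shown (`i_D` would be handled the same way); `xNew`, `extC` and `loopC` are illustrative names, not from the question:

```matlab
% Hypothetical stand-in for the question's i_C (the real data is not shown).
i_C = [10*(1 - exp(-(1:15)/4)), 10*(1 - exp(-15/4)) + 0.5*(1:15)];

endgoal = numel(i_C);              % equals sizespecial(2) for a row vector
plothelp = 1:endgoal;
reg1 = polyfit(plothelp, i_C, 2);  % quadratic fit, as in the question

% Evaluate the fitted polynomial at all 11 new x values in one call.
xNew = endgoal+1:endgoal+11;
extC = polyval(reg1, xNew);

% The question's loop body, vectorized, for comparison.
loopC = reg1(1)*xNew.^2 + reg1(2)*xNew + reg1(3);

% Append the extrapolated values and extend the x axis to match.
i_C = [i_C, extC];
plothelp = 1:endgoal+11;
```

This produces the same values as the loop; it does not remove the notch, because both evaluate the same quadratic.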

However, the resulting graph is:

I do not understand why the extra notch is introduced (circled in red). Do not hesitate to ask me to clarify any of the details in this question, and thank you for all your answers.

It's hard to tell without an example input for `i_C` and `i_D`. Also note that you can use `polyval` instead of the `for` loop. — Itamar Katz, Feb 22 '16 at 08:20
I suggest plotting the fitted polynomial over the real values to see whether the fit is what you expect it to be. It looks like it fits a second-degree polynomial (third argument of `polyfit`). If you look at the data, the appended values seem to make sense for a second-degree polynomial. As @ItamarKatz said already, it's better to use [`polyval`](http://www.mathworks.com/help/matlab/ref/polyval.html) to evaluate. — Matt, Feb 22 '16 at 08:28
Your data probably don't fit a quadratic very well (you'll see this if you take Matt's suggestion and plot the full regressed polynomial over your original data). You can either increase the order of the polynomial or consider an alternative such as cubic spline interpolation with extrapolation (http://www.mathworks.com/help/matlab/ref/interp1.html). Otherwise (it depends on your problem domain), it might make sense to do a linear regression over only the last `x` data points in your series. You can tune `x` as a parameter. — Dan, Feb 22 '16 at 08:51

Dan · Accepted Answer · 2016-02-22T09:33:31.497

What I imagine is happening is that you are trying to fit a second-order polynomial over all your data. My guess is that this polynomial will look a lot like the curve I have drawn in orange. If you follow Matt's advice from his comment and plot your regressed polynomial over your original data as well (not just the extrapolated part), you should be able to confirm this.
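A minimal sketch of why a poor fit shows up as a notch, on made-up data (the question's series is not shown): the extrapolated points continue the fitted curve, not the data, so the jump at the junction includes the fit's residual at the last data point. The variable names are illustrative only.

```matlab
% Hypothetical data (the question's i_C is not shown): a curve that levels
% off and then rises linearly, which one quadratic cannot follow exactly.
x = 1:30;
y = [10*(1 - exp(-(1:15)/4)), 10*(1 - exp(-15/4)) + 0.5*(1:15)];
p = polyfit(x, y, 2);

% The extrapolation continues the fitted curve, not the data itself.
residualAtEnd = y(end) - polyval(p, x(end));              % data minus fit at x = 30
stepAlongFit  = polyval(p, x(end)+1) - polyval(p, x(end));
jumpInPlot    = polyval(p, x(end)+1) - y(end);            % last data point -> first extrapolated point

% jumpInPlot = stepAlongFit - residualAtEnd, so a large residual at the last
% data point appears as a notch where the two curves meet.
fprintf('residual at last point: %g\n', residualAtEnd);
fprintf('jump seen in the plot:  %g\n', jumpInPlot);

% Plot the whole fit over the data to see where it deviates:
% plot(x, y, 'b', 1:x(end)+11, polyval(p, 1:x(end)+11), 'r--')
```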

You might get better results by fitting a higher-order polynomial. Your data have two points of inflection, so a 3rd-order polynomial will probably work quite well. One danger of extrapolating with higher-order polynomials, however, is that they can have fairly dramatic inflections outside the domain of your data and produce unexpected, wild results.
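A minimal sketch comparing a quadratic and a cubic fit on the same made-up data (not the asker's). Because every quadratic is also a cubic with a zero leading coefficient, the least-squares cubic residual cannot exceed the quadratic one, apart from rounding; the extrapolated values still need checking by eye.

```matlab
% Same hypothetical data as a stand-in for the question's series.
x = 1:30;
y = [10*(1 - exp(-(1:15)/4)), 10*(1 - exp(-15/4)) + 0.5*(1:15)];

pQuad  = polyfit(x, y, 2);
pCubic = polyfit(x, y, 3);

% Least-squares residual norms over the observed range.
resQuad  = norm(y - polyval(pQuad,  x));
resCubic = norm(y - polyval(pCubic, x));
fprintf('quadratic residual: %g, cubic residual: %g\n', resQuad, resCubic);

% Nothing constrains either polynomial beyond x = 30, so inspect these
% extrapolated values before trusting them.
xFar = 31:60;
farQuad  = polyval(pQuad,  xFar);
farCubic = polyval(pCubic, xFar);
```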

One way to mitigate this is to perform a linear regression over only the final x data points of your series. These are the points highlighted in yellow in the figure. You can tune x as a parameter so that it covers as much of the approximately linear final portion of your curve as makes sense. The red line I have drawn is what a linear regression on only those points (as opposed to the entire data set) would produce.
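A minimal sketch of the tail-only linear regression, on the same made-up data. `nTail` plays the role of the answer's x; its value of 10 is an arbitrary assumption. In this hypothetical data the last 15 points lie exactly on a line of slope 0.5, so the fitted line continues from the last point without a notch.

```matlab
% Hypothetical data; the last 15 points are exactly linear (slope 0.5).
x = 1:30;
y = [10*(1 - exp(-(1:15)/4)), 10*(1 - exp(-15/4)) + 0.5*(1:15)];

nTail = 10;                          % the answer's "x"; hypothetical value, tune it
xTail = x(end-nTail+1:end);
yTail = y(end-nTail+1:end);
pLin  = polyfit(xTail, yTail, 1);    % [slope, intercept] from the tail only

xNew = x(end)+1:x(end)+11;
yNew = polyval(pLin, xNew);          % extrapolated straight line
```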

Another option is to fit a spline curve and extrapolate from it. You can use the interp1 function, specifying 'spline' or 'pchip' as the method.
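A minimal sketch of the `interp1` route, on the same made-up data. Unlike a `polyfit` regression, these interpolants pass through every data point, so the extrapolated part starts exactly at the last observation. Passing `'extrap'` explicitly just makes the intent clear; per the comments, MATLAB already extrapolates by default for `'pchip'` and `'spline'`.

```matlab
% Hypothetical data standing in for the question's series.
x = 1:30;
y = [10*(1 - exp(-(1:15)/4)), 10*(1 - exp(-15/4)) + 0.5*(1:15)];
xNew = x(end)+1:x(end)+11;

% Extrapolate using the end pieces of each interpolant.
yPchip  = interp1(x, y, xNew, 'pchip',  'extrap');
ySpline = interp1(x, y, xNew, 'spline', 'extrap');

% The interpolant reproduces the data at the original x values.
fitPchip = interp1(x, y, x, 'pchip');
```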

However, which choice is best will depend largely on the nature of the problem you are trying to solve.

Only use higher-order polynomials if doing so actually makes sense given the underlying data. — Bernhard, Feb 22 '16 at 09:05
Could you elaborate on the spline curve method you're talking about? — SDG, Feb 22 '16 at 09:05
@SharanDuggirala the docs for `interp1` should explain all you need to know. Splines are an interpolation method that fit natural looking curves *through* all your data points (i.e. the curve will pass directly through every point of your data) and so it is a different concept from regression. However, it also allows you to extrapolate so it might be an option for you... — Dan, Feb 22 '16 at 09:10
I totally agree with this answer. The last paragraph is really important IMO. Doing a *blind* extrapolation without knowing the properties of the underlying data does not really make a lot of sense. BTW: Note that `'cubic'` will perform cubic convolution in future versions according to the current docs. Now it's just an alias for `pchip` which does extrapolation by default. — Matt, Feb 22 '16 at 09:29
