I know how to do a basic polynomial regression in R. However, I can only use nls or lm to fit a line that minimizes error with the points.
This works most of the time, but sometimes when there are measurement gaps in the data, the model becomes very counter-intuitive. Is there a way to add extra constraints?
Reproducible Example:
I want to fit a model to the following made-up data (similar to my real data):
x <- c(0, 6, 21, 41, 49, 63, 166)
y <- c(3.3, 4.2, 4.4, 3.6, 4.1, 6.7, 9.8)
df <- data.frame(x, y)
First, let's plot it.
library(ggplot2)
points <- ggplot(df, aes(x,y)) + geom_point(size=4, col='red')
points
It looks like if we connected these points with a line, it would change direction 3 times, so let's try fitting a quartic to it.
# Name the fit `fit` rather than `lm`, so the lm() function is not shadowed
fit <- lm(y ~ x + I(x^2) + I(x^3) + I(x^4), data = df)
# Evaluate the fitted quartic via predict() instead of spelling out the coefficients
quartic <- function(x) predict(fit, newdata = data.frame(x = x))
points + stat_function(fun=quartic)
The model fits the points pretty well, except that, because the data has a large gap between x = 63 and x = 166, there is a huge spike there that has no reason to be in the model. (For my actual data, I know that there is no huge peak there.)
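To put a number on the spike rather than just eyeballing the plot, the fitted quartic can be evaluated on a fine grid across the gap. This is only a minimal sketch: the names `fit`, `gap`, and `peak` and the 0.5 grid step are my own choices for illustration.

```r
x <- c(0, 6, 21, 41, 49, 63, 166)
y <- c(3.3, 4.2, 4.4, 3.6, 4.1, 6.7, 9.8)
df <- data.frame(x, y)

# Same quartic as above, fitted by ordinary least squares
fit <- lm(y ~ x + I(x^2) + I(x^3) + I(x^4), data = df)

# Evaluate the fit on a fine grid spanning the measurement gap
# (the 0.5 step is an arbitrary choice)
gap <- data.frame(x = seq(63, 166, by = 0.5))
gap$y_hat <- predict(fit, newdata = gap)

# Location and height of the largest predicted value inside the gap
peak <- gap[which.max(gap$y_hat), ]
peak
```

Comparing `peak$y_hat` with `max(df$y)` shows how far the curve overshoots the largest observed value.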
So the question in this case is:
- How can I force that local maximum to be at (166, 9.8)?
If that's not possible, then another way to do it would be:
- How can I prevent the y-values predicted by the model from exceeding y = 9.8?
Or perhaps there's a better model to be using? (Other than doing it piece-wise). My purpose is to compare features of models between graphs.
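To make the second option concrete, here is a rough sketch of what I have in mind: the usual least-squares objective plus a penalty on any predicted value above 9.8, minimized with optim. I am not sure this is a sensible approach. The penalty weight `lambda`, the 400-point grid, and the rescaling of x to [0, 1] are all assumptions I made for the sketch.

```r
x <- c(0, 6, 21, 41, 49, 63, 166)
y <- c(3.3, 4.2, 4.4, 3.6, 4.1, 6.7, 9.8)

# Rescale x to [0, 1] so that the powers up to 4 stay on a comparable scale
s <- x / max(x)
X <- outer(s, 0:4, `^`)        # design matrix with columns 1, s, s^2, s^3, s^4

# The same basis on a fine grid, where the cap on predictions is checked
grid <- seq(0, 1, length.out = 400)
G <- outer(grid, 0:4, `^`)

cap <- max(y)                  # 9.8, the largest observed value
lambda <- 1e3                  # hypothetical penalty weight, chosen arbitrarily

# Sum of squared errors plus a squared penalty on predictions above the cap
objective <- function(b) {
  sse <- sum((y - X %*% b)^2)
  excess <- pmax(G %*% b - cap, 0)
  sse + lambda * sum(excess^2)
}

# Start from the unconstrained least-squares coefficients
b0 <- qr.solve(X, y)
fit_pen <- optim(b0, objective, method = "BFGS")
b_pen <- fit_pen$par

# Largest predicted value on the grid, before and after adding the penalty
c(unconstrained = max(G %*% b0), penalized = max(G %*% b_pen))
```

This only discourages values above the cap at the grid points, so it is a soft constraint rather than a hard one, and the coefficients in `b_pen` refer to the rescaled variable x / 166, not to x itself.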