Background: I am analyzing oil production data, plotting daily oil rate on the y-axis against a diagnostic "time" factor on the x-axis. Depending on the flow regime, this plot tends to show a characteristic trend: typically a half slope or quarter slope followed by a unit slope. The analysis is very basic, but the approach is archaic and everything is done manually.

Is there a way in R to find the segment of the data that best fits a specific slope and fit the associated line over that segment on a log-log plot, perhaps subject to an R^2 criterion? Also, is there a way to get the point where the slope changes?

Example of what the raw data looks like

Example of the desired end result

kHAN
  • Hi kHAN, ask the same question on Cross Validated and look into the dput function. Also, I don't understand the second line, but the first looks to be a simple linear regression. – Bruno Dec 20 '19 at 15:05
  • Finding a specific point where something changes is often called "changepoint analysis". The `changepoint` package can work for you, and there are other options similar to that. – Gregor Thomas Dec 20 '19 at 15:11

2 Answers

What about using a scatterplot with a loess smooth?

scatter.smooth(x=data$x, y=data$y, main="y ~ x")  # scatterplot with a loess smooth
Carbo

In the future please provide your data in reproducible form so we can work with it. This time I have provided some sample data in the Note at the end.

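Let kvalues be the candidate indexes into x for the change point. We exclude indexes near the ends to avoid numeric problems. For each candidate we run the regression defined in the regr function and compute its residual sum of squares using deviance. We then take the candidate with the least residual sum of squares and display that regression. No packages are used. Note that regr compares x to k directly, which works here because x is 1:50, so each index equals its x value; for other data, compare against x[k] instead.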

(If you want to fix the slopes then remove the slope parameters from the formula and starting values and replace them with the fixed values in the formula.)

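# candidate change-point indexes, skipping the ends
kvalues <- 5:45
# starting values for the two intercepts (a1, a2) and the two slopes (b1, b2)
st <- list(a1 = 1, b1 = 1, a2 = 2, b2 = 2)
# two-segment regression with the break at k
regr <- function(k) try(nls(y ~ ifelse(x < k, a1 + b1 * x, a2 + b2 * x), start = st))
# position of the candidate with the smallest residual sum of squares
i <- which.min(sapply(kvalues, function(k) deviance(regr(k))))
k <- kvalues[i]
k; x[k]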
## [1] 26
## [1] 26

fm <- regr(k)
fm
## Nonlinear regression model
##   model: y ~ ifelse(x < k, a1 + b1 * x, a2 + b2 * x)
##    data: parent.frame()
##     a1     b1     a2     b2 
##  1.507 -1.042  1.173 -2.002 
##  residual sum-of-squares: 39.52
##
## Number of iterations to convergence: 1 
## Achieved convergence tolerance: 2.917e-09

plot(y ~ x)
lines(fitted(fm) ~ x)
abline(v = x[k])

screenshot
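
The fixed-slope variant mentioned in the parenthetical above could look roughly like the following. This is only a sketch: it assumes the two slopes are known to be -1 and -2 (the values used to generate the sample data in the Note), so only the intercepts are estimated. The names s1, s2, st_fixed, regr_fixed, k_fixed and fm_fixed are introduced here for illustration.

```r
# Sample data, same as in the Note
set.seed(123)
x <- 1:50
y <- 1 - rep(1:2, each = 25) * x + rnorm(50)

# Assumed known slopes for the two segments (illustrative values)
s1 <- -1
s2 <- -2

kvalues <- 5:45
st_fixed <- list(a1 = 1, a2 = 2)  # only the intercepts are estimated

# Same two-segment regression, but with the slopes fixed at s1 and s2
regr_fixed <- function(k) {
  nls(y ~ ifelse(x < k, a1 + s1 * x, a2 + s2 * x), start = st_fixed)
}

# Pick the break that minimizes the residual sum of squares
rss <- sapply(kvalues, function(k) deviance(regr_fixed(k)))
k_fixed <- kvalues[which.min(rss)]
fm_fixed <- regr_fixed(k_fixed)

k_fixed
coef(fm_fixed)
```

With the slopes fixed, the model is linear in the remaining parameters, so nls should converge almost immediately for every candidate break.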

Note

set.seed(123)
x <- 1:50
y <- 1 - rep(1:2, each = 25) * x + rnorm(50)
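
To connect this to the log-log setting in the question, here is a rough sketch on made-up data. Everything in it is an assumption rather than the asker's data: the rate follows a slope of -1/2 and then -1 on log-log axes, the break is at t_break = 50, and R^2 for a fixed-slope segment is taken as 1 - RSS/TSS. The names tm, q, t_break, slopes and seg_fit are hypothetical. With a fixed slope, the least-squares intercept is simply the mean of the log rate minus slope times log time, so nls is not needed.

```r
# Hypothetical rate data: slope -1/2 up to t_break, then slope -1 (log-log)
set.seed(1)
n <- 60
tm <- 10^seq(0, 3, length.out = n)     # diagnostic "time" from 1 to 1000
t_break <- 50                          # assumed true break time
q <- ifelse(tm < t_break,
            1000 * tm^(-0.5),
            1000 * sqrt(t_break) * tm^(-1)) * exp(rnorm(n, sd = 0.05))

lx <- log10(tm)
ly <- log10(q)
slopes <- c(-0.5, -1)                  # slopes imposed on the two segments

# Fixed-slope fit on the points in idx: intercept, RSS and R^2
seg_fit <- function(idx, slope) {
  a <- mean(ly[idx] - slope * lx[idx])
  rss <- sum((ly[idx] - (a + slope * lx[idx]))^2)
  tss <- sum((ly[idx] - mean(ly[idx]))^2)
  c(intercept = a, rss = rss, r2 = 1 - rss / tss)
}

# Candidate break indexes, skipping the ends; minimize the total RSS
kvalues <- 5:(n - 5)
total_rss <- sapply(kvalues, function(k) {
  seg_fit(1:(k - 1), slopes[1])[["rss"]] + seg_fit(k:n, slopes[2])[["rss"]]
})
k <- kvalues[which.min(total_rss)]

fit1 <- seg_fit(1:(k - 1), slopes[1])
fit2 <- seg_fit(k:n, slopes[2])

tm[k]                                  # estimated time where the slope changes
c(fit1[["r2"]], fit2[["r2"]])          # R^2 of each fixed-slope segment
```

The result can be drawn with plot(tm, q, log = "xy") followed by lines() for each segment, back-transforming the fitted values with 10^(intercept + slope * lx). An R^2 threshold could then be checked per segment before accepting the break.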
G. Grothendieck