I revised my question to make it clearer [25.03.2021]:
I have a dataset of water levels measured over one year at 12 measuring stations along a tidal river. Each measuring station has a river-kilometre value that serves as the spatial variable. The water level shows a periodic/sinusoidal pattern over time and, to a lesser degree, over space. I need to model the water level as a function of both time and space.
Because the dataset is too large and I do not have permission to share it, I simulated some data according to a wave function ψ(x,t) = A0 sin(ωt - kx + φ0) (the simulation code below uses +kx instead of -kx, which only reverses the direction of travel). The real data is a bit more complicated, but neither method I tried (nlsLM and GAM) works for either the real or the simulated data, so the simulated data is enough to demonstrate my problem.
nlsLM only works when the start values almost exactly match the true model parameters; if they deviate even slightly, the fit fails completely, as the code below shows.
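A possible explanation for this sensitivity, offered as a sketch rather than a diagnosis: for a sine model, the residual sum of squares (RSS) viewed as a function of the frequency w tends to have many local minima, so a gradient-based fit can get trapped unless the start value is already close. The snippet profiles the RSS over a grid of w for one simulated station, with A and phi held at their true values. The names `t_min`, `y`, `rss`, `w_grid`, `n_local_min`, and `w_best` are illustrative and not part of my original code.

```r
# One station of the simulated signal (place = 0), 30 days in 30-minute steps
t_min <- seq(0, 30 * 24 * 60, by = 30)   # minutes since start
y <- 200 * sin(0.005 * t_min + 10)       # A = 200, w = 0.005, phi = 10

# RSS as a function of w only; A and phi are fixed at their true values
rss <- function(w) sum((y - 200 * sin(w * t_min + 10))^2)

w_grid <- seq(0.001, 0.01, by = 1e-5)
rss_vals <- vapply(w_grid, rss, numeric(1))

# Count interior grid points that lie lower than both of their neighbours
n_local_min <- sum(diff(sign(diff(rss_vals))) > 0)

# Grid value of w with the smallest RSS
w_best <- w_grid[which.min(rss_vals)]
```

Plotting the profile with `plot(w_grid, rss_vals, type = "l")` should show a narrow valley around the true w surrounded by many shallower ones, which would be consistent with the fit failing when w starts at 0.001.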
GAM works quite well for the time and space components separately, but not when they are combined, as the code below shows.
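One reason the combined fit may be harder than the separate ones: a travelling wave is not additive in space and time, because sin(kx + wt) = sin(kx)cos(wt) + cos(kx)sin(wt) consists of products of a space term and a time term. A related identity, A sin(θ + φ) = A cos(φ) sin(θ) + A sin(φ) cos(θ) with θ = kx + wt, makes the model linear in two coefficients once k and w are fixed. The sketch below uses that identity with `lm`. It assumes k and w are known, which is a strong assumption that does not hold for my real data; `grid`, `k_known`, `w_known`, `A_hat`, and `phi_hat` are hypothetical names.

```r
# Small simulated grid with the same wave parameters as in the simulation
grid <- expand.grid(place = seq(0, 350, 30),
                    timediff = seq(0, 2880, by = 30))   # two days, in minutes
grid$level <- 200 * sin(0.023 * grid$place + 0.005 * grid$timediff + 10)

# ASSUMPTION: k and w are treated as known; only A and phi are estimated
k_known <- 0.023
w_known <- 0.005
grid$theta <- k_known * grid$place + w_known * grid$timediff

# A*sin(theta + phi) = a*sin(theta) + b*cos(theta),
# with a = A*cos(phi) and b = A*sin(phi)
fit <- lm(level ~ 0 + sin(theta) + cos(theta), data = grid)
ab <- unname(coef(fit))

A_hat <- sqrt(sum(ab^2))          # amplitude
phi_hat <- atan2(ab[2], ab[1])    # phase, determined only modulo 2*pi
```

Because the phase is only identified modulo 2*pi, `phi_hat` should be compared with the simulated value through `sin()` and `cos()` rather than directly.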
Does anybody know what I did wrong with these methods, or are other methods better suited?
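As one candidate for such another method, and only as a sketch: a rough start value for w could be read off the discrete Fourier transform of a single station's series before calling nlsLM. The snippet assumes regular 30-minute sampling without gaps and uses only base R; `dt`, `amp`, `peak`, and `w_start` are illustrative names. Whether the resulting start value is close enough for nlsLM to converge on the real data is not verified here.

```r
dt <- 30                                   # sampling interval in minutes
t_min <- seq(0, 30 * 24 * 60, by = dt)
y <- 200 * sin(0.005 * t_min + 10)         # one simulated station

n <- length(y)
amp <- Mod(fft(y - mean(y)))               # amplitude spectrum without the mean

# Search only the positive frequencies (indices 2 .. n/2)
half <- 2:floor(n / 2)
peak <- half[which.max(amp[half])]

# Index i corresponds to (i - 1) / n cycles per sample
cycles_per_sample <- (peak - 1) / n
w_start <- 2 * pi * cycles_per_sample / dt   # angular frequency in rad per minute
```

The resolution of this estimate is limited to about 2*pi/(n*dt), so it is meant as a start value for a subsequent fit, not as a final estimate.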
###### Simulate data (similar to original dataset but less complicated) ######
### Create Long-Table
stations <- seq(0,350,30)
start_time <- "2020-01-01 00:00:00"
end_time <- "2020-01-31 00:00:00"
time_interval <- "30 mins"
time_vector <- seq.POSIXt(from = as.POSIXct(start_time),
                          to = as.POSIXct(end_time),
                          by = time_interval)
df <- data.frame(time = rep(time_vector, times = length(stations)),
                 place = rep(stations, each = length(time_vector)),
                 timediff = as.numeric(difftime(rep(time_vector, times = length(stations)),
                                                as.POSIXct("2020-01-01 00:00:00"),
                                                units = "mins")))
### Parameters of the wave function
A0 <- 200
k0 <- 0.023
w0 <- 0.005
phi0 <- 10
### Simulate water level values
df$level <- A0*sin(k0*df$place + w0*df$timediff + phi0)
### Plot simulated data
par(mfrow=c(1,2))
plot(level~timediff, data = df[df$place==30,], type = "l",
main="Water level over time")
plot(level~place, data = df[df$timediff==30,], type = "l",
main="Water level over space")
par(mfrow=c(1,1))
###### Try to estimate model function parameters using nlsLM ######
### Modelling using nlsLM
library(minpack.lm)
nlsmod <- nlsLM(level ~ A*sin(k*place + w*timediff + phi), data = df,
                start = c(A = 200, k = 0.023, w = 0.001, phi = 10),
                lower = c(A = 100, k = 0.001, w = 0.005, phi = 0),
                upper = c(A = 1000, k = 0.01, w = 0.1, phi = 1000),
                control = nls.lm.control(maxiter = 1000))
# restricting the search region with bounds did not work; note that these bounds
# exclude the true k = 0.023 (upper k = 0.01) and the start value w = 0.001 (lower w = 0.005)
nlsmod <- nlsLM(level ~ A*sin(k*place + w*timediff + phi), data = df,
                start = c(A = 200, k = 0.023, w = 0.005, phi = 10),
                control = nls.lm.control(maxiter = 1000))
# starting exactly at the true parameter values works, but that defeats the purpose,
# since I want to estimate them
nlsmod <- nlsLM(level ~ A*sin(k*place + w*timediff + phi), data = df,
                start = c(A = 150, k = 0.023, w = 0.001, phi = 10),
                control = nls.lm.control(maxiter = 1000))
# when just two start values (A and w) are changed slightly, nlsLM (and nls) no longer works
summary(nlsmod)
nlsmod
### Create new dataset
df.new = data.frame(timediff = df$timediff, place = df$place)
df.new$pred <- predict(nlsmod, df.new)
### Plot simulated and predicted data
par(mfrow=c(1,2))
plot(level~timediff, data = df[df$place==30,], type = "l")
lines(pred~timediff, data = df.new[df.new$place==30,], type = "l", col = "red")
plot(level~place, data = df[df$timediff==0,], type = "l")
lines(pred~place, data = df.new[df.new$timediff==0,], type = "l", col = "red")
par(mfrow=c(1,1))
###### Modeling using GAM ######
### Create one time and one space dataset for testing the fit separately
df.time <- df[df$place==30,]
df.place <- df[df$timediff==0,]
### Load package
library(mgcv)
### Test modeling space
plot(level ~ place, data = df.place, type = "p")
bam_mod <- bam(level ~ s(place), data = df.place)
plot(bam_mod)
df.new = data.frame(place = df.place$place)
df.new$pred <- predict(bam_mod, df.new)
plot(level~place, data = df.place, type = "p")
lines(pred~place, data = df.new, type = "p", col = "red")
# works well
### Test modeling time
plot(level ~ timediff, data = df.time, type = "l")
bam_mod <- bam(level ~ s(timediff, k=400, bs="cc"), data = df.time, discrete=TRUE, nthreads=10)
plot(bam_mod)
df.new = data.frame(timediff = df.time$timediff)
df.new$pred <- predict(bam_mod, df.new)
plot(level~timediff, data = df.time, type = "l")
lines(pred~timediff, data = df.new, type = "l", col = "red")
# works well
### Test modeling place and time
par(mfrow=c(1,2))
plot(level~timediff, data = df[df$place==30,], type = "l")#, xlim=c(0,5000))
plot(level~place, data = df[df$timediff==0,], type = "l")
par(mfrow=c(1,1))
bam_mod <- bam(level ~ s(place, k = 7, bs = "cc") +
                 s(timediff, k = 400, bs = "cc") +
                 s(timediff, place, k = 400) +
                 place +
                 timediff,
               data = df)#, discrete=TRUE, nthreads=10)
# takes a while but did not work
bam_mod <- bam(level ~ s(place, k = 7, bs = "cc") + s(timediff, k = 400, bs = "cc"),
               data = df, discrete = TRUE, nthreads = 10)
# faster but also did not work
plot(bam_mod)
### Create new dataset
df.new = data.frame(timediff = df$timediff, place = df$place)
df.new$pred <- predict(bam_mod, df.new)
### Plot simulated and predicted data
par(mfrow=c(1,2))
plot(level~timediff, data = df[df$place==30,], type = "l")#, xlim=c(0,5000))
lines(pred~timediff, data = df.new[df.new$place==30,], type = "l", col = "red")
plot(level~place, data = df[df$timediff==0,], type = "l")
lines(pred~place, data = df.new[df.new$timediff==0,], type = "l", col = "red")
par(mfrow=c(1,1))