R parameterization of constants in time series

Question

I have a fairly simple equation, in which I have direct measurements of the variables through time, and two different unknown parameters I need to solve for, but which I know can be considered constants over the time periods I'm studying.

Both of these "constants" have fairly narrow ranges of variability in nature. In principle, it seems like some kind of optimization procedure/function should be able to do this easily, by finding the pair of values that minimizes the standard deviation of each of the constant values across the time series.

However, I am new to optimization and parameter fitting. Any help figuring out how to use r code to find the pair (or pairs) of values in this situation would be greatly appreciated.

Below is a simplified form of the equation I'm dealing with:

A * x + B * z - B * d = c + e

A and B are the constants I need to solve for.

Possible real-world values of A are 0.4-0.8

Possible real-world values of B are 0.85-0.99

To create a reasonable mock data set, assuming perfect measurements of all variables, and known values of A and B:

### Generate mock data

### Variables all have a daily cycle and are strongly autocorrelated,
# and so can be approximated via sin function,
# with unique noise added to each to simulate variability:

# Variability for each variable
n <- 1000 # number of data points
t <- seq(0,4*pi,length.out = 1000)
a <- 3
b <- 2
x.unif <- runif(n)
z.norm <- rnorm(n)
c.unif <- runif(n)
d.norm <- rnorm(n)
d.unif <- runif(n)
e.norm <- rnorm(n)
amp <- 1

# Create reasonable values of mock variable data for all variables except e;
# I will calculate from known fixed values for A and B.
x <- a*sin(b*t)+x.unif*amp + 10 # uniform error
z <- a*sin(b*t)+z.norm*amp + 10 # Gaussian/normal error
c <- ((a*sin(b*t)+c.unif*amp) + 10)/4
d <- ((a*sin(b*t)+d.norm*amp)+(a*sin(b*t)+d.unif*amp)+10)/2

# Put vectors in dataframe
dat <- data.frame("t" = t, "x" = x, "z" = z, "c" = c, "d" = d)

# Equation: A*x + B*z - B*d = c + e
# Solve for e:
# e = A*x + B*z - B*d - c

# Specify "true" values for A and B:
A = 0.6
B = 0.9

# Solve for e:
dat <- dat %>%
  mutate(e = A*x + B*z - B*d - c)

# Gather data for easy visualizing of results for e:
dat_gathered <- dat %>%
  gather(-t, value = "value", key = "key")

# Plot all variables
ggplot(dat_gathered, aes(x = t, y = value, color = key)) + geom_line()

# Add small error (to simulate measurement error) to all variables except A and B:

dat <- dat %>%
  mutate(x_j = x + rnorm(x, sd=0.02)/(1/x)) %>%
  mutate(z_j = z + rnorm(z, sd=0.02)/(1/z)) %>%
  mutate(c_j = c + rnorm(c, sd=0.02)/(1/c)) %>%
  mutate(d_j = d + rnorm(d, sd=0.02)/(1/d)) %>%
  mutate(e_j = e + rnorm(e, sd=0.02)/(1/e))

The variables in dat with the _j suffix represent real world data (since they have measurement error added). Knowing the constraint that:

A is within 0.4-0.8

B is within 0.85-0.99

Is it possible to use the noisy "_j" data to optimize for the pair of constant values that minimize deviation of A and B across the entire time series?

score 1 · Accepted Answer · answered Jun 14 '20 at 02:45

1

A little bit of algebra and setting this up as a linear regression problem with no intercept seems to work fine:

m1 <- lm(e_j+c_j ~ 0 + x_j + I(z_j-d_j), data=dat)
coef(m1)  ## A =0.6032, B = 0.8916

It doesn't do anything to constrain the solution, though.

answered Jun 14 '20 at 02:45

Ben Bolker

211,554
25
370
453

A follow-up question. I have another part of the analysis that requires the same approach to simultaneously solve for two constants when all variables are known. However, the equation is an exponential function y = R * h - k - 1.12 * k * e^(-F *u), where e is Euler's constant, the lowercase letters are known variables, and R and F are the constants I need to solve for. Can can R and F be solved for with a regression model with an exponential parameter, and if so, what would the model notation look like written in R code? – photosynthesis Jun 14 '20 at 17:38
it's harder because it's intrinsically non-linear; the `nls()` function would work. Why don't you post it as a new question? (You can comment on/link to your new question in the comments here to bring my attention to it.) – Ben Bolker Jun 14 '20 at 17:52
Question posted, thanks. https://stackoverflow.com/questions/62394583/how-to-use-nls-to-fit-multiple-constants-in-exponential-decay-model – photosynthesis Jun 15 '20 at 18:24

R parameterization of constants in time series

1 Answers1