I have a fairly simple equation, in which I have direct measurements of the variables through time, and two different unknown parameters I need to solve for, but which I know can be considered constants over the time periods I'm studying.
Both of these "constants" have fairly narrow ranges of variability in nature. In principle, it seems like some kind of optimization procedure/function should be able to do this easily, by finding the pair of values that minimizes the standard deviation of each of the constant values across the time series.
However, I am new to optimization and parameter fitting. Any help figuring out how to use r code to find the pair (or pairs) of values in this situation would be greatly appreciated.
Below is a simplified form of the equation I'm dealing with:
A * x + B * z - B * d = c + e
A and B are the constants I need to solve for.
Possible real-world values of A are 0.4-0.8
Possible real-world values of B are 0.85-0.99
To create a reasonable mock data set, assuming perfect measurements of all variables, and known values of A and B:
### Generate mock data
### Variables all have a daily cycle and are strongly autocorrelated,
# and so can be approximated via sin function,
# with unique noise added to each to simulate variability:
# Variability for each variable
n <- 1000 # number of data points
t <- seq(0,4*pi,length.out = 1000)
a <- 3
b <- 2
x.unif <- runif(n)
z.norm <- rnorm(n)
c.unif <- runif(n)
d.norm <- rnorm(n)
d.unif <- runif(n)
e.norm <- rnorm(n)
amp <- 1
# Create reasonable values of mock variable data for all variables except e;
# I will calculate from known fixed values for A and B.
x <- a*sin(b*t)+x.unif*amp + 10 # uniform error
z <- a*sin(b*t)+z.norm*amp + 10 # Gaussian/normal error
c <- ((a*sin(b*t)+c.unif*amp) + 10)/4
d <- ((a*sin(b*t)+d.norm*amp)+(a*sin(b*t)+d.unif*amp)+10)/2
# Put vectors in dataframe
dat <- data.frame("t" = t, "x" = x, "z" = z, "c" = c, "d" = d)
# Equation: A*x + B*z - B*d = c + e
# Solve for e:
# e = A*x + B*z - B*d - c
# Specify "true" values for A and B:
A = 0.6
B = 0.9
# Solve for e:
dat <- dat %>%
mutate(e = A*x + B*z - B*d - c)
# Gather data for easy visualizing of results for e:
dat_gathered <- dat %>%
gather(-t, value = "value", key = "key")
# Plot all variables
ggplot(dat_gathered, aes(x = t, y = value, color = key)) + geom_line()
# Add small error (to simulate measurement error) to all variables except A and B:
dat <- dat %>%
mutate(x_j = x + rnorm(x, sd=0.02)/(1/x)) %>%
mutate(z_j = z + rnorm(z, sd=0.02)/(1/z)) %>%
mutate(c_j = c + rnorm(c, sd=0.02)/(1/c)) %>%
mutate(d_j = d + rnorm(d, sd=0.02)/(1/d)) %>%
mutate(e_j = e + rnorm(e, sd=0.02)/(1/e))
The variables in dat with the _j suffix represent real world data (since they have measurement error added). Knowing the constraint that:
A is within 0.4-0.8
B is within 0.85-0.99
Is it possible to use the noisy "_j" data to optimize for the pair of constant values that minimize deviation of A and B across the entire time series?