I have a for loop that I would like to run by group. It should run through a set of data, create a time series for most rows, and then output a forecast for each such row (based on that time point and the ones preceding it) within the group. The issue I am having is running that loop for every 'group' in my data. I want to avoid doing so manually, as that would take hours and surely there is a better way.
Allow me to explain in more detail.
I have a large dataset (1.6M rows). Each row has a year, country A, country B, and a number of measures concerning the relationship between the two.
So far, I have been successful in extracting a single (country A, country B) relationship into a new table and using a for loop to output the necessary forecast data to a new variable in the dataset. I'd like to have that for loop run over every (country A, country B) grouping with more than 3 entries.
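A minimal sketch of the "more than 3 entries" filter, using base R only and a made-up toy data frame. The names `toy`, `n_in_group`, and `kept` are illustrative assumptions and are not part of the real dataset:

```r
# Toy pairs: ("2", "20") has 4 rows, ("2", "31") has only 2 rows.
toy <- data.frame(
  ccode1 = rep("2", 6),
  ccode2 = c("20", "20", "20", "20", "31", "31"),
  stringsAsFactors = FALSE
)

# For every row, count how many rows share its (ccode1, ccode2) pair.
n_in_group <- ave(seq_len(nrow(toy)), toy$ccode1, toy$ccode2, FUN = length)

# Keep only rows whose pair has more than 3 entries.
kept <- toy[n_in_group > 3, ]
```

Only the rows of the ("2", "20") pair survive this filter; the two-row pair is dropped.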
The data:
Here I will replicate a small slice of the data, and will include a missing value for realism.
set.seed(2000)
df <- data.frame(year = rep(1946:1970, length.out = 50),
                 ccode1 = rep("2", length.out = 50),
                 ccode2 = rep(c("20", "31"), each = 25),
                 kappavv = rnorm(50, mean = 0, sd = 0.25),
                 output = NA_real_)
df$kappavv[12] <- NA
What I've done:
NOTE: I start forecasting from the third data point of each group but based on all time points preceding the forecast.
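library(forecast)  # holt()
library(imputeTS)  # na_interpolation()

for (i in 3:nrow(df)) {
  # Series from the first year up to and including the year of row i
  dat_ts <- ts(df$kappavv, start = c(min(df$year), 1), end = c(df$year[i], 1), frequency = 1)
  dat_ts_corr <- na_interpolation(dat_ts)  # fill in missing values
  trialseries <- holt(dat_ts_corr, h = 1)  # one-step-ahead Holt forecast
  df$output[i] <- as.numeric(trialseries$mean)
}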
This part works and outputs what I want when I apply it to a single (ccode1, ccode2) pairing that is sorted in ascending order of year.
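To make the expanding window concrete, here is a small base R sketch of which observations feed each forecast within one pairing. The vector `x` holds made-up values and `windows` is an illustrative name; no forecasting is done here:

```r
# Toy series for a single pairing; values are made up for illustration.
x <- c(0.10, 0.40, NA, 0.70, 0.50)

# For each i >= 3, the forecast sees observations 1..i and nothing later.
windows <- lapply(3:length(x), function(i) x[seq_len(i)])

# Window sizes grow by one observation per step.
window_sizes <- lengths(windows)
```

Each element of `windows` is the input that one iteration of the loop would interpolate and pass to the forecast, and this is what would need to restart from the first row for every new pairing.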
What isn't working:
I am having some serious problems getting my head around applying this for loop by grouping of ccode2. Some of my data is uneven: groups differ in size, have different start/end points, and contain missing data.
I have tried expressing the loop as a function, using group_by() with piping, and using various apply() functions.
Your help is appreciated. Thanks in advance. I am glad to answer any clarifying questions you have.