I have the following problem: I have a data set A (a data.table object) of the following structure:
date days rate
1996-01-02 9 5.763067
1996-01-02 15 5.745902
1996-01-02 50 5.673317
1996-01-02 78 5.608884
1996-01-02 169 5.473762
1996-01-03 9 5.763067
1996-01-03 14 5.747397
1996-01-03 49 5.672263
1996-01-03 77 5.603705
1996-01-03 168 5.470584
1996-01-04 11 5.729460
1996-01-04 13 5.726104
1996-01-04 48 5.664931
1996-01-04 76 5.601891
1996-01-04 167 5.468961
Note that the values in the days column, and the number of rows per date, may differ from day to day. My goal is now to (piecewise linearly) interpolate rate along days. I currently do this for each date via
approx(x = A[, days], y = A[, rate], xout = days_vec, rule = 2)
where days_vec <- min_days:max_days is the days range I am interested in (say 1:100).
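To make the issue concrete, here is a minimal, self-contained sketch (toy numbers, not my real data) of what rule = 2 actually does outside the observed range:

```r
# Toy (days, rate) pairs mimicking one date -- illustrative values only
x <- c(9, 15, 50)
y <- c(5.76, 5.75, 5.67)

# rule = 2 clamps to the boundary values instead of extrapolating:
out <- approx(x = x, y = y, xout = 1:20, rule = 2)
out$y[1:8]  # all equal y[1] = 5.76, i.e. flat, not a linear extension
```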
I have two problems here:
approx only interpolates: with rule = 2 it merely repeats the boundary values outside [min(x), max(x)] rather than fitting a linear extension. If I am interested in days 1:100, I first need to handle days 1:8 by hand, using days 9 and 15 (the first two rows of A), via:
first_days <- 1:(A[1, days] - 1)  # 1:8
rate_vec[first_days] <- A[1, rate] + (first_days - A[1, days]) / (A[2, days] - A[1, days]) * (A[2, rate] - A[1, rate])
and then use the approx call above to fill rate_vec[9:100]. Is there a way of doing this in one step?
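One candidate for a one-step version is Hmisc::approxExtrap, which interpolates linearly inside the observed range and extrapolates linearly from the two nearest points at each end (this assumes adding the Hmisc dependency is acceptable):

```r
library(Hmisc)  # for approxExtrap

# (days, rate) for the first date in the sample above
x <- c(9, 15, 50, 78, 169)
y <- c(5.763067, 5.745902, 5.673317, 5.608884, 5.473762)

# One call replaces the manual computation for days 1:8 plus the approx call:
out <- approxExtrap(x, y, xout = 1:100)
```

Since the rates are decreasing in days, the left extrapolation yields values above rate at day 9, matching the manual two-point formula.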
Right now, given that I need two steps and that the switch point between the two procedures (here day 9) differs across dates, I cannot see an implementation via data.table, although this would be vastly preferred (using data.table methods to interpolate/extrapolate and then returning the expanded data.table object). I therefore currently run a for loop over the dates, which is of course much, much slower.
Question: Can the problem above be implemented more cleanly, and in particular with data.table methods instead of looping through A?
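For completeness, this is the kind of grouped one-liner I am hoping for, sketched with the sample rows above and with Hmisc::approxExtrap standing in for the interpolate-plus-extrapolate step (so again assuming the Hmisc dependency):

```r
library(data.table)
library(Hmisc)

# Rebuild a small A from the sample rows above
A <- data.table(
  date = as.IDate(rep(c("1996-01-02", "1996-01-03", "1996-01-04"), each = 5)),
  days = c(9, 15, 50, 78, 169,  9, 14, 49, 77, 168,  11, 13, 48, 76, 167),
  rate = c(5.763067, 5.745902, 5.673317, 5.608884, 5.473762,
           5.763067, 5.747397, 5.672263, 5.603705, 5.470584,
           5.729460, 5.726104, 5.664931, 5.601891, 5.468961)
)

days_vec <- 1:100

# One grouped call instead of a for loop: for each date, map rate onto
# days_vec and return the expanded data.table (100 rows per date).
res <- A[, .(days = days_vec,
             rate = approxExtrap(days, rate, xout = days_vec)$y),
         by = date]
```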