I am trying to perform a very simple data completion : I have two columns of the same measurement made in two different locations a few miles away from each other. Location 1 is more complete than location 2, and I want to complete the second with the first, by applying a coefficient (loc1/loc2) to 1.
My problem is that it is about weather values, that vary with the day considered, so applying the same mean coefficient to all my values would not be ideal. I first got mean coefficients by the values of loc1 and the day, but for the days with no loc2 values, I would like to :
- replace NA in 'coef' value by a coef calculated with a same loc1 value
- but there are sometime more than one coef for each loc1 value, so when there is more than one, it would be preferentially the one measured on the same day
and when there is no coef value at all for one loc1 value, then coef would be the same than for that loc1 value +/- an interval (the coef of the closest loc1 value on the closest date to the one to be completed) Here is what it looks like :
loc1 Date Coef loc2 1 12 204 3 4 2 8 147 4 2 3 8 204 NA NA 4 10 147 NA NA 5 10 158 NA NA 6 6 159 3 2 7 6 162 NA NA 8 6 170 2 3 9 .3 175 0.4833333 0.145 10 0.3 204 NA NA 11 0.4 146 NA NA 12 0.4 147 NA NA
I am pretty sure there is a non messy way of doing that, but with my limited knowledge using ifelse or ddply, I didn't get any closer to what I want in the end. I feel a loop could do the trick, but I have no idea how..
Any thoughts would be greatly appreciated ! many thanks in advance !