1

I have a data frame with 5000+ rows, shown below. I am trying to insert a row wherever a month is missing (e.g. month 6 of 2003 below) and then use linear interpolation to calculate the 'TWS' value. Ideally the decimal date (DecimDate) would be filled in appropriately too, but I can sort this out afterwards if not! The data frame covers months 1:12 for 10 years (2003-2012), and this sequence repeats for multiple grid squares.

I have found lots of other similar questions, but none relating to a repeating 1:12 monthly sequence.

 > head(ts.data,20)
    GridNo GridIndex  Lon  Lat DecimDate Year Month        TWS
 1    GR72        72 35.5 -4.5  2003.000 2003    01 14.2566781
 2    GR72        72 35.5 -4.5  2003.083 2003    02  5.0413706
 3    GR72        72 35.5 -4.5  2003.167 2003    03  3.8192721
 4    GR72        72 35.5 -4.5  2003.250 2003    04  5.8706026
 5    GR72        72 35.5 -4.5  2003.333 2003    05  7.8461188
 6    GR72        72 35.5 -4.5  2003.500 2003    07  2.3821844
 7    GR72        72 35.5 -4.5  2003.583 2003    08  0.1995629
 8    GR72        72 35.5 -4.5  2003.667 2003    09 -1.8353604
 9    GR72        72 35.5 -4.5  2003.750 2003    10 -2.0410653
 10   GR72        72 35.5 -4.5  2003.833 2003    11 -1.4029813
 11   GR72        72 35.5 -4.5  2003.917 2003    12 -0.2206872
 12   GR72        72 35.5 -4.5  2004.000 2004    01 -0.5090872
 13   GR72        72 35.5 -4.5  2004.083 2004    02 -0.4887118
 14   GR72        72 35.5 -4.5  2004.167 2004    03 -0.7725966
 15   GR72        72 35.5 -4.5  2004.250 2004    04  4.1831581
 16   GR72        72 35.5 -4.5  2004.333 2004    05  2.5651040
 17   GR72        72 35.5 -4.5  2004.417 2004    06 -2.2511409
 18   GR72        72 35.5 -4.5  2004.500 2004    07 -1.6484375
 19   GR72        72 35.5 -4.5  2004.583 2004    08 -4.6508982
 20   GR72        72 35.5 -4.5  2004.667 2004    09 -5.0053745

Any help appreciated!

Jaap
Darren J
  • What about the months after the 9th month of 2004? How many months are you usually missing? How do you want to interpolate if, for example, you are missing 5 months at the end of a certain year? – David Arenburg Jul 13 '15 at 12:43
  • Which answers have you found and why didn't they work for you? Is it possible that more than one month is missing? The task seems rather trivial, so what is the exact problem? Inserting new rows, or interpolating? – Verena Haunschmid Jul 13 '15 at 12:44
  • @DavidArenburg He is only showing the first couple of rows... What do you mean? – Verena Haunschmid Jul 13 '15 at 12:44
  • Perhaps you could think about appending and not inserting missing months. This doesn't matter for interpolation. If you're doing it in a way that needs a sorted data.frame, you can always sort it after you've appended the missing data. – Roman Luštrik Jul 13 '15 at 12:46
  • @DavidArenburg - the data series continues monthly for years 2003-2012. The 10 years of data then repeats for every 1 deg * 1 deg grid cell between 28.5 to 35.5 longitude and -4.5 to 0.5 latitude. There are multiple single months missing throughout 2003-2012. – Darren J Jul 13 '15 at 12:54
  • @VerenaHaunschmid - There are multiple months missing - the data is observational so no pattern to this. I've got code to interpolate already but am struggling to insert new rows where months are missing. I asked about interpolation in case it was easier to add new rows and interpolate at the same time? – Darren J Jul 13 '15 at 12:58

2 Answers

4

Using the data.table and zoo packages, you can easily expand your data set and interpolate, as long as you don't have NAs at both sides of the year.

Expand the data set

library(data.table)
library(zoo)
res <- setDT(df)[, .SD[match(1:12, Month)], by = Year]

Interpolate on whatever column you want

cols <- c("Month", "DecimDate", "TWS")
res[, (cols) := lapply(.SD, na.approx, na.rm = FALSE), .SDcols = cols]

res
#     Year GridNo GridIndex  Lon  Lat DecimDate Month        TWS
#  1: 2003   GR72        72 35.5 -4.5  2003.000     1 14.2566781
#  2: 2003   GR72        72 35.5 -4.5  2003.083     2  5.0413706
#  3: 2003   GR72        72 35.5 -4.5  2003.167     3  3.8192721
#  4: 2003   GR72        72 35.5 -4.5  2003.250     4  5.8706026
#  5: 2003   GR72        72 35.5 -4.5  2003.333     5  7.8461188
#  6: 2003     NA        NA   NA   NA  2003.417     6  5.1141516
#  7: 2003   GR72        72 35.5 -4.5  2003.500     7  2.3821844
#  8: 2003   GR72        72 35.5 -4.5  2003.583     8  0.1995629
#  9: 2003   GR72        72 35.5 -4.5  2003.667     9 -1.8353604
# 10: 2003   GR72        72 35.5 -4.5  2003.750    10 -2.0410653
# 11: 2003   GR72        72 35.5 -4.5  2003.833    11 -1.4029813
# 12: 2003   GR72        72 35.5 -4.5  2003.917    12 -0.2206872
# 13: 2004   GR72        72 35.5 -4.5  2004.000     1 -0.5090872
# 14: 2004   GR72        72 35.5 -4.5  2004.083     2 -0.4887118
# 15: 2004   GR72        72 35.5 -4.5  2004.167     3 -0.7725966
# 16: 2004   GR72        72 35.5 -4.5  2004.250     4  4.1831581
# 17: 2004   GR72        72 35.5 -4.5  2004.333     5  2.5651040
# 18: 2004   GR72        72 35.5 -4.5  2004.417     6 -2.2511409
# 19: 2004   GR72        72 35.5 -4.5  2004.500     7 -1.6484375
# 20: 2004   GR72        72 35.5 -4.5  2004.583     8 -4.6508982
# 21: 2004   GR72        72 35.5 -4.5  2004.667     9 -5.0053745
# 22: 2004     NA        NA   NA   NA        NA    NA         NA
# 23: 2004     NA        NA   NA   NA        NA    NA         NA
# 24: 2004     NA        NA   NA   NA        NA    NA         NA
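
The call above groups by Year only, which suits a single grid cell. For data that repeat over several grid cells, one possible variation is sketched below; it is not part of the original answer. It builds every GridNo x Year x Month combination with `CJ()`, joins the observations onto it, and interpolates within each grid cell. The toy values, and the `DecimDate = Year + (Month - 1) / 12` formula, are assumptions made for illustration (the formula matches the decimal dates shown in the question).

```r
library(data.table)
library(zoo)

# Toy data (made-up values): two grid cells, one year,
# each with a different month missing
df <- data.table(
  GridNo = rep(c("GR72", "GR73"), each = 11),
  Year   = 2003L,
  Month  = c((1:12)[-6], (1:12)[-3]),
  TWS    = as.numeric(c(1:11, 21:31))
)

# Every GridNo x Year x Month combination that should exist
full <- CJ(GridNo = unique(df$GridNo), Year = unique(df$Year), Month = 1:12)

# Join the observations onto the full grid; missing months get NA in TWS
res <- df[full, on = .(GridNo, Year, Month)]

# Interpolate within each grid cell so values never leak between cells;
# na.rm = FALSE keeps leading/trailing NAs instead of dropping rows
res[, TWS := na.approx(TWS, na.rm = FALSE), by = GridNo]

# Assumed formula for the decimal date, consistent with the question's data
res[, DecimDate := Year + (Month - 1) / 12]
```

Columns that are constant per grid cell (such as Lon and Lat) would come back as NA on the inserted rows with this join, so they would need filling separately, for example by merging them back in from a per-cell lookup table.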
David Arenburg
  • Thank you for your efforts with this. Your output above is precisely what I needed, however I cannot get it to work (likely me making simple errors). I get this output... https://onedrive.live.com/redir?resid=9E74848E574367C6!2577&authkey=!ABcbTQiXEaRlptE&ithint=file%2cxlsx ....Is there a way to make this work for the original file...https://onedrive.live.com/redir?resid=9E74848E574367C6!2576&authkey=!AHgu-JyqFDATExU&ithint=file%2ccsv thanks for your patience! – Darren J Jul 13 '15 at 13:43
0

I would simply first transform your dates into actual Dates (here taking the first of every month):

dates <- as.Date(paste(ts.data$Year, ts.data$Month, 1, sep="-"))

Do the same for the target, missing months (here just one, but it can work with many):

target <- as.Date("2003-06-01")

And do the approximation:

approx(dates, ts.data$TWS, target)
$x
[1] "2003-06-01"

$y
[1] 5.069365
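
Note that interpolating on real dates weights by the number of days between observations, so the result is not simply the midpoint of the May and July values. A small sketch of the difference, using the two neighbouring TWS values from the question (the variable names are only illustrative):

```r
# May and July 2003 values from the question
tws   <- c(7.8461188, 2.3821844)
dates <- as.Date(c("2003-05-01", "2003-07-01"))

# Date-based: 1 June is 31 days after 1 May, out of 61 days to 1 July
by_date <- approx(dates, tws, xout = as.Date("2003-06-01"))$y

# Index-based: months treated as equally spaced, so month 6 is the midpoint
by_index <- approx(c(5, 7), tws, xout = 6)$y

round(by_date, 6)   # 5.069365
round(by_index, 7)  # 5.1141516
```

Which of the two is preferable depends on whether the monthly values are better thought of as equally spaced or as tied to calendar dates.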

So in the context of your dataframe (here simplified):

# Simplified version of the data: 2003 without June, 2004 up to September
ts.data <- data.frame(Year = c(rep(2003, 11), rep(2004, 9)),
                      Month = c((1:12)[-6], 1:9),
                      TWS = c(14.2566781, 5.0413706, 3.8192721, 5.8706026, 7.8461188,
                              2.3821844, 0.1995629, -1.8353604, -2.0410653, -1.4029813,
                              -0.2206872, -0.5090872, -0.4887118, -0.7725966, 4.1831581,
                              2.5651040, -2.2511409, -1.6484375, -4.6508982, -5.0053745))
dates <- as.Date(paste(ts.data$Year, ts.data$Month, 1, sep = "-"))
target <- as.Date("2003-06-01")
# Append the interpolated row, then restore chronological order
ts.data <- rbind(ts.data,
                 data.frame(Year = 2003,
                            Month = 6,
                            TWS = approx(dates, ts.data$TWS, target)$y))
ts.data <- ts.data[order(ts.data$Year, ts.data$Month), ]
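
Since the real data have several missing months, the target dates could also be found automatically rather than typed in. The following is a minimal sketch of that idea in base R for a single grid cell, using the same simplified data; the names `all_months` and `filled` are made up for this example.

```r
# Simplified data: 2003 without June, 2004 up to September
ts.data <- data.frame(
  Year  = c(rep(2003, 11), rep(2004, 9)),
  Month = c((1:12)[-6], 1:9),
  TWS   = c(14.2566781, 5.0413706, 3.8192721, 5.8706026, 7.8461188,
            2.3821844, 0.1995629, -1.8353604, -2.0410653, -1.4029813,
            -0.2206872, -0.5090872, -0.4887118, -0.7725966, 4.1831581,
            2.5651040, -2.2511409, -1.6484375, -4.6508982, -5.0053745)
)
dates <- as.Date(paste(ts.data$Year, ts.data$Month, 1, sep = "-"))

# Every first-of-month date between the first and last observation
all_months <- seq(min(dates), max(dates), by = "month")

# The missing months are those with no observation
target <- all_months[!all_months %in% dates]

# Interpolate all missing months in one call and build the new rows
filled <- data.frame(
  Year  = as.integer(format(target, "%Y")),
  Month = as.integer(format(target, "%m")),
  TWS   = approx(dates, ts.data$TWS, xout = target)$y
)

# Append, then sort back into chronological order
ts.data <- rbind(ts.data, filled)
ts.data <- ts.data[order(ts.data$Year, ts.data$Month), ]
```

With several grid cells, the same steps would have to be run separately for each cell (for instance via `split()` and `lapply()`), so that values from one cell are not used to interpolate another.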
plannapus