
I used pad() from the padr package on a data frame to fill the gaps in its time variable. Now I want to fill the resulting missing values for a specified set of columns using fill_by_function. Normally, fill_by_function takes unquoted column names as arguments. In my case, however, I am given the column names as a list (a character vector).

My question is: how can I pass this list of columns to fill_by_function? Note that the list of columns is not known in advance, so I cannot hardcode the column names in the call to fill_by_function.

Below is an example that I tried, but it produced an error.

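library(padr)   # pad(), fill_by_function()
library(dplyr)  # %>%

# 200 randomly chosen days from 2016, sorted, so the series has gaps
x <- seq(as.Date('2016-01-01'), by = 'day', length.out = 366)
x <- x[sample(1:366, 200)] %>% sort
x.df <- data.frame(x  = x,
               y1 = runif(200, 10, 20) %>% round,
               y2 = runif(200, 1, 50) %>% round,
               y3 = runif(200, 20, 40) %>% round)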

c.list <- c("y1", "y2")
x.df %>% pad %>% fill_by_function(as.name(c.list), fun = mean)

This is the error message that I got:

Error in inds[i] <- which(colnames_x == as.character(cols[[i]])) : replacement has length zero
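
One plausible reading of this error, assuming fill_by_function captures its column arguments unevaluated (the `as.character(cols[[i]])` in the message suggests this, but it is an assumption and not confirmed against the padr source): the function would see the expression as.name(c.list) itself rather than a column name. A minimal base R sketch of the effects involved, using only the column names from the example:

```r
c.list <- c("y1", "y2")

# What a function sees if it captures the argument without evaluating it:
captured <- quote(as.name(c.list))
captured_names <- as.character(captured)
print(captured_names)  # the function name and its argument, not column names

# Even when evaluated, as.name() only uses the first element of a vector:
first_only <- as.name(c.list)
print(first_only)

# A captured string that is not a column name matches nothing, so which()
# returns a zero-length result (hence "replacement has length zero")
colnames_x <- c("x", "y1", "y2", "y3")
matches <- which(colnames_x == captured_names[1])
print(length(matches))
```

Either way, as.name(c.list) cannot stand in for several unquoted column names at once.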

Is there an alternative function that I can use?

www
user2129946

1 Answer


This worked for me:

x.df %>% pad %>% fill_by_function(.cols=c.list,fun=mean) %>% tail(.)

             x     y1    y2    y3
361 2016-12-26 14.725 24.31 30.09
362 2016-12-27 14.000 28.00 21.00
363 2016-12-28 14.725 24.31 30.09
364 2016-12-29 15.000 47.00 22.00
365 2016-12-30 14.000 43.00 34.00
366 2016-12-31 17.000 14.00 21.00

Compare to:

x.df %>% pad %>% fill_by_function(y1,fun=mean) %>% tail(.)

             x     y1 y2 y3
361 2016-12-26 14.725 NA NA
362 2016-12-27 14.000 28 21
363 2016-12-28 14.725 NA NA
364 2016-12-29 15.000 47 22
365 2016-12-30 14.000 43 34
366 2016-12-31 17.000 14 21

Check that the output is actually what you want.
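
If the goal is to fill only the listed columns and leave the others untouched, one alternative is to skip fill_by_function and fill the columns by name in base R. The sketch below is an assumption-laden illustration, not padr's API: fill_cols_by_function is a hypothetical helper, the small data frame stands in for the output of pad(), and the summary function is assumed to accept na.rm (as mean and median do).

```r
# Hypothetical helper (not part of padr): fill NAs in the named columns
# with a summary of the non-missing values of that column.
fill_cols_by_function <- function(df, cols, fun = mean) {
  stopifnot(all(cols %in% names(df)))
  for (col in cols) {
    missing <- is.na(df[[col]])
    # assumes `fun` accepts na.rm, as mean() and median() do
    df[[col]][missing] <- fun(df[[col]], na.rm = TRUE)
  }
  df
}

# Small stand-in for a padded data frame (NAs mark the padded rows)
padded <- data.frame(
  x  = as.Date("2016-01-01") + 0:3,
  y1 = c(10, NA, 14, NA),
  y2 = c(NA, 2, 4, 6),
  y3 = c(1, NA, 3, NA)
)

c.list <- c("y1", "y2")
filled <- fill_cols_by_function(padded, c.list, fun = mean)
print(filled)  # y1 and y2 are filled; y3 keeps its NAs
```

Because the data frame is the first argument, the helper also fits the pipe from the question: x.df %>% pad %>% fill_cols_by_function(c.list).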

CPak
  • This actually gives me the error `Error in cols[[i]] : subscript out of bounds`. Also, it seems from your output that the solution has applied the `mean` function to all of your columns – user2129946 Jun 18 '17 at 20:37
  • I don't get an error obviously. Try updating your packages `padr` and `dplyr`??? (not sure) – CPak Jun 18 '17 at 20:40
  • Regarding your 2nd comment, your post did not address what you were expecting...I simply passed multiple column names as you requested – CPak Jun 18 '17 at 20:46
  • Thanks Chi. I understand it should work. But, I am still getting the error `Error in cols[[i]] : subscript out of bounds`; even after updating the package – user2129946 Jun 18 '17 at 20:53
  • My only guess is that `%>%` is having problems passing updated data. That is, `pad` increases `nrow` compared to the original data.frame. Try running each command separately, without pipes, to see if that's your problem... – CPak Jun 18 '17 at 21:01