0

I have a dataframe column with missing values.

I'd like to make another column which creates a variable where the NAs are by subtracting the element immediately before the NA sequence from the element immediately after the NA sequence, dividing by the number of NAs + 1 and adding that first to the element just before the NA sequence, then to the prior replacement NA value.

For example, say I have a column like this:

df <- data.frame(Col1 = c(2, 3, 5, NA, NA, 9, 10, NA, 16, 19))

   Col1
1     2
2     3
3     5
4    NA
5    NA
6     9
7    10
8    NA
9    16
10   19

I want to mutate another column which will look like this:

   Col1  Col2
1     2    NA
2     3    NA
3     5    NA
4    NA  6.33
5    NA  7.66
6     9    NA
7    10    NA
8    NA 13.00
9    16    NA
10   19    NA

I'd also be interested to know how I could just fill in the NA values and keep it to one column.

I'm attempting this using dplyr, and I know that the mutate, lag and lead functions are probably needed here, however I'm struggling to apply lag and lean to a sequence of NAs rather than just a single element.

Sean
  • 99
  • 6

1 Answers1

4

a data.table approach, but can (most likely) easily be adapted to the tidyverse.

logic used: Create a complete Col2-column, using zoo's na.approx-function. Then replace the Col2 values with NA on the rows where Col1 is not NA.

library(zoo)
library(data.table)
setDT(df)[, Col2 := zoo::na.approx(Col1)][!is.na(Col1), Col2 := NA][]

#    Col1      Col2
# 1:    2        NA
# 2:    3        NA
# 3:    5        NA
# 4:   NA  6.333333
# 5:   NA  7.666667
# 6:    9        NA
# 7:   10        NA
# 8:   NA 13.000000
# 9:   16        NA
#10:   19        NA
Wimpel
  • 26,031
  • 1
  • 20
  • 37