I have a dataframe with the European states where each state occurs 10 times (for 10 days). I want to interpolate the NA values of multiple columns, which I could achieve using

library("imputeTS")
na_interpolation(dataframe)

But I want to interpolate the NA values within each state separately, so that no interpolation happens across state boundaries. How can that be done? I have already tried several different approaches, but none of them worked for me.

As pseudo-code I would like to have something like

na_interpolation(dataframe, groupby=state)

Anything that could work?

Unfortunately, this attempt did not work for me:

interpolation <- dataframe %>% 
  group_by(state-name) %>% 
  na_interpolation(dataframe)
Steffen Moritz
the_chimp
  • I have not used `na.interpolation`, but the pipe operator already passes the piped object as the first argument of the next function, so `dataframe` should not be passed again. So perhaps try `na.interpolation()` – SteveM Dec 12 '20 at 12:38
  • thanks for the hint, I am not familiar with the piping operations. It works but unfortunately it ignores the group_by so it does the interpolation also between states and not for each state separately. – the_chimp Dec 12 '20 at 12:50

3 Answers

You should be able to apply na_interpolation by group, one column at a time inside mutate. Try:

library(dplyr)

interpolation  <- dataframe %>%
                    group_by(state) %>%
                    mutate(value = imputeTS::na_interpolation(value))
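Since the question mentions multiple columns, the same idea can probably be extended with `across()`. The following is only a minimal sketch: it assumes dplyr 1.0.0 or later, and the column names `temp` and `cases` and the toy data are hypothetical, not taken from the question. Groups with fewer than two non-missing values in a column may need separate handling.

```r
library(dplyr)
library(imputeTS)

# Hypothetical data: two numeric columns with gaps inside each state
dataframe <- data.frame(
  state = rep(c("A", "B"), each = 4),
  temp  = c(1, NA, 3, 4, 10, NA, NA, 40),
  cases = c(1, 2, NA, 4, 5, NA, 7, 8)
)

interpolation <- dataframe %>%
  group_by(state) %>%
  # apply na_interpolation to every numeric column, separately per state
  mutate(across(where(is.numeric), ~ na_interpolation(.x))) %>%
  ungroup()
```

The grouping column is left untouched, and `ungroup()` returns a plain tibble so that later steps are not silently grouped.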
Ronak Shah

You could use the split-apply-bind method:

do.call(rbind, lapply(split(dataframe, dataframe$state), na_interpolation))

As a worked example, take the following dummy data:

set.seed(3)

dataframe <- data.frame(state = rep(c("A", "B", "C"), each = 5),
                        value = rnorm(15))

dataframe$value[sample(15, 4)] <- NA

dataframe
#>    state       value
#> 1      A -0.96193342
#> 2      A          NA
#> 3      A  0.25878822
#> 4      A -1.15213189
#> 5      A  0.19578283
#> 6      B  0.03012394
#> 7      B  0.08541773
#> 8      B          NA
#> 9      B          NA
#> 10     B  1.26736872
#> 11     C -0.74478160
#> 12     C          NA
#> 13     C -0.71635849
#> 14     C  0.25265237
#> 15     C  0.15204571

Then we can do:

library(imputeTS)

do.call(rbind, lapply(split(dataframe, dataframe$state), na_interpolation))
#>      state       value
#> A.1      A -0.96193342
#> A.2      A -0.35157260
#> A.3      A  0.25878822
#> A.4      A -1.15213189
#> A.5      A  0.19578283
#> B.6      B  0.03012394
#> B.7      B  0.08541773
#> B.8      B  0.47940140
#> B.9      B  0.87338506
#> B.10     B  1.26736872
#> C.11     C -0.74478160
#> C.12     C -0.73057004
#> C.13     C -0.71635849
#> C.14     C  0.25265237
#> C.15     C  0.15204571

Created on 2020-12-12 by the reprex package (v0.3.0)

Allan Cameron
  • thanks that worked!! but I have to set dataframe <- do.call(....). Using only do.call() does not apply the changes to my dataframe. but otherwise, perfect solution. I tried 5 hours to solve it and could not find it out, thanks a lot! – the_chimp Dec 12 '20 at 15:48

An option with data.table, which updates the column by reference within each state:

library(data.table)
setDT(dataframe)[,  value := imputeTS::na_interpolation(value), state]
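For several columns at once, a similar pattern with `.SDcols` should work. This is a sketch under assumptions: the column names `temp` and `cases` and the toy data are hypothetical, and `cols` has to be adapted to the real data.

```r
library(data.table)
library(imputeTS)

# Hypothetical data: two numeric columns with gaps inside each state
dt <- data.table(
  state = rep(c("A", "B"), each = 4),
  temp  = c(1, NA, 3, 4, 10, NA, NA, 40),
  cases = c(1, 2, NA, 4, 5, NA, 7, 8)
)

# Columns to interpolate (assumed names)
cols <- c("temp", "cases")

# Interpolate each listed column within each state, modifying dt by reference
dt[, (cols) := lapply(.SD, na_interpolation), by = state, .SDcols = cols]
```

Because `:=` modifies the table in place, no reassignment is needed here, unlike with the `do.call(rbind, ...)` approach.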
akrun