I'm trying to perform linear extrapolation and can't figure out the correct notation to apply the approxExtrap function from the Hmisc package. I have seen some examples, but none that I've been able to apply. I have been able to use the normal approx function specifying only the "x" argument, being the variable I want to interpolate. Doing the following leaves me with an error. Any suggestions? Thank you!

library(tidyverse)
library(zoo)
library(Hmisc)

#write data frame
df <- tibble(day=1:10,
             sales =  c(NA, NA, NA, 4, 5, 6, 7, 8 , 9, 10))

#attempt to perform extrapolation
df <- df %>% 
  mutate(sales=approxExtrap(x=sales, y=NULL)) 

Error in `mutate()`:
! Problem while computing `sales = approxExtrap(x = sales, y = NULL)`.
Caused by error in `approx()`:
! need at least two non-NA values to interpolate
dd_data

2 Answers


Your x should be the day column and y should be the column you want to extrapolate. With xout = c(1:10) you specify the points at which you want values returned. Also note that approxExtrap returns a list with components x and y, so you have to extract the y component with $y. You can use the following code:

library(tidyverse)
library(zoo)
library(Hmisc)

#write data frame
df <- tibble(day=1:10,
             sales =  c(NA, NA, NA, 4, 5, 6, 7, 8 , 9, 10))

#attempt to perform extrapolation
df %>% 
  mutate(sales=approxExtrap(x=day, y=sales, xout = c(1:10), method = "linear")$y) 
#> # A tibble: 10 × 2
#>      day sales
#>    <int> <dbl>
#>  1     1     4
#>  2     2     4
#>  3     3     4
#>  4     4     4
#>  5     5     5
#>  6     6     6
#>  7     7     7
#>  8     8     8
#>  9     9     9
#> 10    10    10

Created on 2022-07-31 by the reprex package (v2.0.1)

Quinten
  • Thank you for your answer. What does adding $y do? The output does not seem accurate, since days 1, 2, and 3 are assigned a sales value of 4. – dd_data Aug 01 '22 at 16:33

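Quinten's answer was almost there, but it passed the rows where sales is NA to approxExtrap. With those rows included, days 1 to 3 appear to be treated as inside the range of the data, so they are filled with the nearest known value (4) rather than extrapolated. Dropping the NA rows from both x and y should work: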

library(tidyverse)
library(Hmisc)

#write data frame
df <- tibble(day=1:10,
             sales =  c(NA, NA, NA, 40, 50, 60, 70, 80 , 90, 100))

#extrapolate using only the rows where sales is not NA
df %>% 
  mutate(sales=approxExtrap(x=day[!is.na(sales)], y=sales[!is.na(sales)], xout = c(1:10), method = "linear")$y)

# A tibble: 10 × 2
     day sales
   <int> <dbl>
 1     1    10
 2     2    20
 3     3    30
 4     4    40
 5     5    50
 6     6    60
 7     7    70
 8     8    80
 9     9    90
10    10   100

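approxExtrap returns a list with two components, x (the xout points) and y (the interpolated and extrapolated values). Adding $y extracts the y component, which is the part you want to store in the sales column. See also: R linear extrapolate missing values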
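
As a rough illustration of both points, here is a minimal base R sketch that does not need Hmisc. It assumes the same day and sales values as above; the names ok, res, slope, left and filled are only illustrative. It shows that approx returns a list from which $y is extracted, and how the values to the left of the data can be obtained by extending the line through the first two observed points. It is a sketch of the idea, not a copy of what approxExtrap does internally.

```r
# Base R only; same data as in the example above.
day   <- 1:10
sales <- c(NA, NA, NA, 40, 50, 60, 70, 80, 90, 100)

# Keep only the observed points.
ok    <- !is.na(sales)
x_obs <- day[ok]
y_obs <- sales[ok]

# approx() returns a list with components x and y; $y holds the values.
# rule = 2 fills points outside the data range with the nearest value.
res <- approx(x = x_obs, y = y_obs, xout = day, rule = 2)
names(res)
res$y

# Linear extrapolation to the left: extend the line through the
# first two observed points (illustrative, hand-written version).
slope  <- (y_obs[2] - y_obs[1]) / (x_obs[2] - x_obs[1])
left   <- day < min(x_obs)
filled <- res$y
filled[left] <- y_obs[1] + slope * (day[left] - x_obs[1])
filled
```

Here res$y keeps 40 for days 1 to 3, while filled continues the trend downwards, which matches the difference between the two outputs shown in the answers.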

MattiKummu