I have the following data.table:

dt <- structure(list(x = c(-0.888888888888886, -0.588235294117648, 
0.630952380952381, 0.0769230769230788, 0.250000000000003, -0.615384615384616, 
0.888888888888891, 0.924528301886792, -0.477326968973745, 0), 
    ema = c(-0.121833534531943, -0.148485063651126, -0.103945781102354, 
    -0.0936104177866151, -0.0739755367702369, -0.104913198405344, 
    -0.0481245077028166, NA, NA, 
    NA)), row.names = c(NA, -10L), class = c("data.table", 
"data.frame"))

which looks like

             x         ema
 1: -0.88888889 -0.121833535
 2: -0.58823529 -0.148485064
 3:  0.63095238 -0.103945781
 4:  0.07692308 -0.093610418
 5:  0.25000000 -0.073975537
 6: -0.61538462 -0.104913198
 7:  0.88888889 -0.048124508
 8:  0.92452830           NA
 9: -0.47732697           NA
10:  0.00000000           NA

In this data.table, column x is a continuous variable that is updated daily, and column ema is the EMA (exponential moving average) of column x. For some reason, I could not update the EMA of x (in column ema) for the past 3 days, and now I need to update it using the function ema_add given below:

ema_add <- function(newx, lasty){
   ratio <- 2 / (34+1)
   lasty * (1 - ratio) + ratio * newx
 }
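As a quick sanity check of what a single update should produce, here is a minimal sketch that applies ema_add once, to the first row without an EMA (row 8), seeded with the last known EMA (row 7). The variable names last_ema, new_x, and step8 are only illustrative.

```r
ema_add <- function(newx, lasty) {
  ratio <- 2 / (34 + 1)
  lasty * (1 - ratio) + ratio * newx
}

last_ema <- -0.0481245077028166  # ema in row 7, the last known value
new_x    <- 0.924528301886792    # x in row 8, the first row without an EMA

# One update step: new EMA = old EMA * (1 - ratio) + ratio * new observation
step8 <- ema_add(newx = new_x, lasty = last_ema)
print(step8)
```

The value this gives for row 8 agrees with the 0.007455653 shown in the expected result, so each missing row depends only on its own x and the EMA of the row before it.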

As suggested in the post "Rolling a function on two columns in data.table", I am using the following code to compute the EMA for the last three rows, but it does not give the desired result. This is the result I get:

dt$updated_ema = Reduce(ema_add, x = dt$x[-1], init = first(dt$ema), accumulate = T)
dt$updated_ema
[1] -0.12183353 -0.43276804  0.27637891  0.14340835  0.21446945 -0.33876659  0.47967039  0.77624233 -0.05947054 -0.01982351

The expected result is:

             x         ema
 1: -0.88888889 -0.121833535
 2: -0.58823529 -0.148485064
 3:  0.63095238 -0.103945781
 4:  0.07692308 -0.093610418
 5:  0.25000000 -0.073975537
 6: -0.61538462 -0.104913198
 7:  0.88888889 -0.048124508
 8:  0.92452830  0.007455653
 9: -0.47732697 -0.020246211
10:  0.00000000 -0.019089285

Can someone spot what I am doing wrong while applying the Reduce function above?

Thanks in advance.

Saurabh

1 Answer

I can't reproduce your values exactly, but the pracma package includes a moving-average function, movavg, so it should be possible to just use:

library(pracma) 
dt[,.(x
  , ema
  , ema_test =  movavg(x, n = 3, type="e")
  )]

The source of movavg, including the branch for the exponential moving average (type = "e"), is:

movavg
function (x, n, type = c("s", "t", "w", "m", "e", "r")) 
{
stopifnot(is.numeric(x), is.numeric(n), is.character(type))
if (length(n) != 1 || ceiling(n != floor(n)) || n <= 1) 
    stop("Window length 'n' must be a single integer greater 1.")
nx <- length(x)
if (n >= nx) 
    stop("Window length 'n' must be greater then length of time series.")
y <- numeric(nx)
if (type == "s") {
    for (k in 1:(n - 1)) y[k] <- mean(x[1:k])
    for (k in n:nx) y[k] <- mean(x[(k - n + 1):k])
}
else if (type == "t") {
    n <- ceiling((n + 1)/2)
    s <- movavg(x, n, "s")
    y <- movavg(s, n, "s")
}
else if (type == "w") {
    for (k in 1:(n - 1)) y[k] <- 2 * sum((k:1) * x[k:1])/(k * 
        (k + 1))
    for (k in n:nx) y[k] <- 2 * sum((n:1) * x[k:(k - n + 
        1)])/(n * (n + 1))
}
else if (type == "m") {
    y[1] <- x[1]
    for (k in 2:nx) y[k] <- y[k - 1] + (x[k] - y[k - 1])/n
}
else if (type == "e") {
    a <- 2/(n + 1)
    y[1] <- x[1]
    for (k in 2:nx) y[k] <- a * x[k] + (1 - a) * y[k - 1]
}
else if (type == "r") {
    a <- 1/n
    y[1] <- x[1]
    for (k in 2:nx) y[k] <- a * x[k] + (1 - a) * y[k - 1]
}
else stop("The type must be one of 's', 't', 'w', 'm', 'e', or 'r'.")
return(y)
}
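One possible reason the values do not match: in the type = "e" branch above, n = 3 gives a smoothing factor a = 2/(3+1) = 0.5, whereas ema_add in the question uses 2/(34+1), and the branch seeds the recursion with x[1] rather than with an existing EMA value. The following base-R sketch mirrors that branch (the name ema_full is hypothetical, and the input checks of movavg are left out) so the two factors can be compared without installing pracma.

```r
# Hypothetical base-R replica of the type = "e" branch of movavg shown above.
# It omits movavg's input checks and assumes length(x) >= 2.
ema_full <- function(x, n) {
  a <- 2 / (n + 1)        # smoothing factor, as in the "e" branch
  y <- numeric(length(x))
  y[1] <- x[1]            # the recursion is seeded with the first observation
  for (k in 2:length(x)) y[k] <- a * x[k] + (1 - a) * y[k - 1]
  y
}

x <- c(-0.888888888888886, -0.588235294117648, 0.630952380952381,
       0.0769230769230788, 0.250000000000003, -0.615384615384616,
       0.888888888888891, 0.924528301886792, -0.477326968973745, 0)

ema_n3  <- ema_full(x, n = 3)   # a = 0.5, as in the call above
ema_n34 <- ema_full(x, n = 34)  # a = 2/35, the ratio used by ema_add
print(cbind(x, ema_n3, ema_n34))
```

Note that movavg itself stops when n >= length(x), so on the 10-row sample it would reject n = 34. Even with matching factors, a full recomputation seeded at x[1] would not necessarily reproduce an ema column that was built from a longer history.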
hannes101
  • Thanks for the pointer! I was not aware of this library. The problem I am trying to solve is this: the time series I have contain thousands of values, and running them each day takes hours. I only need the EMA of the new values added daily to these time series, which is why I am using the custom function. – Saurabh Jan 16 '21 at 18:56
  • Well, I think the reason is that your calculation is fundamentally different from the cumulative sum in the answer you took the Reduce approach from. Perhaps check out the roll functions in data.table. – hannes101 Jan 16 '21 at 19:35
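
The incremental update described in the comments could be sketched as follows. Two things matter with Reduce here: it calls f(accumulator, element), so with the signature ema_add(newx, lasty) a bare Reduce(ema_add, ...) puts the running value into newx, and the recursion should be seeded with the last known EMA and fed only the x values of the rows still missing one. The helper fill_ema is hypothetical, and it assumes the NA values form one contiguous block at the end of the column.

```r
ema_add <- function(newx, lasty) {
  ratio <- 2 / (34 + 1)
  lasty * (1 - ratio) + ratio * newx
}

# Hypothetical helper: fill the trailing NA values of `ema` from the last
# known EMA. Assumes the NAs are one contiguous block at the end.
fill_ema <- function(x, ema) {
  missing <- which(is.na(ema))
  if (length(missing) == 0L) return(ema)
  start <- min(missing) - 1L   # index of the last known EMA
  stopifnot(start >= 1L)
  # Reduce passes the accumulator first, so the wrapper routes it to `lasty`
  # and routes each new observation to `newx`.
  filled <- Reduce(function(acc, xi) ema_add(newx = xi, lasty = acc),
                   x[missing], accumulate = TRUE, init = ema[start])
  ema[missing] <- filled[-1]   # drop the seed, keep only the new values
  ema
}

x <- c(-0.888888888888886, -0.588235294117648, 0.630952380952381,
       0.0769230769230788, 0.250000000000003, -0.615384615384616,
       0.888888888888891, 0.924528301886792, -0.477326968973745, 0)
ema <- c(-0.121833534531943, -0.148485063651126, -0.103945781102354,
         -0.0936104177866151, -0.0739755367702369, -0.104913198405344,
         -0.0481245077028166, NA, NA, NA)

updated <- fill_ema(x, ema)
print(updated[8:10])
```

On the sample data, the three filled values agree with rows 8 to 10 of the expected result in the question. With the data.table from the question, the same helper could presumably be applied as dt[, ema := fill_ema(x, ema)], which touches only the rows that are still NA instead of recomputing the whole series.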