
I have been using TA-Lib to calculate EMAs but each time I add a new number to the array, TA-Lib performs the calculation from scratch again.

I am doing analysis on a rather large set of data (> 1M rows) and this is quite slow.

What would be the fastest way to calculate the new EMA when a new value is added?

Thomas

2 Answers


Let x be the vector of length n containing your samples, that is x[0], ..., x[n-1]. Let y be the vector containing the EMA. Then y is given by the equation:

y[k] = y[k-1] * a + x[k] * (1-a)

where a is the EMA smoothing parameter, between 0 and 1. The closer a is to 1, the smoother the curve.

Therefore, to compute the EMA you just need:

a = 0.9
y[0] = x[0]
for k in range(1, n):
    y[k] = y[k-1]*a + x[k]*(1-a)

Then if you get another sample, that is x[n], you can compute the EMA y[n] without doing the full calculation with:

y[n] = y[n-1]*a + x[n]*(1-a)

This is pseudocode, so if you use lists, it should be something like this:

y.append(y[-1]*a + x[-1]*(1-a))
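If you only need the latest value as new samples stream in, you can keep the running EMA in a small stateful helper instead of storing the whole series. This is a minimal sketch (the class name is my own, not from TA-Lib); each update is O(1) regardless of how much history has been seen:

```python
class StreamingEMA:
    """Keeps only the current EMA value; O(1) work per new sample."""

    def __init__(self, a):
        self.a = a         # smoothing parameter in (0, 1)
        self.value = None  # no samples seen yet

    def update(self, x):
        if self.value is None:
            self.value = x  # seed with the first sample, like y[0] = x[0]
        else:
            self.value = self.value * self.a + x * (1 - self.a)
        return self.value
```

With a = 0.9, feeding 1.0 then 2.0 yields 1.0 and then 1.0 * 0.9 + 2.0 * 0.1 = 1.1.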

Edit:

If you really want to improve speed for the computation of the EMA (the whole EMA at a time), you can use numba and numpy:

import numpy as np
from numba import njit
from timeit import timeit

n = 1_000_000
x_np = np.random.randn(n)  # your data
x_list = list(x_np)
a = 0.9

def ema_list(x, a):
    y = [x[0]]
    for k in range(1, len(x)):
        y.append(y[-1]*a + x[k]*(1-a))
    return y


@njit("float64[:](float64[:], float64)")
def ema_np(x, a):
    y = np.empty_like(x)
    y[0] = x[0]
    for k in range(1, len(x)):
        y[k] = y[k-1]*a + x[k]*(1-a)
    return y


print(timeit(lambda: ema_list(x_list, a), number=1))  # 0.7080 seconds
print(timeit(lambda: ema_np(x_np, a), number=1))      # 0.008015 seconds

The list implementation takes about 708 ms, whereas the numba/numpy version takes about 8 ms, roughly 88 times faster. The numpy implementation without numba takes a similar time to the list implementation, because the loop still runs in the Python interpreter.
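If you would rather not add a numba dependency, the same recurrence can also be pushed into compiled code with SciPy, since the EMA is a first-order IIR filter. This is a sketch assuming SciPy is available; the initial condition `zi` seeds the filter so that y[0] == x[0], matching the loop above:

```python
import numpy as np
from scipy.signal import lfilter

def ema_scipy(x, a):
    # y[k] = a*y[k-1] + (1-a)*x[k]  <=>  numerator b = [1-a], denominator = [1, -a]
    # zi = [a*x[0]] makes y[0] = (1-a)*x[0] + a*x[0] = x[0]
    y, _ = lfilter([1 - a], [1.0, -a], x, zi=[a * x[0]])
    return y
```

The filtering loop runs in C, so it is much faster than the pure-Python list version while staying vectorized-looking at the call site.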

Diego Palacios
  • Maybe using `map` instead of the `for` loop could improve speed? – programandoconro Dec 12 '19 at 15:07
  • I don't see how to use `map`, because the value of each element to be computed depends on the previous element. If you want speed, you can use `numba`, I have updated the answer. – Diego Palacios Dec 12 '19 at 15:58
  • Great, I was thinking of some way of applying concurrency to avoid the `for` loop. But maybe the loop is inevitable for this calculation, as you said. – programandoconro Dec 12 '19 at 17:12
  • `y[k] = y[k-1] + (1-a) * (x[k] - y[k-1])` may give better precision – Tiana Jan 27 '22 at 14:12

There is a fork of TA-Lib named TA-Lib RT that can be used for data coming in in real time, without recalculating the whole dataset.

https://github.com/trufanov-nok/ta-lib-rt

It contains additional variants of the TA functions that are designed to work in a loop and process a single data value per call instead of an array. These functions accept state arguments to keep internal state between calls.

Unfortunately it lacks Python bindings for now. I'm currently trying to make such bindings by forking the original TA-Lib Python bindings.

truf