According to this answer https://stackoverflow.com/a/50236294/10870191, the formula pandas uses for ewm().std() is:
import numpy as np

# Assumes mrt is the pandas Series being smoothed and a is the smoothing factor alpha
mrt_array = np.asarray(mrt, dtype=float)
M = len(mrt_array)
weights = (1 - a) ** np.arange(M - 1, -1, -1)  # reverse order to match Series order
ewma = np.sum(weights * mrt_array) / np.sum(weights)
bias = np.sum(weights) ** 2 / (np.sum(weights) ** 2 - np.sum(weights ** 2))
ewmvar = bias * np.sum(weights * (mrt_array - ewma) ** 2) / np.sum(weights)
ewmstd = np.sqrt(ewmvar)
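As a sanity check on the formula above, here is a minimal sketch that wraps it in a function (the name `ewm_std_manual` is hypothetical, not a pandas API) and tests the limiting case a = 0, where every weight equals 1. In that case the bias factor reduces to M / (M - 1), so the result should equal the ordinary unbiased sample standard deviation.

```python
import numpy as np


def ewm_std_manual(values, a):
    """Final-point exponentially weighted std via the formula quoted above.

    Hypothetical helper for illustration only; not part of pandas.
    """
    x = np.asarray(values, dtype=float)
    M = len(x)
    weights = (1 - a) ** np.arange(M - 1, -1, -1)  # newest point gets weight 1
    w_sum = weights.sum()
    ewma = (weights * x).sum() / w_sum
    bias = w_sum**2 / (w_sum**2 - (weights**2).sum())
    ewmvar = bias * (weights * (x - ewma) ** 2).sum() / w_sum
    return np.sqrt(ewmvar)


rng = np.random.default_rng(0)
sample = rng.normal(size=50)

# With a = 0 every weight is 1, so bias = M / (M - 1) (Bessel's correction).
equal_weight_std = ewm_std_manual(sample, a=0.0)
bessel_std = np.std(sample, ddof=1)
print(np.isclose(equal_weight_std, bessel_std))
```

Per the linked answer, `ewm_std_manual(mrt, a)` should then agree with `mrt.ewm(alpha=a).std().iloc[-1]` for 0 < a < 1, assuming the defaults `adjust=True` and `bias=False`.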
How can this formula be derived theoretically? Here is my attempt, but I must be missing something. Let $x_t$ be a series of random variables, which we assume for simplicity are independent, identically distributed, and de-meaned.
Then,
$$\mathrm{ewm}_T = \frac{\sum_t c^{T-t}\, x_t}{\sum_t c^{T-t}}$$
Since the variables are de-meaned:
$$\mathrm{Var}(\mathrm{ewm}_T) = \frac{E\left[\left(\sum_t c^{T-t}\, x_t\right)^2\right]}{\left(\sum_t c^{T-t}\right)^2}$$
Since we assumed independence, all the cross-terms are zero, and
$$\mathrm{Var}(\mathrm{ewm}_T) = \frac{\sum_t \left(c^{T-t}\right)^2 E\left[x_t^2\right]}{\left(\sum_t c^{T-t}\right)^2} = \mathrm{Var}(x_t)\,\frac{\sum_t \left(c^{T-t}\right)^2}{\left(\sum_t c^{T-t}\right)^2}$$
So, finally,
$$\mathrm{Var}(x_t) = \frac{\left(\sum_t c^{T-t}\right)^2}{\sum_t \left(c^{T-t}\right)^2}\,\mathrm{Var}(\mathrm{ewm}_T)$$
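The identity in the last step can be checked numerically. The following is a rough Monte Carlo sketch, assuming i.i.d. standard normal x_t and assuming c plays the role of 1 - a from the code above; the values of `c`, `T` and `n_trials` are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
c, T, n_trials = 0.9, 30, 100_000  # illustrative values, not from the question

# Weights c^(T - t) for t = 1..T, so the latest observation gets weight 1.
weights = c ** np.arange(T - 1, -1, -1)

# Each row is one independent realisation of x_1..x_T with Var(x_t) = 1.
x = rng.normal(size=(n_trials, T))
ewm_T = x @ weights / weights.sum()

# Rescale the variance of ewm_T as in the last equation of the derivation.
scale = weights.sum() ** 2 / (weights**2).sum()
recovered_var = scale * ewm_T.var()

print(recovered_var)  # should be close to Var(x_t) = 1, up to sampling noise
```

Note that this estimates Var(ewm_T) across many independent realisations of the series, whereas the pandas formula above works from the weighted squared deviations within a single series.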
How does this relate to the formula from pandas? Is the way I am conceptually thinking about pandas ewm incorrect? Thanks!