
I often run into the same issue of how to handle NA values when building quantitative trading models. The example below is about a stock with EOD data since 1997-01-01, stored in an xts object with four columns named "High", "Low", "Close" and "Volume". The data is from Bloomberg. When I try to calculate the rolling 20-day average volume, I get this error:

SMA(stock$Volume, 20)
Error in runSum(x, n) : Series contains non-leading NAs  

I quickly located the problem (which I knew was NA values, since I have run into this a thousand times) and found the two days where volume data is missing. I have reproduced those days' data below. As a quick observation, the SMA, EMA, etc. functions in TTR cannot handle NAs that are preceded and followed by numbers; only leading NAs are accepted.
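The check behind that error can be sketched in base R, assuming (as the message "Series contains non-leading NAs" suggests) that only NAs before the first real value are tolerated. nonleading_na is a hypothetical helper, not part of TTR; for an xts object you could pass as.numeric(stock$Volume) and look the dates up with index(stock)[...].

```r
# Hypothetical helper (not a TTR function): positions of NAs that appear after
# the first non-NA value, i.e. the "non-leading" NAs that SMA/EMA reject.
nonleading_na <- function(x) {
  seen_value <- cumsum(!is.na(x)) > 0   # TRUE from the first non-NA onwards
  which(is.na(x) & seen_value)
}

vol <- c(NA, NA, 1200, NA, 1500, 1300, NA)
nonleading_na(vol)   # 4 7 (the two leading NAs are not reported)
```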

library(xts)   # as.xts(); also attaches zoo
library(TTR)   # SMA(), EMA(), runSum()

stock <- as.xts(matrix(c(94.46,92.377,94.204,NA,71.501,70.457,70.979,NA), 2, 4,
  byrow = TRUE, dimnames = list(NULL, c("High","Low","Close","Volume"))),
  as.Date(c("1998-07-07", "1999-02-22")))

What is the best way to handle this issue? Is it to store stock$Volume as a temporary object with the NA values removed, calculate the rolling volume on that, and then merge the result back in with merge.xts using fill = NA so the NA values are reinserted? But is that correct, given that the window then covers the last 20 available trading days rather than only the 19 days with data inside the 20-day window?
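To make the 20-versus-19 question concrete, here is a base-R sketch of the two windowing choices. Both helper names are hypothetical and this is not how TTR or xts implement anything. The first drops the NA rows, averages the last n available values and puts NA back at the missing rows, which is what the remove-then-merge(fill = NA) approach amounts to; its window can reach back more than n rows. The second keeps a fixed window of n rows and averages whatever values are present, so a window containing one NA averages 19 values; the min_obs threshold is my own assumption.

```r
# Option 1 (hypothetical helper): drop NAs, rolling mean over the last n
# available values, then reinsert NA at the missing rows (like fill = NA).
roll_mean_drop_na <- function(x, n) {
  ok <- !is.na(x)
  vals <- x[ok]
  res <- rep(NA_real_, length(vals))
  if (length(vals) >= n) {
    for (i in n:length(vals)) res[i] <- mean(vals[(i - n + 1):i])
  }
  out <- rep(NA_real_, length(x))
  out[ok] <- res
  out
}

# Option 2 (hypothetical helper): fixed window of n rows; average the values
# that are present, provided at least min_obs of them are non-missing.
roll_mean_available <- function(x, n, min_obs = max(1L, n - 1L)) {
  out <- rep(NA_real_, length(x))
  if (length(x) >= n) {
    for (i in n:length(x)) {
      w <- x[(i - n + 1):i]
      if (sum(!is.na(w)) >= min_obs) out[i] <- mean(w, na.rm = TRUE)
    }
  }
  out
}

x <- c(1, 2, NA, 4, 5)
roll_mean_drop_na(x, 2)    # NA 1.5  NA 3.0 4.5
roll_mean_available(x, 2)  # NA 1.5 2.0 4.0 4.5
```

For the question's data this would be called as, e.g., roll_mean_available(as.numeric(stock$Volume), 20). Which option is appropriate depends on whether each value should summarise 20 observations or 20 trading days.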

I hope some sort of "best practice" comes out of this post, as I assume this issue also affects other R users in finance, whether they get their data from Bloomberg, Yahoo Finance or another source.

Asked by P. Garnry; edited by Joshua Ulrich.
  • This is more of a "how to deal with missing data" question than a programming question. TTR doesn't do this automatically because there is no way to know what type of imputation methodology is appropriate for any given use case. – Joshua Ulrich Oct 15 '12 at 20:49
  • You are right, Joshua. It is not per se a programming question, but it is borderline in the sense that you have to write a script that handles missing data, which occurs all the time in financial time series. If missing data is not handled correctly, your backtesting results can be misleading. Is there a better forum for this type of question? – P. Garnry Oct 16 '12 at 05:43
  • I'd like to hear how others handle this, too. I had a similar question, see http://stackoverflow.com/questions/11897169/change-nas-to-interpolated-flat-bars (no answer marked as correct yet, as I believe there must be a better solution). – Darren Cook Oct 16 '12 at 07:56

3 Answers


Take your initial time series containing NAs, for example a.ts, and approximate the NAs with na.approx, a generic function from the zoo package that replaces each NA with an interpolated value (see the zoo package documentation for details):

library(zoo)
b.ts <- na.approx(a.ts)

b.ts is the time series with each NA replaced by a linearly interpolated value.
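The same idea in base R, as a sketch: stats::approx interpolates linearly between the non-missing points, which, as far as I understand, is what zoo's na.approx builds on. fill_linear is a hypothetical name; its t argument stands in for the time index (e.g. as.numeric(index(a.ts))), and leading or trailing NAs, which cannot be interpolated, are left as NA.

```r
# Hypothetical helper: replace interior NAs by linear interpolation in t.
# Needs at least two non-NA values; rule = 1 means no extrapolation, so NAs
# before the first or after the last observation stay NA.
fill_linear <- function(x, t = seq_along(x)) {
  ok <- !is.na(x)
  stats::approx(t[ok], x[ok], xout = t, rule = 1)$y
}

fill_linear(c(NA, 10, NA, 30, NA))          # NA 10 20 30 NA
fill_linear(c(10, NA, 40), t = c(1, 2, 4))  # 10 20 40: the gap is weighted by time
```

Interpolating by time rather than by position matters for EOD data, because a gap that spans a weekend or holiday covers more calendar days than a gap between two consecutive weekdays.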

Answered by Nord Farsi; edited by feedMe.

I don't know about "best practice", but one alternative might be so-called "inhomogeneous time series operators", as presented in "Operators on Inhomogeneous Time Series".

This type of question is a good fit for the Quantitative Finance Stack Exchange site (e.g. see "How to update an exponential moving average with missing values?").
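As a sketch of the inhomogeneous approach, as I understand the EMA operator from that line of work: drop the missing observations and let the decay depend on the actual time gap. With alpha = (t_n - t_(n-1)) / tau, mu = exp(-alpha) and nu = (1 - mu) / alpha, linear interpolation between observations gives EMA_n = mu * EMA_(n-1) + (nu - mu) * x_(n-1) + (1 - nu) * x_n. The function name, the initialisation with the first observation and the choice of linear (rather than previous-point) interpolation are my assumptions.

```r
# Hypothetical helper: EMA for irregularly spaced observations, using the
# inhomogeneous EMA update with linear interpolation between observations.
# t: strictly increasing numeric times, x: values without NAs, tau: time scale.
ema_inhomogeneous <- function(t, x, tau) {
  stopifnot(length(x) >= 1, length(t) == length(x), !anyNA(x), all(diff(t) > 0))
  out <- numeric(length(x))
  out[1] <- x[1]                      # assumption: start at the first value
  for (n in seq_along(x)[-1]) {
    alpha <- (t[n] - t[n - 1]) / tau  # gap measured in units of tau
    mu <- exp(-alpha)                 # decay of the old EMA over the gap
    nu <- (1 - mu) / alpha            # weight implied by linear interpolation
    out[n] <- mu * out[n - 1] + (nu - mu) * x[n - 1] + (1 - nu) * x[n]
  }
  out
}
```

For the question's data this could be used as vol <- na.omit(stock$Volume) followed by ema_inhomogeneous(as.numeric(index(vol)), as.numeric(vol), tau = 20). Note that tau is then measured in calendar days, since as.numeric() of a Date counts days; passing seq_along(vol) instead would measure it in observations.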

Answered by Joshua Ulrich; edited by Community.

Try na.omit.

I had the same problem, and this fixed it for me.

Answered by Boon Hong; edited by Cody Gray.