R Finding the RSI on a subset

Question

I am using the following code to find the RSI (Relative Strength Index) and DEMA (double exponential moving average) of a stock.

library(quantmod)
library(TTR)
getSymbols("AAPL")
chartSeries(AAPL, TA=NULL)
data=AAPL[,4]
AAPL$rsi = TTR::RSI(data)
AAPL$dema = TTR::DEMA(data)

# object B stores the copy of AAPL object and I save it in a CSV file
B = AAPL

Every day, object AAPL will have a new line to reflect data of the last closing day.

Each day, the RSI and DEMA functions run on the entire dataset. It seems like a waste of CPU power and time to rerun RSI again and again on 12+ years of data when only one new row (for the last trading day) has been added.

Is there a way to compute RSI, DEMA, etc. for only the last day in the AAPL object and add the result to the old dataset B?

I wonder how quant traders handle this kind of operation when they get tick data every second and need RSI and a few other indicators on the new and all the past data. Even with the fastest computer, recomputing the indicators would take several minutes, and the market would have moved by then.

Thanks!

Downloading the data takes longer than calculating the RSI. If you want to save time, limit the amount of data you download. You can append only the last row(s) of data to a CSV file with `write.table(rows_to_append, "mycvs.csv", sep = ",", append = TRUE, row.names = FALSE, col.names = FALSE)` — phiver, Mar 02 '21 at 10:15
Downloading data is not a problem. I have the latest data without any lag. It's just the indicators that take time. — Saurabh, Mar 02 '21 at 12:53
There are data providers that let you get the RSI directly, i.e. it is precomputed. Alternatives are refreshing only the last RSI value, or moving the code to a lower-level/compiled language, which will make the computations considerably faster than in R (although, as far as I know, TTR already uses compiled code and should therefore be quite fast). — tester, Mar 04 '21 at 16:57
There are many charting tools that get the data (OHLC) tick by tick and they update all the indicators almost instantaneously. I am sure they are not calculating indicators each day for the entire dataset. There is something I am missing completely. Also, I don't want to be dependent on data providers for indicators as I won't be able to tweak them. — Saurabh, Mar 04 '21 at 17:17

score 2 · Accepted Answer · answered Mar 04 '21 at 17:28

Let's say that yesterday you downloaded all of the relevant data and calculated all of the RSI and DEMA statistics. Below are the data up until March 2, 2021.

library(quantmod)
library(TTR)
getSymbols("AAPL")
chartSeries(AAPL, TA=NULL)
AAPL <- AAPL["/2021-03-02"]  # keep rows through March 2, 2021 (simulates yesterday's download)
data=AAPL[,4]
AAPL$rsi = TTR::RSI(data)
AAPL$dema = TTR::DEMA(data)
tail(AAPL)
#            AAPL.Open AAPL.High AAPL.Low AAPL.Close AAPL.Volume AAPL.Adjusted      rsi     dema
# 2021-02-23    123.76    126.71   118.39     125.86   158273000        125.86 35.08898 127.7444
# 2021-02-24    124.94    125.56   122.23     125.35   111039900        125.35 34.28019 126.5275
# 2021-02-25    124.68    126.46   120.54     120.99   148199500        120.99 28.27909 124.2326
# 2021-02-26    122.59    124.85   121.20     121.26   164320000        121.26 29.10677 122.6783
# 2021-03-01    123.75    127.93   122.79     127.79   115998300        127.79 45.49055 123.7497
# 2021-03-02    128.41    128.72   125.01     125.12   102015300        125.12 41.28885 123.7178

Then, you save this result to a CSV:

library(readr)  # write_csv() comes from readr
write_csv(as.data.frame(AAPL), "aapl.csv")

Now, today you downloaded the data and you've got one new data point. By using only the most recent 200 or so days of prices, you can generate the same value for the most recent day as you would get from the whole data set. This seems to work for other symbols, too, but you'd want to verify that it generalizes.

getSymbols("AAPL")
data=AAPL[(nrow(AAPL)-200):nrow(AAPL),4]
AAPL$rsi = TTR::RSI(data)
AAPL$dema = TTR::DEMA(data)
tail(AAPL)
#            AAPL.Open AAPL.High AAPL.Low AAPL.Close AAPL.Volume AAPL.Adjusted      rsi     dema
# 2021-02-24    124.94    125.56   122.23     125.35   111039900        125.35 34.28019 126.5275
# 2021-02-25    124.68    126.46   120.54     120.99   148199500        120.99 28.27909 124.2326
# 2021-02-26    122.59    124.85   121.20     121.26   164320000        121.26 29.10677 122.6783
# 2021-03-01    123.75    127.93   122.79     127.79   115998300        127.79 45.49055 123.7497
# 2021-03-02    128.41    128.72   125.01     125.12   102015300        125.12 41.28885 123.7178
# 2021-03-03    124.81    125.71   121.84     122.06   112430400        122.06 37.06365 122.7313
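
One way to check whether a given window length is long enough is to compare the windowed value against the full-history value for several window sizes. The sketch below is in base R and uses simulated prices; `wilder_rsi()` is a hypothetical helper written for illustration, assuming Wilder-style recursive smoothing seeded with a simple average (which, as far as I know, is close to what TTR::RSI does by default, though the implementation details may differ). Because the smoothing is recursive, the 14-period RSI carries memory well beyond 14 rows, which is why a short subset gives a different answer.

```r
# Wilder-style RSI for the LAST observation only (illustrative, not TTR's code).
wilder_rsi <- function(x, n = 14) {
  d <- diff(x)
  gain <- pmax(d, 0)
  loss <- pmax(-d, 0)
  # Seed with a simple average of the first n changes (an assumption).
  avg_gain <- mean(gain[1:n])
  avg_loss <- mean(loss[1:n])
  if (length(d) > n) {
    for (i in (n + 1):length(d)) {
      # Each step keeps (n - 1) / n of the old average, so old data fades
      # geometrically but never drops out completely.
      avg_gain <- (avg_gain * (n - 1) + gain[i]) / n
      avg_loss <- (avg_loss * (n - 1) + loss[i]) / n
    }
  }
  100 - 100 / (1 + avg_gain / avg_loss)
}

# Simulated closing prices stand in for AAPL[, 4].
set.seed(42)
prices <- 100 * exp(cumsum(rnorm(3000, sd = 0.02)))

full <- wilder_rsi(prices)
windows <- c(15, 50, 100, 200, 500)   # rows kept from the end of the series
err <- sapply(windows, function(w) abs(wilder_rsi(tail(prices, w)) - full))
data.frame(window = windows, abs_error = err)
```

The error shrinks roughly like (13/14)^k for a window k rows longer than the seed, so the window needed depends on how many decimal places must match, not on the symbol.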

You could then take this last row and append it to the previous CSV as @phiver suggested:

write_csv(as.data.frame(AAPL)[nrow(AAPL), ], "aapl.csv", append=TRUE)
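
A base R variant of the same append step, sketched with a toy table and a temporary file (both are placeholders, not the real AAPL data). It keeps the date as an explicit column, since the dates of a converted xts object live in the row names and, as far as I know, write_csv() does not write row names.

```r
# Toy stand-in for yesterday's indicator table, with dates as a real column.
history <- data.frame(
  date  = as.Date("2021-03-01") + 0:1,
  close = c(127.79, 125.12),
  rsi   = c(45.49055, 41.28885)
)

csv_path <- tempfile(fileext = ".csv")   # hypothetical file location

# First write: include the header.
write.table(history, csv_path, sep = ",", row.names = FALSE, col.names = TRUE)

# Later writes: append only the new row, without repeating the header.
new_row <- data.frame(date = as.Date("2021-03-03"), close = 122.06, rsi = 37.06365)
write.table(new_row, csv_path, sep = ",", row.names = FALSE,
            col.names = FALSE, append = TRUE)

stored <- read.csv(csv_path)
stored
```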

The real question is what's to be gained from such a procedure. Looking at the benchmarks below for the two procedures, running the RSI operation on the full data is about 27% slower by the median estimates (about 40% slower by the mean), though it will not be noticeable if you're doing only a few calls. I didn't print the results here, but the DEMA routine is about 30% slower on the full data set. If you had to do this thousands of times per day, doing it like this might make sense, but if you had to do it 10 times per day, it may not be worth the trouble.

library(microbenchmark)
microbenchmark(TTR::RSI(AAPL[,4]), times=1000)
# Unit: microseconds
#                       expr    min      lq     mean   median      uq      max neval
# TTR::RSI(AAPL[, 4]) 797.03 823.431 1008.936 852.5145 924.193 18113.29  1000
microbenchmark(TTR::RSI(AAPL[(nrow(AAPL)-200):nrow(AAPL),4]), times=1000)
# Unit: microseconds
#                                             expr     min      lq     mean median      uq      max neval
# TTR::RSI(AAPL[(nrow(AAPL) - 200):nrow(AAPL), 4]) 634.306 652.424 710.9095 671.79 706.294 11743.02  1000

Thanks, Dave. I am actually doing it several thousand times each day and speed is really important. The RSI lookback period is only 14 days by default. If I subset the last 14 days and run RSI over those days, the resulting value is different from the one I get after running it on the entire set. Just wondering, why did you decide on a 200-day lookback and not 100 or maybe 50? — Saurabh, Mar 04 '21 at 17:46
100 wasn’t long enough to produce the same answer as done on the full dataset. It was around 200 that the results converged. — DaveArmstrong, Mar 04 '21 at 17:47
Got it. While RSI stabilizes in about 200 days, there are indicators using EMA, DEMA, etc. which take 500 days or more to stabilize. There must be a way to escape this repeated calculation. — Saurabh, Mar 04 '21 at 17:51
It's a good question for someone who knows more about these things than I do. Good luck! — DaveArmstrong, Mar 04 '21 at 17:58
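
One possible way to avoid the repeated calculation, sketched here in base R under stated assumptions: recursive indicators such as a Wilder-smoothed RSI or an EMA-based DEMA can be updated from a small saved state instead of the full history. All function names below are hypothetical helpers, not TTR functions, and the seeding (simple average for RSI, first price for the EMAs) is an assumption that may not match TTR's values exactly.

```r
# --- RSI state: last close plus smoothed average gain and loss ---
rsi_update <- function(state, price) {
  change <- price - state$last
  n <- state$n
  state$avg_gain <- (state$avg_gain * (n - 1) + max(change, 0)) / n
  state$avg_loss <- (state$avg_loss * (n - 1) + max(-change, 0)) / n
  state$last <- price
  state
}

rsi_init <- function(x, n = 14) {
  d <- diff(x)
  state <- list(n = n, last = x[n + 1],
                avg_gain = mean(pmax(d, 0)[1:n]),    # seed: simple averages
                avg_loss = mean(pmax(-d, 0)[1:n]))
  for (p in x[-seq_len(n + 1)]) state <- rsi_update(state, p)
  state
}

rsi_value <- function(state) 100 - 100 / (1 + state$avg_gain / state$avg_loss)

# --- DEMA state: an EMA of the price and an EMA of that EMA ---
ema_update <- function(prev, x, n) prev + 2 / (n + 1) * (x - prev)

dema_update <- function(state, price) {
  state$ema1 <- ema_update(state$ema1, price, state$n)
  state$ema2 <- ema_update(state$ema2, state$ema1, state$n)
  state
}

dema_init <- function(x, n = 10) {
  state <- list(n = n, ema1 = x[1], ema2 = x[1])   # seed with the first price
  for (p in x[-1]) state <- dema_update(state, p)
  state
}

dema_value <- function(state) 2 * state$ema1 - state$ema2

# --- Demo on simulated closes ---
set.seed(1)
prices <- 100 + cumsum(rnorm(1000))

# One-off: build the state from everything except the newest close.
rsi_state  <- rsi_init(head(prices, -1))
dema_state <- dema_init(head(prices, -1))

# Per new bar: constant work, independent of the history length.
rsi_state  <- rsi_update(rsi_state, tail(prices, 1))
dema_state <- dema_update(dema_state, tail(prices, 1))

# Compare with rebuilding from the full history.
rsi_matches  <- all.equal(rsi_value(rsi_state), rsi_value(rsi_init(prices)))
dema_matches <- all.equal(dema_value(dema_state), dema_value(dema_init(prices)))
```

The state (a handful of numbers per symbol) can be saved next to the CSV, so each new row or tick costs a few arithmetic operations rather than a pass over years of data. Indicators that are not recursive, such as a plain moving average or a rolling maximum, need a different kind of state (for example a fixed-size buffer).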
