
I would like to develop a trend-following strategy by backtesting a universe of stocks; let's just say all NYSE or S&P 500 equities. I am asking this question because I am unsure how to handle the storage and organization of the massive amount of historical price data involved.

After multiple hours of research, I am here asking for your experience and awareness. I would be extremely grateful for any information you can share on this topic.


Personal background:

-I know how to code. I was an Electrical Engineering major, not a CS major.

-I know how to pull stock data for individual tickers into Excel. I am familiar with using filtering and custom studies in ThinkOrSwim.

Applied context: from 1995 to today, let's evaluate the best-performing equities on a relative strength/momentum basis. We will compare many technical characteristics to develop a strategy. The key is having data for a universe of stocks that we can run backtests on using Python, C#, R, or any other programming language. We can then evaluate possible strategies by assessing the returns, the omega ratio, median excess returns, and Jensen's alpha (measured weekly) of technically driven entries and exits.
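The three risk-adjusted measures named above can all be computed from a list of weekly returns. Here is a minimal sketch in plain Python (standard library only), assuming simple (not log) weekly returns, a constant per-week risk-free rate, and a benchmark series such as the S&P 500 index for the excess-return and alpha calculations. The function names are hypothetical, and "median excess return" is assumed here to mean excess over the benchmark.

```python
from statistics import mean, median

def omega_ratio(returns, threshold=0.0):
    """Omega ratio: total gain above `threshold` divided by total shortfall below it."""
    gains = sum(max(r - threshold, 0.0) for r in returns)
    losses = sum(max(threshold - r, 0.0) for r in returns)
    return float("inf") if losses == 0 else gains / losses

def jensens_alpha(strategy, market, risk_free=0.0):
    """Per-period Jensen's alpha: mean excess return minus beta * mean market excess.

    Beta is the OLS slope of strategy excess returns on market excess returns.
    Needs at least two periods.
    """
    rp = [r - risk_free for r in strategy]
    rm = [r - risk_free for r in market]
    mp, mm = mean(rp), mean(rm)
    cov = sum((a - mp) * (b - mm) for a, b in zip(rp, rm)) / (len(rp) - 1)
    var = sum((b - mm) ** 2 for b in rm) / (len(rm) - 1)
    beta = cov / var
    return mp - beta * mm

def median_excess_return(strategy, benchmark):
    """Median of period-by-period strategy return minus benchmark return."""
    return median(s - b for s, b in zip(strategy, benchmark))
```

The omega threshold defaults to 0; passing the weekly risk-free rate instead is a common alternative. Because the inputs are weekly, the alpha comes out per week and has to be annualized (roughly, multiplied by 52) before comparing it with annual figures.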


Here's where I am having trouble figuring out the next step:

-Loading data for all S&P 500 companies into a single Excel workbook is just not gonna work; I feel like it's too much data for Excel to handle. Each ticker is going to have multiple MB of price data.

-What is the best way to get, and then store, the price data for each ticker in the universe? Are we looking at something like SQL or Microsoft Access here? I don't know; I don't have enough awareness on the subject of handling this much data. What are your thoughts?
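To make the SQL option concrete, here is a minimal sketch using SQLite through Python's built-in `sqlite3` module. SQLite keeps the whole database in a single file and needs no server, which is convenient for a one-person research setup; the same table design carries over to MySQL or PostgreSQL with small syntax changes. The table name `daily_prices`, its columns, and the helper functions are hypothetical choices for illustration, and the sample rows are made-up values, not real quotes.

```python
import sqlite3

# One row per (ticker, trading day). The composite primary key also acts as an
# index, so "all bars for one ticker in a date range" is a fast lookup.
SCHEMA = """
CREATE TABLE IF NOT EXISTS daily_prices (
    ticker TEXT NOT NULL,
    date   TEXT NOT NULL,   -- ISO 'YYYY-MM-DD' strings sort chronologically
    open   REAL,
    high   REAL,
    low    REAL,
    close  REAL,
    volume INTEGER,
    PRIMARY KEY (ticker, date)
)
"""

def open_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn

def upsert_bars(conn, rows):
    """Insert bars; a bar for an existing (ticker, date) replaces the old one.

    rows: iterable of (ticker, date, open, high, low, close, volume) tuples.
    """
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT OR REPLACE INTO daily_prices VALUES (?, ?, ?, ?, ?, ?, ?)",
            rows,
        )

def load_closes(conn, ticker, start, end):
    """Return [(date, close), ...] for one ticker, oldest first."""
    cur = conn.execute(
        "SELECT date, close FROM daily_prices "
        "WHERE ticker = ? AND date BETWEEN ? AND ? ORDER BY date",
        (ticker, start, end),
    )
    return cur.fetchall()

if __name__ == "__main__":
    conn = open_db()  # pass a filename such as "prices.db" to persist to disk
    upsert_bars(conn, [
        ("AAA", "2019-02-11", 10.0, 10.5, 9.8, 10.2, 1000),   # made-up sample bar
        ("AAA", "2019-02-12", 10.2, 10.9, 10.1, 10.8, 1200),  # made-up sample bar
    ])
    print(load_closes(conn, "AAA", "2019-01-01", "2019-12-31"))
```

A full S&P 500 daily history is a few million rows, which SQLite handles comfortably; the backtest script then pulls one ticker (or one date range) at a time instead of holding everything in a spreadsheet.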


I have used ToS to filter stocks based on true/false parameters over a period of time in the past; however, the capabilities of ToS are limited. I would like a more flexible backtesting engine, such as code written in Python or C#. I'm not sure if R (via Rscript) is of any use. Maybe there are libraries out there that I am not aware of that would make this all possible? If there are, let me know.

I am aware that Quantopian and other web-based quant platforms are around. Are these my best bets for backtesting? Any thoughts on them?


Am I making this too complicated? Backtesting a strategy on a single equity or several equities isn't a problem in Excel, ToS, or even TradingView. But with lots of data, I'm not sure what the best option is for storing that data and then using a Python script or something similar to perform the backtest.
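Once the prices are in a database, the backtest itself can be a short script. The sketch below is a deliberately simple cross-sectional momentum test in plain Python, not a full engine. It assumes every ticker's closes are already aligned on the same rebalance dates (for example, weekly closes pulled from the database), ranks tickers by trailing return over a hypothetical `lookback` window, and holds the hypothetical `top_k` winners equally weighted for the next period, ignoring commissions and slippage.

```python
def momentum_backtest(closes, lookback, top_k):
    """Cross-sectional momentum: each period, hold the top_k trailing performers.

    closes: dict of ticker -> list of closing prices on a shared calendar.
    Returns one equal-weighted portfolio return per holding period.
    """
    n = len(next(iter(closes.values())))
    returns = []
    for t in range(lookback, n - 1):
        # Rank using only prices up to and including period t (no look-ahead).
        scores = {tk: px[t] / px[t - lookback] - 1.0 for tk, px in closes.items()}
        winners = sorted(scores, key=scores.get, reverse=True)[:top_k]
        # Hold the winners from t to t+1 and record their average return.
        period = [closes[tk][t + 1] / closes[tk][t] - 1.0 for tk in winners]
        returns.append(sum(period) / len(period))
    return returns
```

The resulting list of period returns is exactly what the omega ratio, median excess return, and alpha calculations take as input. One caution when the universe is the S&P 500: using today's constituent list for a test starting in 1995 leaves out companies that were later removed or delisted, which tends to inflate backtested returns (survivorship bias).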


Random final thought: ultimately, I would like to explore some AI assistance with optimizing strategies that are built from parameters. I know this is a thing, but I'm not sure where to learn more about it. If you know, please let me know.


Thank you guys. I hope this wasn't too much. If you can share any knowledge to increase my awareness on the topic, I would really appreciate it.

Twitter: @b_gumm

Brandon G

The amount of data is too much for Excel or Calc. Even if you screen only the 500 stocks of the S&P 500, you will get about 2.5 million rows (approx. 252 trading days/year * 20 years * 500 stocks), while a single Excel worksheet is limited to 1,048,576 rows. For this amount of data, you should use a SQL database like MySQL, which is performant enough to handle it. But you have to find a way to keep it updated. If you download the complete time series every day and store it in your database, this process can take approx. 1 hour. You could also use delta downloads (only the newest bars), but be aware of corporate actions (e.g. splits), which change the historical prices you have already stored.
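With delta downloads, a split means the history already stored in the database no longer lines up with the new bars. One common fix is to back-adjust the stored rows when a split occurs. The sketch below does that with SQLite via Python's `sqlite3`; the `daily_prices` table with `ticker`, `date`, `close`, and `volume` columns is a hypothetical minimal layout, and `ratio` means new shares per old share (2.0 for a 2-for-1 split). The sample rows are made-up values.

```python
import sqlite3

def apply_split(conn, ticker, split_date, ratio):
    """Back-adjust one ticker's history for a split effective on split_date.

    Closes before the split are divided by ratio and volumes multiplied by it,
    so pre- and post-split bars are directly comparable. Run it only once per
    split; applying it twice would double-adjust the history.
    """
    with conn:  # single transaction: either every row is adjusted or none
        conn.execute(
            "UPDATE daily_prices SET close = close / ?, volume = volume * ? "
            "WHERE ticker = ? AND date < ?",
            (ratio, ratio, ticker, split_date),
        )

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE daily_prices (ticker TEXT, date TEXT, close REAL, volume REAL)"
    )
    conn.executemany("INSERT INTO daily_prices VALUES (?, ?, ?, ?)", [
        ("AAA", "2019-02-11", 100.0, 1000),  # made-up pre-split bar
        ("AAA", "2019-02-12", 51.0, 2100),   # made-up first post-split bar
    ])
    apply_split(conn, "AAA", "2019-02-12", 2.0)
    print(conn.execute("SELECT date, close, volume FROM daily_prices ORDER BY date").fetchall())
```

If you also store open, high, and low, the same division applies to them. Keeping a small table of already-applied splits is one way to guarantee each adjustment runs only once; the alternative is simply to re-download the full series for that ticker after a split.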

I don't know Quantopian, but I know a similar backtesting service for which I wrote a Python backtesting script last year. The outcome was quite different from what I had expected. After investigating, I found that the backtesting service was calculating wrong results because of bad data. So be cautious about the results.
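Because bad data can silently produce wrong backtest results, it is worth running basic sanity checks on each ticker's series before trusting any output. Here is a minimal sketch in plain Python, assuming bars are (date, close) pairs sorted by ISO date string. The 50% `max_move` threshold is an arbitrary illustrative value, and a flagged bar is only a candidate for review: a genuine crash and an unadjusted split can both trigger it.

```python
def find_suspect_bars(bars, max_move=0.5):
    """Flag bars that often indicate bad data.

    bars: list of (date, close) sorted by date, dates as 'YYYY-MM-DD' strings.
    Returns a list of (date, reason) tuples.
    """
    problems = []
    prev_date, prev_close = None, None
    for date, close in bars:
        if close is None or close <= 0:
            problems.append((date, "non-positive or missing close"))
            continue  # skip this bar when comparing the next one
        if prev_date is not None and date <= prev_date:
            problems.append((date, "duplicate or out-of-order date"))
        if prev_close is not None and abs(close / prev_close - 1.0) > max_move:
            # Large one-bar jumps are typical of unadjusted splits or bad ticks.
            problems.append((date, f"move of {close / prev_close - 1.0:+.0%}"))
        prev_date, prev_close = date, close
    return problems
```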

Peter
Hey Peter, thanks for your input. I agree with you on the SQL DB. I have since found that there are Python libraries for this kind of thing; you just need a place to pull the data from, and I will use the Interactive Brokers Python API. I am not positive this will work, however. Quantopian is probably the next best option. – Brandon G Feb 13 '19 at 19:56