Anomaly Testing - Linear Regression with t or not with t? Problems to understand the setup

Question

To test for an anomaly in stock data, many studies use a linear regression. Say you want to check whether there is a Monday effect, meaning that Monday returns are significantly worse than returns on other days. As I understand it, we can use a regression like: return = a + b * DummyMon + e, where a is the constant, b is the regression coefficient, DummyMon is the dummy for Monday, and e is the error term. This is what I used in Python. First you add a constant to the anomaly:

import statsmodels.api as sm

anomaly = sm.add_constant(anomaly)

Then you build the model:

model = sm.OLS(returns, anomaly)  # "return" is a reserved word in Python, so the variable is named "returns"

Then you fit the model:

results = model.fit()
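For reference, here is a minimal sketch of how the Monday dummy (the `anomaly` variable above) could be built from the observation dates. The `monday_dummy` helper and the two-week calendar are made up for illustration; they are assumptions, not something taken from a study or from statsmodels:

```python
from datetime import date, timedelta


def monday_dummy(dates):
    """Return 1 for every date that falls on a Monday, else 0."""
    # date.weekday() returns 0 for Monday through 6 for Sunday
    return [1 if d.weekday() == 0 else 0 for d in dates]


# Hypothetical trading calendar: the weekdays in the 14 days
# starting on Monday, 2020-11-02 (holidays are ignored here).
start = date(2020, 11, 2)
calendar_days = [start + timedelta(days=i) for i in range(14)]
trading_days = [d for d in calendar_days if d.weekday() < 5]

dummy = monday_dummy(trading_days)
print(dummy)
```

The resulting list has one entry per trading day and can be passed to `sm.add_constant` in place of `anomaly`.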

I wonder if this is the correct model setup.
In this case a plot of the linear regression would just show two vertical bands of points, one above 0 (not Monday) and one above 1 (Monday), containing all the returns. It looks pretty strange. Is this correct?
Should I somehow try to use the time (t) in the regression? If so, how can I do it with Python? I thought about giving each date an increasing number, but then I wondered how to treat weekends.
I would assume that with many data points both approaches give similar results if the time series is stationary, right? In the end I am doing a cross-sectional analysis and don't care about the time-series aspect in this case, correct? (I heard about GARCH models etc., where this is different.)
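To make the first question concrete: with a single 0/1 regressor, OLS reduces to comparing two group means, which is why the plot shows only two bands. The sketch below computes the coefficients by hand under the classical equal-variance assumption. The `dummy_regression` function and the return values are invented for illustration only:

```python
from math import sqrt
from statistics import mean


def dummy_regression(returns, dummy):
    """OLS of returns on a constant and a 0/1 dummy, computed by hand.

    Returns (a, b, t_b): intercept, slope, and the slope's t-statistic
    under the classical assumption of equal error variance.
    """
    mon = [r for r, d in zip(returns, dummy) if d == 1]
    other = [r for r, d in zip(returns, dummy) if d == 0]
    mean_mon, mean_other = mean(mon), mean(other)

    a = mean_other             # intercept = mean return on non-Mondays
    b = mean_mon - mean_other  # slope = Monday mean minus non-Monday mean

    # Residuals are the deviations from each group's own mean
    ssr = (sum((r - mean_mon) ** 2 for r in mon)
           + sum((r - mean_other) ** 2 for r in other))
    s2 = ssr / (len(returns) - 2)  # two estimated parameters
    se_b = sqrt(s2 * (1 / len(mon) + 1 / len(other)))
    return a, b, b / se_b


# Made-up daily returns in percent for two weeks, Monday to Friday
returns = [-0.5, 0.2, 0.1, 0.3, 0.4, -0.3, 0.0, 0.2, 0.1, 0.3]
dummy = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]

a, b, t_b = dummy_regression(returns, dummy)
print(f"a = {a:.3f}, b = {b:.3f}, t(b) = {t_b:.2f}")
```

Under these assumptions the fitted line simply connects the non-Monday mean at 0 with the Monday mean at 1, and the t-statistic on b is the same as in a pooled two-sample t-test.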

I am just learning and hope someone can give me some ideas on the topic. Thank you very much in advance.

score 0 · Answer 1 · answered Nov 08 '20 at 00:01

For time series analysis tasks (such as forecasting or anomaly detection), you may need a more advanced model, such as a recurrent neural network (RNN) from deep learning. You can assign any time step to an RNN cell; in your case, each RNN cell can represent a day, an hour, half a day, and so on.

The main purpose of RNNs is to make the model learn the time dependencies in the data. For example, if Monday has a bad effect, then the corresponding RNN cells will learn parameters that reflect it. I would recommend doing some further research on this. Here are some resources that may help:

https://colah.github.io/posts/2015-08-Understanding-LSTMs/ (Also includes different types of RNN)

https://towardsdatascience.com/understanding-rnn-and-lstm-f7cdf6dfc14e

You can use the TensorFlow, Keras, or PyTorch libraries.

Thank you very much for your comment. I will certainly dig deeper into neural networks and random forests later. I am not sure whether they can really be used for detecting the January effect. Do you have reference studies for the January effect or the Monday effect? I did not find one. There are tons of studies with linear regressions, and I need to use linear regressions for the current project. But I really appreciate your answer. Thank you. Can anybody answer my questions, maybe? — Poldi, Nov 08 '20 at 01:23
