I am working on a prediction model for stock returns over a fixed period of time (say, n days), and I was hoping to gather a few ideas ahead of time. My questions are:

1) Would it be best to turn this into a classification problem, say by creating a dummy variable that flags returns larger than x%? Then I could try the entire arsenal of ML algorithms.

2) If I don't turn it into a classification problem but instead use, say, a regression model, would it make sense (or even be necessary) to log-transform the returns?

Any thoughts are appreciated.

EDIT: My goal is fairly broadly defined: I would simply like to improve the performance of the selection process (pick stocks with positive returns and avoid those with negative ones).

Niccola Tartaglia

1 Answer

  1. Best by what measure? Turning it into a thresholding problem simply means mapping the problem onto a much simpler output space. Your problem definition is your own; you can turn it into a binary classification problem (>x or not), a multi-class classification problem (binning into ranges), or simply keep it as a regression task (predicting the return itself). If you do the latter, you can still apply binning or classification as a post-processing step.
  2. Classification is just a subclass of prediction. The logistic (sigmoid) transformation used by logistic regression is no more than a neat trick to squash the outputs into (0, 1) so that they resemble probabilities; don't put too much thought into it. That said, applying transformations to your output is not necessarily bad (you could, for instance, apply some normalization to keep your output within the range of some activation function).
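
To make the options in the question concrete, here is a minimal sketch of how the three candidate targets (simple n-day return, log return, and a binary ">x%" label) could be built from a price series. The prices, the horizon `n`, the threshold, and all function names are made up for illustration; they are assumptions, not part of the original question or answer.

```python
import math

def forward_returns(prices, n):
    """Simple return from day t to day t+n, for every t where day t+n exists."""
    return [prices[t + n] / prices[t] - 1.0 for t in range(len(prices) - n)]

def to_log_returns(simple_returns):
    """Log return = log(1 + simple return); log returns add up across periods."""
    return [math.log1p(r) for r in simple_returns]

def to_labels(simple_returns, threshold):
    """Binary classification target: 1 if the return exceeds the threshold."""
    return [1 if r > threshold else 0 for r in simple_returns]

# Hypothetical daily closing prices (illustrative values only).
prices = [100.0, 102.0, 101.0, 105.0, 103.0, 110.0]
n = 2             # assumed horizon in days
threshold = 0.02  # assumed "x%" cut-off (2%)

rets = forward_returns(prices, n)     # regression target
log_rets = to_log_returns(rets)       # log-transformed regression target
labels = to_labels(rets, threshold)   # classification target

for r, lr, y in zip(rets, log_rets, labels):
    print(f"simple={r:.4f}  log={lr:.4f}  label={y}")
```

Because the labels are derived from the returns, a regression model's predictions can always be thresholded afterwards, which is the post-processing route mentioned in point 1.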
KonstantinosKokos
  • Thank you for sharing some insight. That makes sense. My goal is fairly broadly defined, so anything that improves the ability to pick winners over losers will be helpful. – Niccola Tartaglia Apr 28 '18 at 16:10
  • You could try both approaches; switching from N-class classification to regression boils down to changing the output layer and the loss function in any high-level ML library such as Keras. Intuitively, the regression target carries more information than the class label (stock>50 returns (1,) for both stock=51 and stock=5*10e10, but the MSE between 50 and 5*10e10 is huge), so your model would benefit more from regression, assuming you have enough data. That said, keep in mind that stock prices aren't something you can model easily, as the underlying systems are chaotic. – KonstantinosKokos Apr 28 '18 at 16:15
  • Good points, I agree. I will try both approaches and see how they perform. And yes, stock price prediction is indeed very difficult to model. – Niccola Tartaglia Apr 28 '18 at 16:38
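
The comment about a class label discarding information can be illustrated with a small sketch using the numbers from the thread (a threshold of 50 and the values 51 and 5*10e10). The helper names are hypothetical, and using the threshold itself as the "prediction" is an assumption made only to reproduce the comparison in the comment.

```python
def label(value, threshold=50.0):
    """Binary target: 1 if the value exceeds the threshold, else 0."""
    return 1 if value > threshold else 0

def squared_error(prediction, target):
    """Squared error for a single prediction/target pair."""
    return (prediction - target) ** 2

threshold = 50.0
small, large = 51.0, 5 * 10e10  # values taken from the comment thread

# Both targets collapse to the same class label...
same_label = label(small, threshold) == label(large, threshold)

# ...but a regression loss still tells them apart.
err_small = squared_error(threshold, small)
err_large = squared_error(threshold, large)

print("same label:", same_label)
print("squared error vs 51:     ", err_small)
print("squared error vs 5*10e10:", err_large)
```

The flip side, under the same assumptions, is that a squared-error loss is dominated by extreme targets, which is one reason the question of transforming or normalizing the output comes up at all.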