
I have been looking at research papers that attempt to predict stock prices. I have noticed that in these papers an activation function is applied to the output, using one of the following types: unipolar sigmoid, bipolar sigmoid, hyperbolic tangent, or radial basis function.

My question: if one of the above activation functions is applied to the output, how can it be used to predict a stock price, i.e. a value like $103.56? Most of these functions are bounded to a range such as (0, 1) or (-1, 1).

Reply to bakkal: here is the relevant paragraph from the paper (quoted):

"Before being put as input into the ANN, the inputs were normalized according to the 'zscore' function defined in MATLAB, wherein the mean was subtracted and the value divided by the variance of the data. The target outputs were also normalized according to the target functions, dividing by their maximum values, keeping in mind the upper and lower limits for the respective activation functions ((0, 1) for the unipolar sigmoid, (-1, 1) for the bipolar sigmoid and the tan hyperbolic functions)."

Hi, as mentioned below, if the activation function is not applied to the output, could someone explain the quoted paragraph above? Thanks.
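If I understand it correctly, that preprocessing amounts to something like the following (a numpy sketch; the paper itself uses MATLAB's zscore, and all variable names and values below are placeholders I made up):

import numpy as np

# placeholder historical data: columns are e.g. price and volume
inputs = np.array([[101.2, 1.5e6],
                   [103.4, 1.6e6],
                   [99.8, 1.4e6]])
targets = np.array([102.0, 104.1, 100.3])  # next-day prices to predict

# z-score the inputs: subtract the mean, divide by the spread of the data
inputs_norm = (inputs - inputs.mean(axis=0)) / inputs.std(axis=0)

# scale the targets by their maximum so they fall inside the activation
# function's range, e.g. (0, 1) for the unipolar sigmoid
targets_norm = targets / targets.max()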


2 Answers


If you're looking for a continuous output like 103.56, then you're using the neural network to implement a regression (as opposed to a classification). In that case you wouldn't apply an activation function to the output: your output would simply be the weighted sum of the inputs from the previous layer.

That said, nothing stops you from using activation functions on the hidden layers of the network (e.g. to create intermediate features that are then used for the regression).
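As a rough illustration (not from any particular paper; the layer sizes, weights, and inputs below are made up), a forward pass with a tanh hidden layer and a plain linear output could look like this in numpy:

import numpy as np

def forward(x, W1, b1, W2, b2):
    h = np.tanh(W1 @ x + b1)   # hidden layer: activation applied here
    return W2 @ h + b2         # output layer: plain weighted sum, no activation

# toy dimensions: 4 input features, 3 hidden units, 1 output
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)

x = np.array([0.2, -1.1, 0.7, 0.05])  # normalized input features
print(forward(x, W1, b1, W2, b2))     # unbounded output, e.g. a price

Because the last layer has no squashing function, the prediction isn't confined to (0, 1) or (-1, 1).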

From the comments: why doesn't the use of an activation function act like a normalisation function? Do we still need to normalise the data if we are using an activation function, since the activation function will act like a normaliser?

Normalization

Well, not exactly. Feature normalization means, for example, taking all your historical stock price data, finding the max, the min, the standard deviation etc., and applying a transformation so that all of that historical data fits into, say, [0, 1].

Why do this? Because your historical data may have AMZN prices that go up to, say, $500, while its market cap is around $200 billion. That's many orders of magnitude of difference between the two features, price and market cap, which is bad for some numerical algorithms. So you normalize them onto a standardized scale, so that all prices fall in [0, 1] and all market caps fall in [0, 1]. This helps, for example, the backpropagation algorithm.
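For instance (with made-up toy numbers rather than real AMZN data), per-feature min-max scaling in numpy could look like:

import numpy as np

# columns: price in USD, market cap in USD
data = np.array([[480.0, 1.90e11],
                 [500.0, 2.00e11],
                 [495.0, 1.95e11]])

mins, maxs = data.min(axis=0), data.max(axis=0)
scaled = (data - mins) / (maxs - mins)  # each column now lies in [0, 1]
print(scaled)

Each feature is scaled by its own min and max, so price and market cap end up on the same [0, 1] scale.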

Activation

Now the activation function does a different thing: it's there to create an activation effect, as in a neuron either fires or doesn't fire. The activation function takes an input anywhere in (-inf, +inf) and snaps it into, say, (-1, +1). That's different from normalizing.
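For example, tanh squashes any real input into (-1, 1), no matter how the input was scaled beforehand:

import numpy as np

z = np.array([-1000.0, -1.0, 0.0, 1.0, 1000.0])
print(np.tanh(z))  # [-1.0, -0.7616, 0.0, 0.7616, 1.0] -- snapped into (-1, 1)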

Now how can the activation effect help with regression? Well, in stocks for example, predicting prices for penny stocks (say a ~4 million USD company) can be wildly different from predicting prices for blue chips (~200 billion USD companies), so you may want a feature that turns on/off depending on penny vs. large cap. That feature can then be used to do a better regression of the predicted price.
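A crude sketch of that idea (the log-scale threshold and the numbers here are mine, purely illustrative): a sigmoid of the market cap acts like a soft on/off switch that later layers can combine with other features.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# log10 of market cap in USD; ~1e9 used as a rough penny/large-cap boundary
log_cap = np.log10(np.array([4e6, 5e8, 2e11]))
gate = sigmoid(5.0 * (log_cap - 9.0))
print(gate)  # ~0 for the 4M company, ~0.18 in between, ~1 for the 200B company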

bakkal
  • Thank you for the reply. – edb500 Jun 21 '16 at 09:26
  • Please see my edited question; in the research paper it looks as if the activation function is being applied to the output. – edb500 Jun 21 '16 at 09:31
  • OK, see the link above. – edb500 Jun 21 '16 at 09:36
  • @edb500 Hi there! I took a look, not sure why he's normalizing with respect to the activation function. Usually you'd normalize with respect to the input data (not the activation fn output). Other than that, the normalization step is simply because it can improve the backpropagation algorithm performance. – bakkal Jun 21 '16 at 11:30
  • Thanks for checking. Another question: why doesn't the use of an activation function act like a normalisation function? Do we still need to normalise if we are using an activation function, given that the activation function will act like a normaliser? – edb500 Jun 21 '16 at 11:47
  • @edb500 Added an explanation in the answer – bakkal Jun 21 '16 at 11:57
  • Hey, I have another ML question on http://stackoverflow.com/questions/38010806/r-programming-neural-network-package and was wondering if you could help? It's regarding the implementation of the system. – edb500 Jun 24 '16 at 10:25

We use normalization to map the target values to the range (0, 1) or (-1, 1), or whatever range matches your activation function. Generally, we also map the input values to a range close to (-1, 1). The most frequently used normalization for scaling input values is Gaussian (z-score) normalization. If the input vector is x and you are working with numpy arrays, the Gaussian normalization of x is:

xScaled = (x - x.mean()) / x.std()

where mean() gives the average and std() gives the standard deviation.

Another normalization is:

xScaled = (x - x.min()) / (x.max() - x.min())

which scales the input vector values to the range [0, 1].
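For example, with a small made-up price vector, the two normalizations give:

import numpy as np

x = np.array([100.0, 101.0, 103.0, 102.0, 104.0])

zscored = (x - x.mean()) / x.std()            # roughly [-1.41, -0.71, 0.71, 0, 1.41]
minmax = (x - x.min()) / (x.max() - x.min())  # exactly [0, 0.25, 0.75, 0.5, 1.0]
print(zscored)
print(minmax)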

So, you work with normalized input and output values in order to speed up the learning process. You can also refer to Andrew Ng's course to understand why this helps. When you want to scale the normalized values back to their actual values, you apply the reverse normalization. For example, for the min-max normalization above, the reverse normalization would be:

x = x.min() + (x.max() - x.min()) * xScaled

You can similarly obtain the reverse normalization for the Gaussian case.
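A short sketch of that reverse step for the Gaussian case: keep the mean and standard deviation computed during normalization and invert with them (again assuming numpy arrays):

import numpy as np

x = np.array([100.0, 101.0, 103.0, 102.0, 104.0])
mu, sigma = x.mean(), x.std()

xScaled = (x - mu) / sigma    # forward (Gaussian) normalization
xBack = mu + sigma * xScaled  # reverse normalization recovers the prices
print(np.allclose(xBack, x))  # True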

Nagabhushan Baddi
  • Thanks for the great reply; however, there are two things at work here: normalisation of the data and then an activation function being applied on top of the normalised data. So the output would need to remove the activation function and the normalisation, correct? – edb500 Jun 21 '16 at 09:54
  • No need to remove the activation function. Only do the reverse scaling and this would work. – Nagabhushan Baddi Jun 21 '16 at 09:58
  • Hmm, OK, thanks. So above, when the article says "target values were divided by their maximum values", does he mean the maximum activation function values or the maximum price values? – edb500 Jun 21 '16 at 10:07
  • He actually means the maximum price values. – Nagabhushan Baddi Jun 21 '16 at 10:08
  • Ah I see, so the best thing to do is use Gaussian normalization rather than his method. – edb500 Jun 21 '16 at 10:12
  • @Nagabhushan Baddi Another question: why doesn't the use of an activation function act like a normalisation function? Do we still need to normalise if we are using an activation function, given that the activation function will act like a normaliser? – edb500 Jun 21 '16 at 11:48
  • Hey, I have another ML question on http://stackoverflow.com/questions/38010806/r-programming-neural-network-package and was wondering if you could help? It's regarding the implementation of the system. – edb500 Jun 24 '16 at 10:26