
I am working on deep nets using Keras. There is an activation called "hard sigmoid". What is its mathematical definition?

I know what the sigmoid is. Someone asked a similar question on Quora: https://www.quora.com/What-is-hard-sigmoid-in-artificial-neural-networks-Why-is-it-faster-than-standard-sigmoid-Are-there-any-disadvantages-over-the-standard-sigmoid

But I could not find the precise mathematical definition anywhere.


4 Answers


Since Keras supports both TensorFlow and Theano, the exact implementation can differ per backend; I'll cover Theano only. For the Theano backend, Keras uses T.nnet.hard_sigmoid, which is in turn a linear approximation of the standard sigmoid:

# excerpt from Theano's T.nnet.hard_sigmoid: scale, shift, then clip to [0, 1]
slope = tensor.constant(0.2, dtype=out_dtype)
shift = tensor.constant(0.5, dtype=out_dtype)
x = (x * slope) + shift
x = tensor.clip(x, 0, 1)

i.e. it is: max(0, min(1, x*0.2 + 0.5))
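For a quick comparison, here is a small NumPy sketch (my own illustration, not the Keras source) of that formula next to the exact logistic sigmoid; the piecewise-linear version needs no exp() call, which is where the speedup comes from:

import numpy as np

def hard_sigmoid(x):
    # piecewise-linear approximation: clip(0.2 * x + 0.5, 0, 1)
    return np.clip(0.2 * x + 0.5, 0.0, 1.0)

def sigmoid(x):
    # exact logistic sigmoid, for comparison
    return 1.0 / (1.0 + np.exp(-x))

xs = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
print(hard_sigmoid(xs))   # [0.   0.3  0.5  0.7  1. ]
print(sigmoid(xs))        # roughly [0.047 0.269 0.5 0.731 0.953]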

    Keras' TensorFlow backend has the same math, though implemented by hand. https://github.com/fchollet/keras/blob/master/keras/backend/tensorflow_backend.py#L1487 – Dwight Crow Nov 05 '16 at 05:53
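For illustration, the same math could be expressed directly with TensorFlow ops like this (a sketch of the idea, not the verbatim Keras backend source linked above):

import tensorflow as tf

def hard_sigmoid(x):
    # same linear approximation: scale by 0.2, shift by 0.5, clip to [0, 1]
    return tf.clip_by_value(0.2 * x + 0.5, 0.0, 1.0)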

For reference, the hard sigmoid function may be defined differently in different places. In Courbariaux et al. 2016 [1] it's defined as:

σ is the “hard sigmoid” function: σ(x) = clip((x + 1)/2, 0, 1) = max(0, min(1, (x + 1)/2))

The intent is to provide a probability value (hence the constraint to lie between 0 and 1) for use in stochastic binarization of neural-network parameters (e.g. weights, activations, gradients). You use the probability p = σ(x) returned by the hard sigmoid to set the parameter x to +1 with probability p, or to -1 with probability 1 - p.
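As an illustration of that scheme, here is a short NumPy sketch (my own; the function names are made up and this is not code from the paper) of the stochastic binarization step:

import numpy as np

def hard_sigmoid(x):
    # Courbariaux et al. definition: clip((x + 1) / 2, 0, 1)
    return np.clip((x + 1.0) / 2.0, 0.0, 1.0)

def stochastic_binarize(x, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    x = np.asarray(x, dtype=float)
    p = hard_sigmoid(x)                       # probability of mapping to +1
    return np.where(rng.random(x.shape) < p, 1.0, -1.0)

print(stochastic_binarize([-2.0, 0.0, 2.0]))  # e.g. [-1.  1.  1.], middle entry is sampled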

[1] Matthieu Courbariaux, Itay Hubara, Daniel Soudry, Ran El-Yaniv, Yoshua Bengio, "Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1", arXiv:1602.02830, 2016 (v1 submitted 9 Feb 2016, v3 revised 17 Mar 2016). https://arxiv.org/abs/1602.02830


The hard sigmoid is normally a piecewise linear approximation of the logistic sigmoid function. Depending on what properties of the original sigmoid you want to keep, you can use a different approximation.

I personally like to keep the function exact at zero, i.e. σ(0) = 0.5 (shift) and σ'(0) = 0.25 (slope), matching the logistic sigmoid. This could be coded as follows:

import numpy as np

def hard_sigmoid(x):
    # slope 0.25, shift 0.5: matches the logistic sigmoid's value and slope at 0
    return np.maximum(0, np.minimum(1, (x + 2) / 4))
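A quick numerical check (my addition, reusing the function above) that this choice matches the logistic sigmoid's value and slope at zero:

eps = 1e-6
print(hard_sigmoid(0.0))                                     # 0.5,  equals σ(0)
print((hard_sigmoid(eps) - hard_sigmoid(-eps)) / (2 * eps))  # 0.25, equals σ'(0)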

It is

  clip((x + 1)/2, 0, 1) 

in coding parlance:

  max(0, min(1, (x + 1)/2)) 