
I am working on deep nets using Keras. There is an activation called "hard sigmoid". What is its mathematical definition?

I know what the sigmoid is. Someone asked a similar question on Quora: https://www.quora.com/What-is-hard-sigmoid-in-artificial-neural-networks-Why-is-it-faster-than-standard-sigmoid-Are-there-any-disadvantages-over-the-standard-sigmoid

But I could not find the precise mathematical definition anywhere.


4 Answers


Since Keras supports both TensorFlow and Theano, the exact implementation can differ per backend; I'll cover Theano only. For the Theano backend, Keras uses T.nnet.hard_sigmoid, which is in turn a linear approximation of the standard sigmoid:

# excerpt from Theano's T.nnet.hard_sigmoid: scale, shift, then clip to [0, 1]
slope = tensor.constant(0.2, dtype=out_dtype)
shift = tensor.constant(0.5, dtype=out_dtype)
x = (x * slope) + shift
x = tensor.clip(x, 0, 1)

i.e. it is: max(0, min(1, x*0.2 + 0.5))
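For a quick comparison, here is a small NumPy sketch (my own illustration, not the Keras source) of that formula next to the exact logistic sigmoid; the piecewise-linear version needs no exp() call, which is where the speedup comes from:

import numpy as np

def hard_sigmoid(x):
    # piecewise-linear approximation: clip(0.2 * x + 0.5, 0, 1)
    return np.clip(0.2 * x + 0.5, 0.0, 1.0)

def sigmoid(x):
    # exact logistic sigmoid, for comparison
    return 1.0 / (1.0 + np.exp(-x))

xs = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
print(hard_sigmoid(xs))   # [0.   0.3  0.5  0.7  1. ]
print(sigmoid(xs))        # roughly [0.047 0.269 0.5 0.731 0.953]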

    Keras' TensorFlow backend has the same math, though implemented by hand. https://github.com/fchollet/keras/blob/master/keras/backend/tensorflow_backend.py#L1487 – Dwight Crow Nov 05 '16 at 05:53
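For illustration, the same math could be expressed directly with TensorFlow ops like this (a sketch of the idea, not the verbatim Keras backend source linked above):

import tensorflow as tf

def hard_sigmoid(x):
    # same linear approximation: scale by 0.2, shift by 0.5, clip to [0, 1]
    return tf.clip_by_value(0.2 * x + 0.5, 0.0, 1.0)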

For reference, the hard sigmoid function may be defined differently in different places. In Courbariaux et al. 2016 [1] it's defined as:

σ is the “hard sigmoid” function: σ(x) = clip((x + 1)/2, 0, 1) = max(0, min(1, (x + 1)/2))

The intent is to provide a probability value (hence the constraint to lie between 0 and 1) for use in stochastic binarization of neural-network parameters (e.g. weights, activations, gradients). You use the probability p = σ(x) returned by the hard sigmoid to set the parameter x to +1 with probability p, or to -1 with probability 1 - p.
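As an illustration of that scheme, here is a short NumPy sketch (my own; the function names are made up and this is not code from the paper) of the stochastic binarization step:

import numpy as np

def hard_sigmoid(x):
    # Courbariaux et al. definition: clip((x + 1) / 2, 0, 1)
    return np.clip((x + 1.0) / 2.0, 0.0, 1.0)

def stochastic_binarize(x, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    x = np.asarray(x, dtype=float)
    p = hard_sigmoid(x)                       # probability of mapping to +1
    return np.where(rng.random(x.shape) < p, 1.0, -1.0)

print(stochastic_binarize([-2.0, 0.0, 2.0]))  # e.g. [-1.  1.  1.], middle entry is sampled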

[1] Matthieu Courbariaux, Itay Hubara, Daniel Soudry, Ran El-Yaniv, Yoshua Bengio, "Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1", arXiv:1602.02830, 2016 (v1 submitted 9 Feb 2016, v3 revised 17 Mar 2016). https://arxiv.org/abs/1602.02830


The hard sigmoid is normally a piecewise linear approximation of the logistic sigmoid function. Depending on what properties of the original sigmoid you want to keep, you can use a different approximation.

I personally like to keep the function exact at zero, i.e. σ(0) = 0.5 (shift) and σ'(0) = 0.25 (slope), matching the logistic sigmoid. This could be coded as follows:

import numpy as np

def hard_sigmoid(x):
    # slope 0.25, shift 0.5: matches the logistic sigmoid's value and slope at 0
    return np.maximum(0, np.minimum(1, (x + 2) / 4))
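A quick numerical check (my addition, reusing the function above) that this choice matches the logistic sigmoid's value and slope at zero:

eps = 1e-6
print(hard_sigmoid(0.0))                                     # 0.5,  equals σ(0)
print((hard_sigmoid(eps) - hard_sigmoid(-eps)) / (2 * eps))  # 0.25, equals σ'(0)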

It is

  clip((x + 1)/2, 0, 1) 

in coding parlance:

  max(0, min(1, (x + 1)/2)) 