I found that the derivatives of common activation functions lie in the range [0, 1] (see https://ml-cheatsheet.readthedocs.io/en/latest/activation_functions.html).
This is the cause of vanishing gradients in RNNs.
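To make that claim concrete, here is the backprop-through-time product I have in mind (a rough sketch for a vanilla RNN with hidden state $h_t = \sigma(W h_{t-1} + U x_t)$; the notation is mine, not taken from the linked page):

$$
\frac{\partial L}{\partial h_1}
= \frac{\partial L}{\partial h_T}\prod_{t=T}^{2}\frac{\partial h_t}{\partial h_{t-1}},
\qquad
\frac{\partial h_t}{\partial h_{t-1}} = \operatorname{diag}\!\big(\sigma'(z_t)\big)\,W,
\qquad z_t = W h_{t-1} + U x_t .
$$

Since $\sigma'(z_t) \le 1$ for sigmoid (at most 0.25) and tanh (at most 1), each Jacobian factor tends to shrink the gradient unless $\|W\|$ compensates, so the product decays over long sequences.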
Why were activation functions designed so that their derivatives stay in [0, 1] when they were first introduced to deep learning? And what would happen to an MLP if we used a variant of ReLU such as f(x) = max(0, 2x), whose derivative is either 0 or 2?
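To explore the second question numerically, here is a minimal sketch (my own experiment, not from the linked page) that backpropagates a gradient through a deep, randomly initialized MLP using max(0, x) versus max(0, 2x); the depth, width, and He-style initialization are choices I made purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
depth, width = 50, 64
# He-style initialization (assumed): weight variance 2 / fan_in.
Ws = [rng.normal(0.0, np.sqrt(2.0 / width), size=(width, width)) for _ in range(depth)]
x0 = rng.normal(size=width)                  # a fixed random input

def backward_norms(scale):
    """Forward pass with f(z) = max(0, scale * z), then backprop a unit gradient."""
    h = x0.copy()
    pre_activations = []
    for W in Ws:
        z = W @ h
        pre_activations.append(z)
        h = np.maximum(0.0, scale * z)       # f(z) = max(0, scale * z)
    g = np.ones(width)                       # stand-in for dL/dh at the output
    norms = []
    for W, z in zip(reversed(Ws), reversed(pre_activations)):
        g = W.T @ (g * scale * (z > 0))      # df/dz = scale where z > 0, else 0
        norms.append(np.linalg.norm(g))
    return norms

for scale in (1.0, 2.0):
    norms = backward_norms(scale)
    print(f"scale={scale}: |grad| after 10 layers = {norms[9]:.3g}, "
          f"after {depth} layers = {norms[-1]:.3g}")
```

In this toy setup the standard ReLU keeps the gradient norm roughly constant across layers, while max(0, 2x) grows it by roughly a factor of 2 per layer, so my expectation is exploding rather than vanishing gradients; I would appreciate confirmation of whether that reasoning is right.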