16

I am currently trying to reproduce the results of the following article.
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
I am using Keras with the Theano backend. In the article, he talks about controlling the temperature of the final softmax layer to give different outputs.

Temperature. We can also play with the temperature of the Softmax during sampling. Decreasing the temperature from 1 to some lower number (e.g. 0.5) makes the RNN more confident, but also more conservative in its samples. Conversely, higher temperatures will give more diversity but at cost of more mistakes (e.g. spelling mistakes, etc). In particular, setting temperature very near zero will give the most likely thing that Paul Graham might say:
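
As I understand it, the temperature-scaled softmax divides the logits z by T before normalizing, softmax_T(z)_i = exp(z_i / T) / Σ_j exp(z_j / T), so a temperature near zero approaches argmax while a large temperature approaches a uniform distribution.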

My model is as follows.

from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense
from keras.optimizers import Adam

# batch_size is defined earlier; each step sees one one-hot encoded
# character drawn from 256 possible byte values
model = Sequential()
model.add(LSTM(128, batch_input_shape = (batch_size, 1, 256), stateful = True, return_sequences = True))
model.add(LSTM(128, stateful = True))
model.add(Dropout(0.1))
model.add(Dense(256, activation = 'softmax'))

model.compile(optimizer = Adam(),
              loss = 'categorical_crossentropy',
              metrics = ['accuracy'])

The only way I can think of to adjust the temperature of the final Dense layer would be to get the weight matrix and multiply it by the temperature. Does anyone know of a better way to do it? Also, if anyone sees anything wrong with how I set up the model, let me know, since I am new to RNNs.

OmG
chasep255

3 Answers

13

Well, it looks like the temperature is something you apply to the output of the softmax layer. I found this example:

https://github.com/fchollet/keras/blob/master/examples/lstm_text_generation.py

He applies the following function to sample from the softmax output.

import numpy as np

def sample(a, temperature=1.0):
    # helper function to sample an index from a probability array:
    # log reverses the softmax, dividing by the temperature rescales the logit-like
    # values, and exp/sum re-normalizes before drawing from the multinomial
    a = np.log(a) / temperature
    a = np.exp(a) / np.sum(np.exp(a))
    return np.argmax(np.random.multinomial(1, a, 1))
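
For context, a rough usage sketch under the question's setup (a stateful model fed one one-hot encoded character of 256 classes per step); `batch_size` is the value used to build the model and `seed_index` is a hypothetical starting character index, neither is defined in this answer:

import numpy as np

# hypothetical generation loop: feed the sampled character back in as the next input
x = np.zeros((batch_size, 1, 256))
x[0, 0, seed_index] = 1.0

for _ in range(200):
    probs = model.predict(x, batch_size=batch_size)[0]  # softmax output of the first sequence
    next_index = sample(probs, temperature=0.5)         # lower temperature -> more conservative
    x = np.zeros((batch_size, 1, 256))
    x[0, 0, next_index] = 1.0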
chasep255
  • Is the last any different from `np.random.choice(len(a), p=a)`? – danijar Jun 09 '16 at 22:01
  • This is not the standard softmax with temperature as defined here: https://en.wikipedia.org/wiki/Softmax_function (in the reinforcement learning section). Why is there a log being applied before dividing by temperature? – A.D Feb 16 '17 at 19:41
  • @A.D the argument `a` is actually the softmax output of the network. So we use log to reverse the softmax operation and get logit-like values. These are the kind of values on which the temperature can be applied. This is consistent with [wikipedia](https://en.wikipedia.org/wiki/Softmax_function#Smooth_arg_max). – John Feb 09 '21 at 22:12
7

The answer from @chasep255 works OK, but you will get warnings because of log(0). You can simplify the operation, e^(log(a)/T) = a^(1/T), and get rid of the log:

import numpy as np

def sample(a, temperature=1.0):
  # raise each probability to the power 1/T, then re-normalize
  a = np.array(a)**(1 / temperature)
  p_sum = a.sum()
  sample_temp = a / p_sum
  return np.argmax(np.random.multinomial(1, sample_temp, 1))
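
A quick numeric check (toy probabilities, not from the original answer) shows the effect of the exponent: lower temperatures sharpen the distribution, higher temperatures flatten it.

import numpy as np

def with_temperature(p, t):
    # hypothetical helper just for illustration
    p = np.asarray(p, dtype=float) ** (1 / t)
    return p / p.sum()

p = [0.5, 0.3, 0.2]                 # toy softmax output
print(with_temperature(p, 0.5))     # sharper: approx. [0.66, 0.24, 0.11]
print(with_temperature(p, 2.0))     # flatter: approx. [0.42, 0.32, 0.26]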

Hope it helps!

Julian
2

You can build a custom layer in Keras to apply the temperature.

The code in Keras will be like this, and you can use this layer like any other Keras layer (e.g. Dense):

class Temperature(keras.layers.Layer):
  def __init__(self):
    super(Temperature, self).__init__()
    self.temperature = torch.nn.Parameter(torch.ones(1))
    
  def call(self, final_output):
    return final_output/ self.temperature
Rial ALi
  • You should use ``self.add_weight`` or ``tf.Variable``. In this example you are mixing Keras layers with torch parameters – Damian Grzanka Feb 14 '22 at 12:25
  • Yes you are right. – Rial ALi Oct 13 '22 at 13:49
  • It should be: `class Temperature(keras.layers.Layer): def __init__(self): super(Temperature, self).__init__() self.temperature = tf.Variable(initial_value=[1.], trainable=True) # self.temperature = tf.ones(1) def call(self, final_output): return final_output / self.temperature` – Rial ALi Oct 13 '22 at 13:50
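
Putting the correction from the comments together, a minimal runnable sketch (assuming TensorFlow 2.x and tf.keras, with the layer placed before a separate softmax activation) might look like:

import tensorflow as tf
from tensorflow import keras

class Temperature(keras.layers.Layer):
    def __init__(self, **kwargs):
        super(Temperature, self).__init__(**kwargs)
        # single trainable scalar, created as a TensorFlow variable rather than a torch parameter
        self.temperature = tf.Variable(initial_value=[1.0], trainable=True)

    def call(self, final_output):
        # divide the incoming logits by the learned temperature
        return final_output / self.temperature

# usage sketch: scale the logits, then apply the softmax
model = keras.Sequential([
    keras.layers.Dense(256),            # raw logits
    Temperature(),
    keras.layers.Activation('softmax'),
])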