
In this tutorial code from the TensorFlow website,

  1. Could anyone help explain what global_step means?

    The TensorFlow website says that the global step is used to count training steps, but I don't quite get what exactly it means.

  2. Also, what does the number 0 mean when setting up global_step?

    def training(loss, learning_rate):
        tf.summary.scalar('loss', loss)
        optimizer = tf.train.GradientDescentOptimizer(learning_rate)

        # Why 0 as the first parameter of the global_step tf.Variable?
        global_step = tf.Variable(0, name='global_step', trainable=False)

        train_op = optimizer.minimize(loss, global_step=global_step)

        return train_op

According to the TensorFlow docs, global_step is incremented by one after the variables have been updated. Does that mean that after one update global_step becomes 1?

GabrielChu

4 Answers


global_step refers to the number of batches seen by the graph. Every time a batch is provided, the weights are updated in the direction that minimizes the loss. global_step just keeps track of the number of batches seen so far. When it is passed in the minimize() argument list, the variable is increased by one. Have a look at optimizer.minimize().

You can get the global_step value using tf.train.global_step(). Also handy are the utility methods tf.train.get_global_step or tf.train.get_or_create_global_step.

0 is the initial value of the global step in this context.
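
As a minimal self-contained sketch (the toy variable and loss here are assumptions, not from the original answer), you can watch the counter advance by reading it back with tf.train.global_step() after each training step:

    import tensorflow as tf

    # Toy variable and loss so the snippet runs on its own.
    w = tf.Variable(5.0)
    loss = tf.square(w)

    global_step = tf.Variable(0, name='global_step', trainable=False)
    train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss, global_step=global_step)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for _ in range(3):
            sess.run(train_op)
            # Prints 1, 2, 3: minimize() increments the counter once per update.
            print(tf.train.global_step(sess, global_step))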

martianwars
  • Thanks! In the link you provided, `tf.train.global_step()`, the `global_step_tensor` is set to 10. Does that mean 10 batches have already been seen by the graph? – GabrielChu Dec 17 '16 at 04:50
  • @martianwars, I still don't get the point of having global_step. Isn't the looping over batches driven by the Python program itself, so the Python program can easily know how many batches have been done? Why bother having TensorFlow maintain such a counter? – victorx Feb 20 '17 at 10:37
  • Optimizers vary their constants based on the global step @xwk – martianwars Feb 20 '17 at 10:38
  • To answer xwk's question: if you stop training after 100 iterations and restore the model the next day to run another 100 iterations, your global step is now 200, but the second run has a local iteration number from 1 to 100, which is local to that run, as opposed to the global iteration step. So the global step records the total number of iterations, which may be used for changing the learning rate or other hyperparameters (see the sketch after these comments). – Wei Liu Feb 22 '17 at 19:10
  • To build on Wei Liu's answer, global steps are also useful for tracking the progress of distributed TensorFlow jobs. As workers concurrently see batches, there needs to be a mechanism to track the total number of batches seen. This is how [StopAtStepHook](https://www.tensorflow.org/api_docs/python/tf/train/StopAtStepHook) operates, for example. – Malo Marrec Dec 29 '17 at 12:06
  • To build upon Malo's answer, today people tend to use a learning rate scheduler to progressively modify the learning rate as the training iterations go up. Many schedulers depend on the current epoch number, but some might also depend on the current batch index. For instance, the cyclic scheduler modifies the learning rate after each batch. In this case, the global step lets the scheduler know which batch the model is currently seeing. – Jonathan Lee Apr 18 '19 at 13:36
  • Just a quick question: if I have a batch size of 64, a dataset size of 640 and the iterations take place 10 times, does that mean the global step's last counter value will be **64 x (640/64) x 10**? – Rishik Mani Sep 03 '19 at 11:03
  • Is there a way I can get the `global_step` inside the context of a `Dataset` pipeline, before the `global_step` has been created? See https://stackoverflow.com/questions/60882387/how-to-get-current-global-step-in-data-pipeline – Stefan Falk Mar 27 '20 at 13:29
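
To make Wei Liu's point concrete, here is a minimal sketch (the checkpoint path and toy loss are hypothetical, not from the thread): global_step is saved and restored with the checkpoint, so a second 100-iteration run resumes counting at 100 rather than starting again at 0.

    import tensorflow as tf

    # Hypothetical toy model; only the global_step handling matters here.
    w = tf.Variable(3.0)
    loss = tf.square(w)
    global_step = tf.Variable(0, name='global_step', trainable=False)
    train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss, global_step=global_step)
    saver = tf.train.Saver()

    with tf.Session() as sess:
        ckpt = tf.train.latest_checkpoint('/tmp/model')   # hypothetical checkpoint directory
        if ckpt:
            saver.restore(sess, ckpt)                     # global_step resumes where it stopped
        else:
            sess.run(tf.global_variables_initializer())
        for _ in range(100):                              # the run-local loop only counts 1..100
            sess.run(train_op)
        saver.save(sess, '/tmp/model/ckpt', global_step=global_step)
        print(sess.run(global_step))                      # 100 after the first run, 200 after the second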

The global_step Variable holds the total number of steps during training across the tasks (each step index will occur only on a single task).

A timeline created by global_step helps us understand where we are in the grand scheme, from each of the tasks separately. For instance, the loss and accuracy could be plotted against global_step on TensorBoard.
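
A minimal sketch of that idea (the log directory and toy loss below are assumptions, not from the original answer): each summary is written with the current global_step, which then becomes the x-axis value on TensorBoard.

    import tensorflow as tf

    # Toy model so the snippet runs on its own.
    w = tf.Variable(5.0)
    loss = tf.square(w)
    global_step = tf.train.get_or_create_global_step()
    train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss, global_step=global_step)

    tf.summary.scalar('loss', loss)
    merged = tf.summary.merge_all()
    writer = tf.summary.FileWriter('/tmp/logs')           # hypothetical log directory

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for _ in range(100):
            _, summary, step = sess.run([train_op, merged, global_step])
            writer.add_summary(summary, global_step=step)  # x-axis on TensorBoard = global_step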

envy_intelligence

Here is a vivid example:

Code:

    train_op = tf.train.GradientDescentOptimizer(learning_rate=LEARNING_RATE).minimize(
        loss_tensor, global_step=tf.train.create_global_step())

    with tf.Session() as sess:
        ...
        tf.logging.log_every_n(tf.logging.INFO, "np.mean(loss_evl)= %f at step %d", 100,
                               np.mean(loss_evl), sess.run(tf.train.get_global_step()))

Corresponding output:

    INFO:tensorflow:np.mean(loss_evl)= 1.396970 at step 1
    INFO:tensorflow:np.mean(loss_evl)= 1.221397 at step 101
    INFO:tensorflow:np.mean(loss_evl)= 1.061688 at step 201
yichudu

There are networks, e.g. GANs, that may need two (or more) different step counters. Training a GAN with the WGAN specification requires more steps on the discriminator (or critic) D than on the generator G. In that case, it is useful to declare separate global_step variables.

Example (G_loss and D_loss are the losses of the generator and the discriminator; G_params and D_params are their variable lists):

    # Two separate counters, so generator and discriminator updates are counted independently.
    G_global_step = tf.Variable(0, name='G_global_step', trainable=False)
    D_global_step = tf.Variable(0, name='D_global_step', trainable=False)

    minimizer = tf.train.RMSPropOptimizer(learning_rate=0.00005)

    # Each minimize() call increments only its own counter.
    G_solver = minimizer.minimize(G_loss, var_list=G_params, global_step=G_global_step)
    D_solver = minimizer.minimize(D_loss, var_list=D_params, global_step=D_global_step)
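
To see how the two counters diverge under the WGAN schedule, here is a self-contained toy sketch (the stand-in losses and variables are hypothetical, not from the answer): the critic is stepped n_critic times for every generator step, so D_global_step runs ahead of G_global_step.

    import tensorflow as tf

    # Hypothetical stand-ins for the generator/discriminator losses, so the loop runs on its own.
    theta_G = tf.Variable(1.0)
    theta_D = tf.Variable(2.0)
    G_loss = tf.square(theta_G)
    D_loss = tf.square(theta_D)

    G_global_step = tf.Variable(0, name='G_global_step', trainable=False)
    D_global_step = tf.Variable(0, name='D_global_step', trainable=False)
    minimizer = tf.train.RMSPropOptimizer(learning_rate=0.00005)
    G_solver = minimizer.minimize(G_loss, var_list=[theta_G], global_step=G_global_step)
    D_solver = minimizer.minimize(D_loss, var_list=[theta_D], global_step=D_global_step)

    n_critic = 5                            # WGAN: several critic updates per generator update
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for _ in range(1000):
            for _ in range(n_critic):
                sess.run(D_solver)          # D_global_step += 1 on every critic update
            sess.run(G_solver)              # G_global_step += 1 once per generator update
        print(sess.run([G_global_step, D_global_step]))   # [1000, 5000]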
Luca Di Liello