
In this tutorial code from the TensorFlow website,

  1. Could anyone help explain what global_step means?

    The TensorFlow website says that the global step is used to count training steps, but I don't quite get what exactly it means.

  2. Also, what does the number 0 mean when setting up global_step?

    def training(loss, learning_rate):
        tf.summary.scalar('loss', loss)
        optimizer = tf.train.GradientDescentOptimizer(learning_rate)

        # Why 0 as the first parameter of the global_step tf.Variable?
        global_step = tf.Variable(0, name='global_step', trainable=False)

        train_op = optimizer.minimize(loss, global_step=global_step)

        return train_op

According to the TensorFlow docs, global_step is incremented by one after the variables have been updated. Does that mean that after one update global_step becomes 1?

GabrielChu

4 Answers


global_step refers to the number of batches seen by the graph. Every time a batch is provided, the weights are updated in the direction that minimizes the loss. global_step just keeps track of the number of batches seen so far. When it is passed in the minimize() argument list, the variable is increased by one. Have a look at optimizer.minimize().

You can get the global_step value using tf.train.global_step(). Also handy are the utility methods tf.train.get_global_step or tf.train.get_or_create_global_step.

0 is the initial value of the global step in this context.
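
As a minimal self-contained sketch (the toy variable and loss here are assumptions, not from the original answer), you can watch the counter advance by reading it back with tf.train.global_step() after each training step:

    import tensorflow as tf

    # Toy variable and loss so the snippet runs on its own.
    w = tf.Variable(5.0)
    loss = tf.square(w)

    global_step = tf.Variable(0, name='global_step', trainable=False)
    train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss, global_step=global_step)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for _ in range(3):
            sess.run(train_op)
            # Prints 1, 2, 3: minimize() increments the counter once per update.
            print(tf.train.global_step(sess, global_step))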

martianwars
  • Thanks! In the link you provided, `tf.train.global_step()`, the `global_step_tensor` is set to 10. Does that mean 10 batches have already been seen by the graph? – GabrielChu Dec 17 '16 at 04:50
  • @martianwars, I still don't get the point of having global_step. Isn't the looping over batches driven by the Python program itself, so the Python program can easily know how many batches have been done? Why bother having TensorFlow maintain such a counter? – victorx Feb 20 '17 at 10:37
  • Optimizers vary their constants based on the global step @xwk – martianwars Feb 20 '17 at 10:38
  • To answer xwk's question: if you stop training after 100 iterations and restore the model the next day to run another 100 iterations, your global step is now 200, but the second run has a local iteration number from 1 to 100, which is local to that run, as opposed to the global iteration step. So the global step records the total number of iterations, which may be used for changing the learning rate or other hyperparameters (see the sketch after these comments). – Wei Liu Feb 22 '17 at 19:10
  • To build on Wei Liu's answer, global steps are also useful for tracking the progress of distributed TensorFlow jobs. As workers concurrently see batches, there needs to be a mechanism to track the total number of batches seen. This is how [StopAtStepHook](https://www.tensorflow.org/api_docs/python/tf/train/StopAtStepHook) operates, for example. – Malo Marrec Dec 29 '17 at 12:06
  • To build upon Malo's answer, today people tend to use a learning rate scheduler to progressively modify the learning rate as the training iterations go up. Many schedulers depend on the current epoch number, but some might also depend on the current batch index. For instance, the cyclic scheduler modifies the learning rate after each batch. In this case, the global step lets the scheduler know which batch the model is currently seeing. – Jonathan Lee Apr 18 '19 at 13:36
  • Just a quick question: if I have a batch size of 64, a dataset size of 640 and the iterations take place 10 times, does that mean the global step's last counter value will be **64 x (640/64) x 10**? – Rishik Mani Sep 03 '19 at 11:03
  • Is there a way I can get the `global_step` inside the context of a `Dataset` pipeline, before the `global_step` has been created? See https://stackoverflow.com/questions/60882387/how-to-get-current-global-step-in-data-pipeline – Stefan Falk Mar 27 '20 at 13:29
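
To make Wei Liu's point concrete, here is a minimal sketch (the checkpoint path and toy loss are hypothetical, not from the thread): global_step is saved and restored with the checkpoint, so a second 100-iteration run resumes counting at 100 rather than starting again at 0.

    import tensorflow as tf

    # Hypothetical toy model; only the global_step handling matters here.
    w = tf.Variable(3.0)
    loss = tf.square(w)
    global_step = tf.Variable(0, name='global_step', trainable=False)
    train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss, global_step=global_step)
    saver = tf.train.Saver()

    with tf.Session() as sess:
        ckpt = tf.train.latest_checkpoint('/tmp/model')   # hypothetical checkpoint directory
        if ckpt:
            saver.restore(sess, ckpt)                     # global_step resumes where it stopped
        else:
            sess.run(tf.global_variables_initializer())
        for _ in range(100):                              # the run-local loop only counts 1..100
            sess.run(train_op)
        saver.save(sess, '/tmp/model/ckpt', global_step=global_step)
        print(sess.run(global_step))                      # 100 after the first run, 200 after the second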

The global_step Variable holds the total number of steps during training across the tasks (each step index will occur only on a single task).

A timeline created by global_step helps us understand where we are in the grand scheme, from each of the tasks separately. For instance, the loss and accuracy could be plotted against global_step on TensorBoard.
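
A minimal sketch of that idea (the log directory and toy loss below are assumptions, not from the original answer): each summary is written with the current global_step, which then becomes the x-axis value on TensorBoard.

    import tensorflow as tf

    # Toy model so the snippet runs on its own.
    w = tf.Variable(5.0)
    loss = tf.square(w)
    global_step = tf.train.get_or_create_global_step()
    train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss, global_step=global_step)

    tf.summary.scalar('loss', loss)
    merged = tf.summary.merge_all()
    writer = tf.summary.FileWriter('/tmp/logs')           # hypothetical log directory

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for _ in range(100):
            _, summary, step = sess.run([train_op, merged, global_step])
            writer.add_summary(summary, global_step=step)  # x-axis on TensorBoard = global_step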

envy_intelligence

Here is a vivid example:

Code:

    train_op = tf.train.GradientDescentOptimizer(learning_rate=LEARNING_RATE).minimize(
        loss_tensor, global_step=tf.train.create_global_step())

    with tf.Session() as sess:
        ...
        tf.logging.log_every_n(tf.logging.INFO, "np.mean(loss_evl)= %f at step %d", 100,
                               np.mean(loss_evl), sess.run(tf.train.get_global_step()))

Corresponding output:

    INFO:tensorflow:np.mean(loss_evl)= 1.396970 at step 1
    INFO:tensorflow:np.mean(loss_evl)= 1.221397 at step 101
    INFO:tensorflow:np.mean(loss_evl)= 1.061688 at step 201
yichudu

There are networks, e.g. GANs, that may need two (or more) different step counters. Training a GAN with the WGAN specification requires more steps on the discriminator (or critic) D than on the generator G. In that case, it is useful to declare separate global_step variables.

Example (G_loss and D_loss are the losses of the generator and the discriminator; G_params and D_params are their variable lists):

    # Two separate counters, so generator and discriminator updates are counted independently.
    G_global_step = tf.Variable(0, name='G_global_step', trainable=False)
    D_global_step = tf.Variable(0, name='D_global_step', trainable=False)

    minimizer = tf.train.RMSPropOptimizer(learning_rate=0.00005)

    # Each minimize() call increments only its own counter.
    G_solver = minimizer.minimize(G_loss, var_list=G_params, global_step=G_global_step)
    D_solver = minimizer.minimize(D_loss, var_list=D_params, global_step=D_global_step)
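
To see how the two counters diverge under the WGAN schedule, here is a self-contained toy sketch (the stand-in losses and variables are hypothetical, not from the answer): the critic is stepped n_critic times for every generator step, so D_global_step runs ahead of G_global_step.

    import tensorflow as tf

    # Hypothetical stand-ins for the generator/discriminator losses, so the loop runs on its own.
    theta_G = tf.Variable(1.0)
    theta_D = tf.Variable(2.0)
    G_loss = tf.square(theta_G)
    D_loss = tf.square(theta_D)

    G_global_step = tf.Variable(0, name='G_global_step', trainable=False)
    D_global_step = tf.Variable(0, name='D_global_step', trainable=False)
    minimizer = tf.train.RMSPropOptimizer(learning_rate=0.00005)
    G_solver = minimizer.minimize(G_loss, var_list=[theta_G], global_step=G_global_step)
    D_solver = minimizer.minimize(D_loss, var_list=[theta_D], global_step=D_global_step)

    n_critic = 5                            # WGAN: several critic updates per generator update
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for _ in range(1000):
            for _ in range(n_critic):
                sess.run(D_solver)          # D_global_step += 1 on every critic update
            sess.run(G_solver)              # G_global_step += 1 once per generator update
        print(sess.run([G_global_step, D_global_step]))   # [1000, 5000]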
Luca Di Liello