When I train with TensorFlow, the loss suddenly turns into NaN, like this:
Epoch: 00001 || cost= 0.675003929
Epoch: 00002 || cost= 0.237375346
Epoch: 00003 || cost= 0.204962473
Epoch: 00004 || cost= 0.191322120
Epoch: 00005 || cost= 0.181427178
Epoch: 00006 || cost= 0.172107664
Epoch: 00007 || cost= 0.171604740
Epoch: 00008 || cost= 0.160334495
Epoch: 00009 || cost= 0.151639721
Epoch: 00010 || cost= 0.149983061
Epoch: 00011 || cost= 0.145890004
Epoch: 00012 || cost= 0.141182279
Epoch: 00013 || cost= 0.140914166
Epoch: 00014 || cost= 0.136189088
Epoch: 00015 || cost= 0.133215346
Epoch: 00016 || cost= 0.130046664
Epoch: 00017 || cost= 0.128267926
Epoch: 00018 || cost= 0.125328618
Epoch: 00019 || cost= 0.125053261
Epoch: 00020 || cost= nan
Epoch: 00021 || cost= nan
Epoch: 00022 || cost= nan
Epoch: 00023 || cost= nan
Epoch: 00024 || cost= nan
Epoch: 00025 || cost= nan
Epoch: 00026 || cost= nan
Epoch: 00027 || cost= nan
The main training code is:
for epoch in range(1000):
    Mcost = 0
    temp = []
    for i in range(total_batch):
        batch_X = X[i*batch_size:(i+1)*batch_size]
        batch_Y = Y[i*batch_size:(i+1)*batch_size]
        solver, c, pY = sess.run([train, cost, y_conv], feed_dict={x: batch_X, y_: batch_Y, keep_prob:0.8})
        Mcost = Mcost + c
    print("Epoch: ", '%05d'%(epoch+1), "|| cost=",'{:.9f}'.format(Mcost/total_batch))
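Because the printed value is an average over all batches, a single NaN batch cost makes the whole epoch print as nan. A minimal sketch of how the first bad batch could be located is shown here. The helper name first_non_finite and the sample list batch_costs are hypothetical; in the real loop the list would hold the values of c collected per batch.

```python
import math

def first_non_finite(costs):
    """Return the index of the first NaN or inf value in costs, or None if all are finite."""
    for i, c in enumerate(costs):
        if not math.isfinite(c):
            return i
    return None

# Hypothetical per-batch costs standing in for the values of `c`
# appended inside the inner training loop.
batch_costs = [0.125, 0.131, float("nan"), float("nan")]

bad_batch = first_non_finite(batch_costs)
if bad_batch is not None:
    print("first non-finite cost at batch", bad_batch)
```

Knowing the batch index makes it possible to inspect batch_X, batch_Y and the prediction for that batch, rather than only seeing the epoch average.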
Since the cost is fine for the first 19 epochs, I believe the network and the input are OK. The network has 4 convolutional layers with ReLU activations, and the last layer is fully connected with no activation function.
I also know that 0/0 or log(0) will result in NaN. However, my loss function is:
c1 = y_conv - y_
c2 = tf.square(c1)
c3 = tf.reduce_sum(c2,1)
c4 = tf.sqrt(c3)
cost = tf.reduce_mean(c4)
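To make the loss concrete outside TensorFlow, here is a minimal pure-Python sketch of the same computation together with its analytic gradient. It is an illustration only, not the actual graph; the function names l2_loss and l2_loss_grad are hypothetical. The gradient with respect to y_conv is (y_conv - y_) / (N * c4), so a row where the prediction equals the target exactly gives 0/0. Whether that happens in this training run is an assumption that would need to be checked.

```python
import math

def l2_loss(pred, target):
    """Mean over rows of sqrt(sum((pred - target)^2)); mirrors c1..c4 and cost."""
    norms = [
        math.sqrt(sum((p - t) ** 2 for p, t in zip(pred_row, target_row)))
        for pred_row, target_row in zip(pred, target)
    ]
    return sum(norms) / len(norms), norms

def l2_loss_grad(pred, target):
    """Analytic gradient of the loss w.r.t. pred: (pred - target) / (N * row_norm)."""
    _, norms = l2_loss(pred, target)
    n = len(norms)
    grad = []
    for pred_row, target_row, norm in zip(pred, target, norms):
        if norm == 0.0:
            # 0/0 case: Python would raise ZeroDivisionError here, while
            # IEEE 754 float arithmetic (as used on a GPU) yields NaN.
            grad.append([float("nan")] * len(pred_row))
        else:
            grad.append([(p - t) / (n * norm) for p, t in zip(pred_row, target_row)])
    return grad

# Hypothetical data: the second row is predicted exactly.
pred = [[3.0, 4.0], [1.0, 2.0]]
target = [[0.0, 0.0], [1.0, 2.0]]

loss, norms = l2_loss(pred, target)
grad = l2_loss_grad(pred, target)
print("loss:", loss)
print("grad:", grad)
```

In this sketch the loss itself stays finite while the gradient of the exactly predicted row is NaN, so the cost would only show nan after the next weight update.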
I run TensorFlow on a GTX 1080 GPU.
Any suggestions are appreciated.