
I am trying to train a sparse autoencoder, but from the first epoch the loss shows 'nan'. The training dataset is normalized between 0 and 1.

Cost function:

prediction = decoder_res  # output from the decoder
actual = x

cost_MSE = tf.reduce_mean(tf.pow(actual - prediction, 2))

# weight-decay part
cost_regul = tf.reduce_sum(tf.square(W["W1"])) + tf.reduce_sum(tf.square(W["W2"]))

# sparsity cost
rho_j = tf.reduce_mean(encoder_res, axis=0)
print(rho_j.shape)

cost_sparse = tf.reduce_sum(sparse_param * tf.log(sparse_param / rho_j)
                            + (1 - sparse_param) * tf.log((1 - sparse_param) / (1 - rho_j)))

cost_fn = cost_MSE + (lamd / 2) * cost_regul + beta * cost_sparse
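
One way to find which op first produces a NaN, assuming TF 1.x graph mode, is tf.add_check_numerics_ops(), which attaches a numeric check to every floating-point tensor in the graph and raises an error naming the first offending op. A minimal sketch, not part of the original code:

# sketch: make the first NaN/Inf raise an InvalidArgumentError
# that names the responsible op (TF 1.x graph mode)
check_op = tf.add_check_numerics_ops()

# then, inside the training loop:
# loss, _, _ = sess.run([cost_fn, training, check_op], feed_dict={x: batch_x})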

I have tried different optimizers, but the loss is still nan.

Network parameters are given below:

# network parameters
sparse_param = 0.05
lamd = 0.05
beta = 1
num_inputs = 2048
num_h1 = 1000

Optimization:

optim=tf.train.GradientDescentOptimizer(learning_rate=l_r)
training=optim.minimize(cost_fn)
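
As an aside, a common safeguard against exploding updates is to clip gradients before applying them, although it cannot repair a loss that is already NaN in the forward pass. A minimal sketch reusing optim from above; the [-1, 1] clip range is an arbitrary assumption:

# sketch: clip each gradient elementwise before applying the update
grads_and_vars = optim.compute_gradients(cost_fn)
clipped = [(tf.clip_by_value(g, -1.0, 1.0), v)
           for g, v in grads_and_vars if g is not None]
training = optim.apply_gradients(clipped)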

Code for Training:

saver = tf.train.Saver()

# initialize all the variables
init = tf.global_variables_initializer()

# start the training session
loss_vector = []
sess = tf.Session()
sess.run(init)
for epoch in range(total_epochs):
    epoch_loss = 0
    i = 0
    while i < len(final_data):  # final_data is the training dataset
        start = i
        end = i + batch_size
        batch_x = np.array(final_data[start:end])
        # run one training step and fetch the loss in the same sess.run call
        loss, _ = sess.run([cost_fn, training], feed_dict={x: batch_x})
        epoch_loss = epoch_loss + loss
        i = i + batch_size
    loss_vector.append(epoch_loss)
    print('epoch', epoch + 1, 'is completed out of', total_epochs, 'Loss::', epoch_loss)

Looking forward to any kind of help. Thanks in advance :)

  • Have you tried using tf.Print to find the operation which first introduces NaNs into the training? – jan bernlöhr Jul 29 '18 at 15:48
  • No, I have not tried that yet. I am not very familiar with that command, so I am trying to find out which step introduces the NaN. – Hrid Biswas Jul 29 '18 at 18:41
  • As far as I have observed, I am getting NaNs in the encoded part, but I don't know how to fix it, since I am using the sigmoid function and tf.random_normal for weight and bias initialization. It would be a great help if someone could tell me why I am getting NaNs in the encoded image/layer. – Hrid Biswas Jul 30 '18 at 00:05
  • You can have a look at this manual on how to use tf.Print (https://towardsdatascience.com/using-tf-print-in-tensorflow-aa26e1cff11e). Wrap every operation in your decoding part in a print, then watch the output while running the training to see in which step the first NaNs appear. – jan bernlöhr Jul 30 '18 at 06:19
  • @HridBiswas did you eventually solve this? I am running into the same issue. – itzik Ben Shabat Sep 07 '18 at 14:54
  • Hey, sorry for the late answer. I got NaN values in the encoding part: the output of a few hidden units was always 1, so in the sparsity cost rho_j=tf.reduce_mean(encoder_res, axis=0), the rho_j vector had 1 in a few places, which gives log(1-1) = log(0) and turns the loss into NaN. I rewrote rho_j as rho_j=tf.reduce_sum(encoder_res, axis=0)/batch_size + epsilon, with epsilon some small value (see the sketch below). – Hrid Biswas Sep 09 '18 at 00:21
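
To illustrate the fix from the last comment: adding a small epsilon keeps log(rho_j) finite when a unit's mean activation is 0, and clipping rho_j away from both 0 and 1 also protects the log(1 - rho_j) term when a unit saturates at 1. A minimal sketch; the epsilon value and the use of tf.clip_by_value are assumptions beyond what the comment states:

epsilon = 1e-6  # small constant; the exact value is a free choice

# mean activation of each hidden unit over the batch
rho_j = tf.reduce_mean(encoder_res, axis=0)
# keep rho_j strictly inside (0, 1) so both log terms stay finite
rho_j = tf.clip_by_value(rho_j, epsilon, 1.0 - epsilon)

cost_sparse = tf.reduce_sum(
    sparse_param * tf.log(sparse_param / rho_j)
    + (1.0 - sparse_param) * tf.log((1.0 - sparse_param) / (1.0 - rho_j)))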
