
I am trying to build a softmax model using TensorFlow on my image data, inspired by the MNIST example. When I try to train the model, I see no reduction in loss. I also see that the parameter (W, b) values do not change after the first iteration. Do I need to explicitly update my parameter values after each iteration?

Code:

######### Model graph #########
with tf.device('/cpu:0'):
    x = tf.placeholder(tf.float32, shape=[None, IMAGE_HEIGHT, IMAGE_WIDTH, 3])
    y_ = tf.placeholder(tf.float32, shape=[None, 35])

    ######### Weights and bias for the softmax layer #########
    initialW = tf.truncated_normal([IMAGE_HEIGHT*IMAGE_WIDTH*3, 35], stddev=0.1)
    W = tf.Variable(initialW, trainable=True)
    b = tf.Variable(tf.zeros([35]), trainable=True)

    # Flatten the images and apply the softmax layer
    x_flat = tf.reshape(x, [-1, IMAGE_HEIGHT*IMAGE_WIDTH*3])
    y = tf.nn.softmax(tf.matmul(x_flat, W) + b)

    cross_entropy = tf.reduce_mean(
        -tf.reduce_sum(y_*tf.log(y+1e-10), reduction_indices=[1]))
    cross_entropy = tf.Print(cross_entropy, [cross_entropy], "cost")  # print the cost to the console

    #train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
    train_step = tf.train.AdamOptimizer(0.1).minimize(cross_entropy)

    ######### Model evaluation #########
    is_predicted_correctly = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(is_predicted_correctly, tf.float32))
    ops = tf.initialize_all_variables()


### Running graph ###
### Initializing variables ###
config = tf.ConfigProto()
config.log_device_placement = True
sess = tf.Session(config=config)
sess.run(ops)
### Training ###
for it in range(nIterations):
  labels, images = d.getNextBatch(nBatchSize)
  while(images is not None):
    sess.run(train_step, feed_dict = {x: images, y_ : labels})
    labels, images = d.getNextBatch(nBatchSize)

The cost always remains about the same:

I tensorflow/core/kernels/logging_ops.cc:79] cost[22.211819]
I tensorflow/core/kernels/logging_ops.cc:79] cost[22.095526]
I tensorflow/core/kernels/logging_ops.cc:79] cost[22.676987]
I tensorflow/core/kernels/logging_ops.cc:79] cost[22.563032]

Update: code for fetching batches

def getNextBatch(self, cnt):
    if self.dataSet is None:
        return None, None

    if self.curr >= len(self.dataSet):
        return None, None

    end = self.curr + cnt
    if end > len(self.dataSet):
        end = len(self.dataSet)

    batchData = self.dataSet[self.curr:end]
    labelRaw = []
    images = []
    for dataPoint in batchData:
        try:
            image = self.getImageFromPath(dataPoint['image'])
            if not self.isSizeCorrect(image):
                print("Wrong image shape: " + str(image.shape))
                raise ValueError("Wrong image shape")

            labelRaw.append(dataPoint['label'])
            images.append(image)
        except (OSError, ValueError):
            pass  # skip unreadable or wrongly sized images

    # Label-encode, then one-hot-encode the batch labels
    labels = self.onEnc.transform((self.lEnc.transform(labelRaw)).reshape(-1, 1))
    self.curr = end

    return labels, np.array(images)

def getImageFromPath(self, imageFile):
    img = misc.imread(imageFile)
    resizedImg = misc.imresize(img, (IMAGE_HEIGHT, IMAGE_WIDTH))
    return resizedImg
  • I want to add that I have tried everything suggested in the answer to this [question](http://stackoverflow.com/questions/36127436/tensorflow-predicts-always-the-same-result?noredirect=1&lq=1); the problem persists even after that. – user5911374 Aug 05 '16 at 23:08
  • I do not see where the variable `d` comes from. Are you sure the data is being fetched correctly? – Prophecies Aug 06 '16 at 15:58
  • Thanks for the reply. I am feeding the data as numpy arrays. I verified that I am getting two numpy matrices, one of size (batchsize, labels) and the other of size (batchsize, imagesize). – user5911374 Aug 08 '16 at 17:20

1 Answer


I was finally able to solve my problem. The issue was that the products of my features and weights were large (in the tens of thousands), blowing up the exponents (imagine e^30000) in the softmax.

Because of this, my gradients were always zero, hence there were no updates to the parameters.
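To make the failure mode concrete, here is a minimal NumPy sketch of the saturation; the logit magnitudes below are made up for illustration and are not from my actual run:

import numpy as np

def softmax(z):
    # Numerically stabilized softmax (subtracts the max), like tf.nn.softmax
    e = np.exp(z - z.max())
    return e / e.sum()

# With raw 0-255 pixel values and stddev=0.1 weights, the logits come
# out in the tens of thousands; the values below are illustrative.
z = np.array([30000.0, 29900.0, 29800.0], dtype=np.float32)
y = softmax(z)
print(y)  # ~[1.0, 3.8e-44, 0.0] -- saturated to (almost exactly) one-hot

# The softmax Jacobian has entries y_i * (delta_ij - y_j); once y is
# pinned at 0 or 1, they underflow to zero and no gradient flows back.
jacobian = np.diag(y) - np.outer(y, y)
print(np.abs(jacobian).max())  # ~0.0 -- effectively no learning signal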

I did the following to resolve the problem:

- Normalized my image data (pixel values from 0–255 down to 0–1)
- Initialized the parameter vectors with very small values, around 10e-3
- Reduced the learning rate of my optimization algorithm

This keeps the exponents small, so the gradients are non-zero (a sketch of the fixed graph follows below), and I was finally able to train the model.
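For reference, this is roughly what the fixed graph looks like, reusing the placeholders from the question. It is only a sketch: the stddev matches the "around 10e-3" above, but the learning rate shown is illustrative rather than my final tuned value.

# Same placeholders (x, y_) and constants as in the question; only the three fixes change.
x_flat = tf.reshape(x, [-1, IMAGE_HEIGHT*IMAGE_WIDTH*3])
x_norm = x_flat / 255.0  # 1) normalize pixels from 0-255 down to 0-1

# 2) much smaller initial weights (around 10e-3 instead of 0.1)
initialW = tf.truncated_normal([IMAGE_HEIGHT*IMAGE_WIDTH*3, 35], stddev=10e-3)
W = tf.Variable(initialW, trainable=True)
b = tf.Variable(tf.zeros([35]), trainable=True)

y = tf.nn.softmax(tf.matmul(x_norm, W) + b)
cross_entropy = tf.reduce_mean(
    -tf.reduce_sum(y_*tf.log(y+1e-10), reduction_indices=[1]))

# 3) lower learning rate for Adam (illustrative value)
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

Whether you normalize inside the graph (as here) or before feeding does not matter; the point is that the logits stay small so the softmax is not saturated.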