
I'm trying to train a logistic regression model, but no matter how small I make the training set, the training accuracy does not improve consistently. I've shrunk the training set to just 3 examples, and the model sometimes starts at 66.66% training accuracy and ends at 33.33%. Other times it starts at 0% and ends at 66.66%. It never reaches 100% accuracy. The behavior is the same with training sets of size 32, 200, and 400: starting accuracy is around 50% and ending accuracy lands between 40% and 60%.

The model code is as follows:

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

def get_batch(index, tensors, batch_size, nItems):
    xs, ys = tensors
    begin = index * batch_size
    end = min((index+1)*batch_size, nItems)
    y_b = ys[begin:end]

    (inds, vals, dsize) = xs
    # keep sparse entries whose row falls in [begin, end) and shift the rows to start at 0
    nInds = inds[(begin <= inds[:,0]) & (inds[:,0] < end)] - np.array([begin, 0])
    nVals = vals[:nInds.shape[0]]
    nDsize = (end - begin, dsize[1])
    x_b = tf.SparseTensorValue(nInds, nVals, nDsize)
    return (x_b, y_b)

class OneLayerNet(object):
    def __init__(self, num_feats, num_outputs):
        self.batch_size = 3
        self.epochs = 100
        self.eta = 0.01
        self.reg_const = 0

        self.x = tf.sparse_placeholder(tf.float64, name="placeholderx") # num_sents x num_feats
        self.y = tf.placeholder(tf.float64, name="placeholdery") # 1 x num_sents
        self.w = tf.Variable(tf.random_normal([num_feats, num_outputs], stddev=0.01, dtype=tf.float64)) # num_feats x 1
        self.b = tf.Variable(tf.zeros([num_outputs], dtype=tf.float64))

        self.wx = tf.sparse_tensor_dense_matmul(self.x, self.w)
        self.scores = tf.add(self.wx, self.b)
        self.probs = 1 / (1 + tf.exp(-self.scores))
        self.probs = tf.clip_by_value(self.probs, 0.001, .999)
        self.loss_vect = self.y*tf.log(self.probs) + (1-self.y)*tf.log(1-self.probs)
        self.loss = -tf.reduce_mean(self.loss_vect) # + self.reg_const/2 * tf.square(tf.norm(self.w))
        self.optimizer = tf.train.AdamOptimizer(learning_rate=self.eta).minimize(self.loss)
        self.session = tf.Session()
        self.session.run(tf.global_variables_initializer())

    def train(self, x, y, loss_graph_file):
        session = self.session
        num_batches = y.shape[0] // self.batch_size
        loss_vect = []

        for epoch in range(self.epochs):
            avg_loss = 0
            for i in range(num_batches):
                batch_x, batch_y = get_batch(i, [x, y], self.batch_size, y.shape[0])
                _, loss, w = session.run([self.optimizer, self.loss, self.w], {self.x: batch_x, self.y: batch_y})
                avg_loss += loss/num_batches

            loss_vect.append(avg_loss)
            if epoch % 10 == 0 or epoch == self.epochs-1:
                print("Epoch {}: loss = {}".format(epoch, avg_loss))
                print("Weights: {}".format(w))

        plt.plot(loss_vect)
        plt.ylabel('Loss')
        plt.xlabel('Epoch')
        plt.savefig(loss_graph_file)

    def eval(self, x, y, predictions_file):
        session = self.session
        num_batches = y.shape[0] // self.batch_size
        num_correct = 0

        with open(predictions_file, 'w') as f:
            for i in range(num_batches + 1):
                batch_x, batch_y = get_batch(i, [x, y], self.batch_size, y.shape[0])
                probs = session.run(self.probs, {self.x: batch_x})
                predictions = np.transpose(probs >= 0.5)[0]
                num_correct += np.sum(np.equal(predictions, batch_y))
                for j in range(batch_y.shape[0]):
                    f.write('{}\t{}\t{}\n'.format(probs[j], int(predictions[j]), batch_y[j]))

        accuracy = num_correct/len(y)
        return accuracy

I've tried the suggestions in this answer, but the behavior remains the same. I'm using TensorFlow 1.5.0.

UPDATE: I printed out the sigmoid output for each sentence, and each one gets closer and closer to 50%. I also tried using my setup to learn the AND function; it turns out the weights get closer and closer to 0 as it trains.

Epoch 0: loss = 4.133313990920284
Weights: [[-0.59451162]
 [ 0.55122256]]
Bias: [-0.01]
Epoch 100: loss = 3.0849339200727615
Weights: [[-0.70471682]
 [-0.04904535]]
Bias: [-0.63568272]
Epoch 200: loss = 3.0166726382814177
Weights: [[-0.2748711 ]
 [-0.13774631]]
Bias: [-0.834027]
Epoch 300: loss = 3.004324396806258
Weights: [[-0.108655 ]
 [-0.1161173]]
Bias: [-0.95526422]
Epoch 400: loss = 3.0011826475632546
Weights: [[-0.04740128]
 [-0.06981994]]
Bias: [-1.02420669]
Epoch 500: loss = 3.0002812775795973
Weights: [[-0.02161358]
 [-0.03521941]]
Bias: [-1.06242562]
Epoch 600: loss = 3.0000558857071757
Weights: [[-0.0094973 ]
 [-0.01578322]]
Bias: [-1.08245493]
Epoch 700: loss = 3.00000916752074
Weights: [[-0.00384123]
 [-0.00638793]]
Bias: [-1.09205959]
Epoch 800: loss = 3.0000012291196088
Weights: [[-0.00140626]
 [-0.00233578]]
Bias: [-1.09621262]
Epoch 900: loss = 3.000000133321497
Weights: [[-0.00046284]
 [-0.00076831]]
Bias: [-1.09782245]
Epoch 1000: loss = 3.0000000115763847
Weights: [[-0.00013625]
 [-0.00022613]]
Bias: [-1.09837977]
Epoch 1100: loss = 3.0000000007953758
Weights: [[-3.56729609e-05]
 [-5.91996755e-05]]
Bias: [-1.09855141]
Epoch 1200: loss = 3.0000000000426725
Weights: [[-8.25235844e-06]
 [-1.36946603e-05]]
Bias: [-1.09859821]
Epoch 1300: loss = 3.0000000000017604
Weights: [[-1.67385710e-06]
 [-2.77772968e-06]]
Bias: [-1.09860943]
Epoch 1400: loss = 3.000000000000055
Weights: [[-2.95008595e-07]
 [-4.89560038e-07]]
Bias: [-1.09861179]
Epoch 1500: loss = 3.000000000000001
Weights: [[-4.46992207e-08]
 [-7.41773288e-08]]
Bias: [-1.09861221]
Epoch 1600: loss = 3.0
Weights: [[-5.74942725e-09]
 [-9.54104229e-09]]
Bias: [-1.09861228]
Epoch 1700: loss = 3.0
Weights: [[-6.18335872e-10]
 [-1.02611408e-09]]
Bias: [-1.09861229]
Epoch 1800: loss = 3.0
Weights: [[-5.45849278e-11]
 [-9.05823934e-11]]
Bias: [-1.09861229]
Epoch 1900: loss = 3.0
Weights: [[-3.86521516e-12]
 [-6.41401071e-12]]
Bias: [-1.09861229]
Epoch 1999: loss = 3.0
Weights: [[-2.19714497e-13]
 [-3.64640086e-13]]
Bias: [-1.09861229]
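
For reference, a minimal sketch of one way the four AND examples could be packed into the (indices, values, dense_shape) triple that get_batch and the sparse placeholder expect (the exact encoding used for the run above isn't shown, so this is an assumption, not the original setup):

import numpy as np

# Hypothetical AND dataset: only the non-zero inputs get an (example, feature) index entry.
# Examples are (0,0)->0, (0,1)->0, (1,0)->0, (1,1)->1.
inds = np.array([[1, 1],   # example 1: feature 1 is set
                 [2, 0],   # example 2: feature 0 is set
                 [3, 0],   # example 3: feature 0 is set
                 [3, 1]],  # example 3: feature 1 is set
                dtype=np.int64)
vals = np.array([1.0, 1.0, 1.0, 1.0], dtype=np.float64)
dsize = (4, 2)                           # 4 examples, 2 features
ys = np.array([0.0, 0.0, 0.0, 1.0])      # AND labels

# One batch covering the whole dataset, returned as a tf.SparseTensorValue plus labels
batch_x, batch_y = get_batch(0, [(inds, vals, dsize), ys], 4, 4)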
– npCompleteNoob

1 Answer


I suggest you try Xavier initialization of the weights. It's something like

W = tf.get_variable("W",
                    shape=[x, y],
                    initializer=tf.contrib.layers.xavier_initializer())

where x and y are the dimensions (input and output sizes) of the layer's weight matrix.
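
In the OneLayerNet from the question, that would amount to replacing the self.w line with something along these lines (a sketch, assuming the graph stays in float64 as posted, so the dtype is passed to the initializer as well):

self.w = tf.get_variable(
    "w",
    shape=[num_feats, num_outputs],
    dtype=tf.float64,
    initializer=tf.contrib.layers.xavier_initializer(dtype=tf.float64))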

– shivam13juna
  • With Xavier weights, accuracy on the training set of size 3 consistently improves from 33.33% to 66.66%. However, accuracy on the training set of size 32 still doesn't consistently improve. – npCompleteNoob May 28 '18 at 12:07
  • What exactly do you mean by "it doesn't consistently improve"? Do you mean with every epoch? – shivam13juna May 28 '18 at 13:17
  • It doesn't consistently improve after 100 epochs. Sometimes I start with above 50% training accuracy, and after 100 epochs, I end with 40-something percent accuracy. – npCompleteNoob May 28 '18 at 19:06