
I've been trying to make an image classifier based on this tutorial.

I have changed a few things in it, which I didn't think would completely break the learning, but they did... The accuracy is consistently about as good as random guessing, with no upward curve, not even a slow one.

As you can see in the code, the only major change I made was using my own dataset instead of the MNIST dataset: my data is pictures of characters in 89 classes, not just 10. I apply random noise and distortions before batching them for the trainer, and I have about 100,000 examples of each of the 89 classes.

I have added comments next to the changed lines to make them easier to spot.

Code (python):

import tensorflow as tf
from tensorflow.contrib import learn
from tensorflow.contrib.learn.python.learn.estimators import model_fn


def model(features, labels, mode):

    #Change from the tutorial - 36*36 images instead of 28*28
    input_layer = tf.reshape(features, [-1, 36, 36, 1])

    conv1 = tf.layers.conv2d(inputs=input_layer, filters=32, kernel_size=[5, 5], padding='same', activation=tf.nn.relu)
    pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)
    conv2 = tf.layers.conv2d(inputs=pool1, filters=64, kernel_size=[5, 5], padding="same", activation=tf.nn.relu)
    pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[2, 2], strides=2)

    # Change from the tutorial - 9*9*64 instead of 7*7*64 to match the different image size (36 -> 18 -> 9 after two 2x2 poolings)
    pool2_flat = tf.reshape(pool2, [-1, 9 * 9 * 64])

    dense = tf.layers.dense(inputs=pool2_flat, units=1024, activation=tf.nn.relu)
    dropout = tf.layers.dropout(inputs=dense, rate=0.4, training=mode == learn.ModeKeys.TRAIN)

    # Change from the tutorial - 89 classes instead of 10
    logits = tf.layers.dense(inputs=dropout, units=89)
    onehot_labels = tf.one_hot(indices=tf.cast(labels, tf.int32), depth=89)

    loss = tf.losses.softmax_cross_entropy(onehot_labels=onehot_labels, logits=logits)
    train_op = tf.contrib.layers.optimize_loss(
            loss=loss,
            global_step=tf.contrib.framework.get_global_step(),
            learning_rate=0.001,
            optimizer='SGD'
    )
    # Generate Predictions
    predictions = {
        "classes": tf.argmax(input=logits, axis=1),
        "probabilities": tf.nn.softmax(logits, name="softmax_tensor")
    }

    return model_fn.ModelFnOps(mode=mode, predictions=predictions, loss=loss, train_op=train_op)
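
For reference, a quick shape check (a minimal sketch reusing the same layer calls as the model) confirms that two 2x2/stride-2 poolings take 36x36 down to 9x9, so the 9 * 9 * 64 flatten size above is correct:

import tensorflow as tf

# Dummy input with the model's expected shape: [batch, 36, 36, 1]
x = tf.placeholder(tf.float32, [None, 36, 36, 1])
c1 = tf.layers.conv2d(x, filters=32, kernel_size=[5, 5], padding='same', activation=tf.nn.relu)
p1 = tf.layers.max_pooling2d(c1, pool_size=[2, 2], strides=2)  # 36 -> 18
c2 = tf.layers.conv2d(p1, filters=64, kernel_size=[5, 5], padding='same', activation=tf.nn.relu)
p2 = tf.layers.max_pooling2d(c2, pool_size=[2, 2], strides=2)  # 18 -> 9
print(p2.shape)  # (?, 9, 9, 64) -> 9 * 9 * 64 = 5184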

I thought that the network parameters were maybe too small to handle 36*36 pictures and 89 classes, so I tried giving the conv layers more filters and increasing the number of dense-layer nodes, but nothing seems to change. The only results I see in TensorBoard are 1%-1.5% accuracy, which fits random guessing on 89 classes (1/89 ≈ 1.1%).

I'm a real newbie in TensorFlow and machine learning, so please be gentle with me and tell me what I'm doing wrong.

Edit: Training code:

def train(batch_size, steps):
    config = get_config() 
    config.mode = 'train' # just for the example
    # Generate training data
    if config.mode == learn.ModeKeys.TRAIN:
        x, y = generate(batch_size * steps)
    print(y) # prints correct labels in range 0-88 (89 classes)
    logging_hook = tf.train.LoggingTensorHook(tensors={"Probabilities": "softmax_tensor"}, every_n_iter=50)
    char_classifier = learn.Estimator(model_fn=model, model_dir=mydir)

    # Train!
    if config.mode == learn.ModeKeys.TRAIN: #assume this is always true in the example here
        char_classifier.fit(x=x, y=y, batch_size=batch_size, steps=steps, monitors=[logging_hook])
    accuracy = learn.MetricSpec(metric_fn=tf.metrics.accuracy, prediction_key="classes")

    # Evaluate!
    x, y = generate(1000)
    eval_results = char_classifier.evaluate(x=x, y=y, metrics={'accuracy': accuracy})
    print(eval_results)
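
Following the normalization suggestion in the comments below, a minimal sketch of what that change looks like (assuming `generate` returns uint8 pixel arrays in the 0-255 range, which matches my data):

import numpy as np

x, y = generate(batch_size * steps)
x = x.astype(np.float32) / 127.5 - 1.0  # scale 0-255 pixels to [-1, 1]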

Also, sample images: [three sample images of letters]

  • the first thing I'd check if you get random-guess results is that you're setting the labels correctly. I once had a bug where one-hot encoding effectively shuffled the labels, giving the same problematic result (see the sketch after these comments). – Thomas Jungblut Jul 04 '17 at 06:12
  • One suspicion I have is due to `dropout`. Did you change the mode when doing inference? – jkschin Jul 04 '17 at 06:19
  • @ThomasJungblut when I analyze the inputs and labels they seem to be OK, but I'll look into that further, thanks for the comment – Ofer Sadan Jul 04 '17 at 06:34
  • @jkschin I did change the mode during testing – Ofer Sadan Jul 04 '17 at 06:34
  • Have you checked that your labels range is [0, 89)? – jkschin Jul 04 '17 at 06:41
  • I just did, a small batch of 10 resulted in `array([77, 26, 72, 39, 62, 67, 60, 48, 25, 60], dtype=int32)`, a larger batch resulted in 88 for `max(y)` and 0 for `min(y)` – Ofer Sadan Jul 04 '17 at 06:53
  • Also I just checked again and all the labels I could manually check fit their respective picture 100% of the time – Ofer Sadan Jul 04 '17 at 07:05
  • did you normalize your inputs? – nttstar Jul 04 '17 at 07:37
  • @nttstar I did not... now that you mention it, I notice that the MNIST inputs are all in range 0-1 and mine are in range 0-255, could that make a difference? If so, how do I normalize? – Ofer Sadan Jul 04 '17 at 07:47
  • 1
    you need to preprocess the input pixels to [-1,1]. Simply (image/127.5-1.0). Otherwise the network will hard to converge. – nttstar Jul 04 '17 at 07:51
  • I tried normalizing the input, both with `image/127.5-1` like you suggested to get values in range [-1,1], and also with `image/255` to get values in range [0,1]; both of these models still fail to curve up in accuracy, staying at 1.12% on average, which is random guessing – Ofer Sadan Jul 04 '17 at 08:29
  • replace the ReLU with a parametric ReLU – Shai Jul 04 '17 at 21:09
  • See [this answer](https://stackoverflow.com/a/39686384/1714410) for example. – Shai Jul 05 '17 at 05:09
  • thanks all, I have succeeded by removing the noise from the images; I might need a different network or pre-processing to learn the images with noise – Ofer Sadan Jul 05 '17 at 05:30
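
For completeness, a quick label sanity check along the lines suggested in the comments above (a minimal sketch, reusing `generate` from the training code and assuming it returns labels as a NumPy int array):

import numpy as np

x, y = generate(1000)
assert y.min() >= 0 and y.max() <= 88, 'labels must lie in [0, 89)'
print(np.bincount(y, minlength=89))  # per-class counts; should be roughly uniform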
