How to classify a QuickDraw doodle using TensorFlow's sketch RNN tutorial?

Question

Clarifications:

This question is regarding this QuickDraw RNN Drawing classification tensorflow tutorial, not the text RNN tensorflow tutorial
To an extend this is a duplicate of Farooq Khan's question, however I could use a few more specific details (that otherwise easily become cumbersome comments) and take the opportunity to reward Farooq for taking his time to provide further details.

I'm running tensorflow 1.6.0-rc0 compiled from source with GPU support on a Macbook with an NVIDIA GeForce GT 750M 2048 MB GPU.

I've attempted to train like so:

python train_model.py --model_dir=./model_gpu --training_data=./rnn_tutorial_data/training.tfrecord-00000-of-00010 --eval_data=./rnn_tutorial_data/eval.tfrecord-00000-of-00010 --classes_file=./rnn_tutorial_data/training.tfrecord.classes --cell_type=cudnn_lstm

The initial clarifications I'm looking for are:

should I use the command above, then once it completes run :python train_model.py --model_dir=./model_gpu --training_data=./rnn_tutorial_data/training.tfrecord-00001-of-00010 --eval_data=./rnn_tutorial_data/eval.tfrecord-00001-of-00010 --classes_file=./rnn_tutorial_data/training.tfrecord.classes --cell_type=cudnn_lstm through to python train_model.py --model_dir=./model_gpu --training_data=./rnn_tutorial_data/training.tfrecord-00009-of-00010 --eval_data=./rnn_tutorial_data/eval.tfrecord-00009-of-00010 --classes_file=./rnn_tutorial_data/training.tfrecord.classes --cell_type=cudnn_lstm or should I run the command mentioned in the tutorial as-is: python train_model.py \ --training_data=rnn_tutorial_data/training.tfrecord-?????-of-????? \ --eval_data=rnn_tutorial_data/eval.tfrecord-?????-of-????? \ --classes_file=rnn_tutorial_data/training.tfrecord.classes
- how do I know when training is complete ? (these are the last messages from the last training session: 2018-04-11 01:43:27.180805: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1410] Adding visible gpu devices: 0 2018-04-11 01:43:27.180860: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix: 2018-04-11 01:43:27.180866: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917] 0 2018-04-11 01:43:27.180869: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0: N 2018-04-11 01:43:27.180950: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1021] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 100 MB memory) -> physical GPU (device: 0, name: GeForce GT 750M, pci bus id: 0000:01:00.0, compute capability: 3.0) No error or any other output afterwards: hard to distinguish those messages from other previous checkpoints)
- how to pass a custom doodle for classification ? This is the core of my question. Farooq's create_tfrecord_for_prediction in his answer is great: a full script to run/test would be amazing

Update2

Thanks to Farooq's helpful notes, here is a tweaked version of the code that prints a prediction to console:

# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

r"""Binary for trianing a RNN-based classifier for the Quick, Draw! data.

python train_model.py \
  --training_data train_data \
  --eval_data eval_data \
  --model_dir /tmp/quickdraw_model/ \
  --cell_type cudnn_lstm

When running on GPUs using --cell_type cudnn_lstm is much faster.

The expected performance is ~75% in 1.5M steps with the default configuration.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import ast
import functools
import sys

from datetime import datetime
import json
import numpy as np


import tensorflow as tf


def get_num_classes():
  classes = []
  with tf.gfile.GFile(FLAGS.classes_file, "r") as f:
    classes = [x for x in f]
  num_classes = len(classes)
  return num_classes


def get_input_fn(mode, tfrecord_pattern, batch_size):
  """Creates an input_fn that stores all the data in memory.

  Args:
   mode: one of tf.contrib.learn.ModeKeys.{TRAIN, INFER, EVAL}
   tfrecord_pattern: path to a TF record file created using create_dataset.py.
   batch_size: the batch size to output.

  Returns:
    A valid input_fn for the model estimator.
  """

  def _parse_tfexample_fn(example_proto, mode):
    """Parse a single record which is expected to be a tensorflow.Example."""
    feature_to_type = {
        "ink": tf.VarLenFeature(dtype=tf.float32),
        "shape": tf.FixedLenFeature([2], dtype=tf.int64)
    }
    if mode != tf.estimator.ModeKeys.PREDICT:
      # The labels won't be available at inference time, so don't add them
      # to the list of feature_columns to be read.
      feature_to_type["class_index"] = tf.FixedLenFeature([1], dtype=tf.int64)

    parsed_features = tf.parse_single_example(example_proto, feature_to_type)
    parsed_features["ink"] = tf.sparse_tensor_to_dense(parsed_features["ink"])

    if mode != tf.estimator.ModeKeys.PREDICT:
      labels = parsed_features["class_index"]
      return parsed_features, labels
    else:
      return parsed_features  # In prediction, we have no labels

  def _input_fn():
    """Estimator `input_fn`.

    Returns:
      A tuple of:
      - Dictionary of string feature name to `Tensor`.
      - `Tensor` of target labels.
    """
    dataset = tf.data.TFRecordDataset.list_files(tfrecord_pattern)
    if mode == tf.estimator.ModeKeys.TRAIN:
      dataset = dataset.shuffle(buffer_size=10)
    dataset = dataset.repeat()
    # Preprocesses 10 files concurrently and interleaves records from each file.
    dataset = dataset.interleave(
        tf.data.TFRecordDataset,
        cycle_length=10,
        block_length=1)
    dataset = dataset.map(
        functools.partial(_parse_tfexample_fn, mode=mode),
        num_parallel_calls=10)
    dataset = dataset.prefetch(10000)
    if mode == tf.estimator.ModeKeys.TRAIN:
      dataset = dataset.shuffle(buffer_size=1000000)
    # Our inputs are variable length, so pad them.
    dataset = dataset.padded_batch(
        batch_size, padded_shapes=dataset.output_shapes)

    iter = dataset.make_one_shot_iterator()
    if mode != tf.estimator.ModeKeys.PREDICT:
        features, labels = iter.get_next()
        return features, labels
    else:
        features = iter.get_next()
        return features, None  # In prediction, we have no labels

  return _input_fn


def model_fn(features, labels, mode, params):
  """Model function for RNN classifier.

  This function sets up a neural network which applies convolutional layers (as
  configured with params.num_conv and params.conv_len) to the input.
  The output of the convolutional layers is given to LSTM layers (as configured
  with params.num_layers and params.num_nodes).
  The final state of the all LSTM layers are concatenated and fed to a fully
  connected layer to obtain the final classification scores.

  Args:
    features: dictionary with keys: inks, lengths.
    labels: one hot encoded classes
    mode: one of tf.estimator.ModeKeys.{TRAIN, INFER, EVAL}
    params: a parameter dictionary with the following keys: num_layers,
      num_nodes, batch_size, num_conv, conv_len, num_classes, learning_rate.

  Returns:
    ModelFnOps for Estimator API.
  """

  def _get_input_tensors(features, labels):
    """Converts the input dict into inks, lengths, and labels tensors."""
    # features[ink] is a sparse tensor that is [8, batch_maxlen, 3]
    # inks will be a dense tensor of [8, maxlen, 3]
    # shapes is [batchsize, 2]
    shapes = features["shape"]
    # lengths will be [batch_size]
    lengths = tf.squeeze(
        tf.slice(shapes, begin=[0, 0], size=[params.batch_size, 1]))
    inks = tf.reshape(features["ink"], [params.batch_size, -1, 3])
    if labels is not None:
      labels = tf.squeeze(labels)
    return inks, lengths, labels

  def _add_conv_layers(inks, lengths):
    """Adds convolution layers."""
    convolved = inks
    for i in range(len(params.num_conv)):
      convolved_input = convolved
      if params.batch_norm:
        convolved_input = tf.layers.batch_normalization(
            convolved_input,
            training=(mode == tf.estimator.ModeKeys.TRAIN))
      # Add dropout layer if enabled and not first convolution layer.
      if i > 0 and params.dropout:
        convolved_input = tf.layers.dropout(
            convolved_input,
            rate=params.dropout,
            training=(mode == tf.estimator.ModeKeys.TRAIN))
      convolved = tf.layers.conv1d(
          convolved_input,
          filters=params.num_conv[i],
          kernel_size=params.conv_len[i],
          activation=None,
          strides=1,
          padding="same",
          name="conv1d_%d" % i)
    return convolved, lengths

  def _add_regular_rnn_layers(convolved, lengths):
    """Adds RNN layers."""
    if params.cell_type == "lstm":
      cell = tf.nn.rnn_cell.BasicLSTMCell
    elif params.cell_type == "block_lstm":
      cell = tf.contrib.rnn.LSTMBlockCell
    cells_fw = [cell(params.num_nodes) for _ in range(params.num_layers)]
    cells_bw = [cell(params.num_nodes) for _ in range(params.num_layers)]
    if params.dropout > 0.0:
      cells_fw = [tf.contrib.rnn.DropoutWrapper(cell) for cell in cells_fw]
      cells_bw = [tf.contrib.rnn.DropoutWrapper(cell) for cell in cells_bw]
    outputs, _, _ = tf.contrib.rnn.stack_bidirectional_dynamic_rnn(
        cells_fw=cells_fw,
        cells_bw=cells_bw,
        inputs=convolved,
        sequence_length=lengths,
        dtype=tf.float32,
        scope="rnn_classification")
    return outputs

  def _add_cudnn_rnn_layers(convolved):
    """Adds CUDNN LSTM layers."""
    # Convolutions output [B, L, Ch], while CudnnLSTM is time-major.
    convolved = tf.transpose(convolved, [1, 0, 2])
    lstm = tf.contrib.cudnn_rnn.CudnnLSTM(
        num_layers=params.num_layers,
        num_units=params.num_nodes,
        dropout=params.dropout if mode == tf.estimator.ModeKeys.TRAIN else 0.0,
        direction="bidirectional")
    outputs, _ = lstm(convolved)
    # Convert back from time-major outputs to batch-major outputs.
    outputs = tf.transpose(outputs, [1, 0, 2])
    return outputs

  def _add_rnn_layers(convolved, lengths):
    """Adds recurrent neural network layers depending on the cell type."""
    if params.cell_type != "cudnn_lstm":
      outputs = _add_regular_rnn_layers(convolved, lengths)
    else:
      outputs = _add_cudnn_rnn_layers(convolved)
    # outputs is [batch_size, L, N] where L is the maximal sequence length and N
    # the number of nodes in the last layer.
    mask = tf.tile(
        tf.expand_dims(tf.sequence_mask(lengths, tf.shape(outputs)[1]), 2),
        [1, 1, tf.shape(outputs)[2]])
    zero_outside = tf.where(mask, outputs, tf.zeros_like(outputs))
    outputs = tf.reduce_sum(zero_outside, axis=1)
    return outputs

  def _add_fc_layers(final_state):
    """Adds a fully connected layer."""
    return tf.layers.dense(final_state, params.num_classes)

  # Build the model.
  inks, lengths, labels = _get_input_tensors(features, labels)
  convolved, lengths = _add_conv_layers(inks, lengths)
  final_state = _add_rnn_layers(convolved, lengths)
  logits = _add_fc_layers(final_state)

  # Compute current predictions.
  predictions = tf.argmax(logits, axis=1)

  if mode == tf.estimator.ModeKeys.PREDICT:
      preds = {
          "class_index": predictions,
          #"class_index": predictions[:, tf.newaxis],
          "probabilities": tf.nn.softmax(logits),
          "logits": logits
      }
      #preds = {"logits": logits, "predictions": predictions}

      return tf.estimator.EstimatorSpec(mode, predictions=preds)
      # Add the loss.
  cross_entropy = tf.reduce_mean(
      tf.nn.sparse_softmax_cross_entropy_with_logits(
          labels=labels, logits=logits))

  # Add the optimizer.
  train_op = tf.contrib.layers.optimize_loss(
      loss=cross_entropy,
      global_step=tf.train.get_global_step(),
      learning_rate=params.learning_rate,
      optimizer="Adam",
      # some gradient clipping stabilizes training in the beginning.
      clip_gradients=params.gradient_clipping_norm,
      summaries=["learning_rate", "loss", "gradients", "gradient_norm"])

  return tf.estimator.EstimatorSpec(
      mode=mode,
      predictions={"logits": logits, "predictions": predictions},
      loss=cross_entropy,
      train_op=train_op,
      eval_metric_ops={"accuracy": tf.metrics.accuracy(labels, predictions)})


def create_estimator_and_specs(run_config):
  """Creates an Experiment configuration based on the estimator and input fn."""
  model_params = tf.contrib.training.HParams(
      num_layers=FLAGS.num_layers,
      num_nodes=FLAGS.num_nodes,
      batch_size=FLAGS.batch_size,
      num_conv=ast.literal_eval(FLAGS.num_conv),
      conv_len=ast.literal_eval(FLAGS.conv_len),
      num_classes=get_num_classes(),
      learning_rate=FLAGS.learning_rate,
      gradient_clipping_norm=FLAGS.gradient_clipping_norm,
      cell_type=FLAGS.cell_type,
      batch_norm=FLAGS.batch_norm,
      dropout=FLAGS.dropout)

  estimator = tf.estimator.Estimator(
      model_fn=model_fn,
      config=run_config,
      params=model_params)

  train_spec = tf.estimator.TrainSpec(input_fn=get_input_fn(
      mode=tf.estimator.ModeKeys.TRAIN,
      tfrecord_pattern=FLAGS.training_data,
      batch_size=FLAGS.batch_size), max_steps=FLAGS.steps)

  eval_spec = tf.estimator.EvalSpec(input_fn=get_input_fn(
      mode=tf.estimator.ModeKeys.EVAL,
      tfrecord_pattern=FLAGS.eval_data,
      batch_size=FLAGS.batch_size))

  return estimator, train_spec, eval_spec


# def main(unused_args):
#   estimator, train_spec, eval_spec = create_estimator_and_specs(
#       run_config=tf.estimator.RunConfig(
#           model_dir=FLAGS.model_dir,
#           save_checkpoints_secs=300,
#           save_summary_steps=100))
#   tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
def create_tfrecord_for_prediction(batch_size, stoke_data, tfrecord_file):
    def parse_line(stoke_data):
        """Parse provided stroke data and ink (as np array) and classname."""
        inkarray = json.loads(stoke_data)
        stroke_lengths = [len(stroke[0]) for stroke in inkarray]
        total_points = sum(stroke_lengths)
        np_ink = np.zeros((total_points, 3), dtype=np.float32)
        current_t = 0
        for stroke in inkarray:
            if len(stroke[0]) != len(stroke[1]):
                print("Inconsistent number of x and y coordinates.")
                return None
            for i in [0, 1]:
                np_ink[current_t:(current_t + len(stroke[0])), i] = stroke[i]
            current_t += len(stroke[0])
            np_ink[current_t - 1, 2] = 1  # stroke_end
        # Preprocessing.
        # 1. Size normalization.
        lower = np.min(np_ink[:, 0:2], axis=0)
        upper = np.max(np_ink[:, 0:2], axis=0)
        scale = upper - lower
        scale[scale == 0] = 1
        np_ink[:, 0:2] = (np_ink[:, 0:2] - lower) / scale
        # 2. Compute deltas.
        #np_ink = np_ink[1:, 0:2] - np_ink[0:-1, 0:2]
        #np_ink = np_ink[1:, :]
        np_ink[1:, 0:2] -= np_ink[0:-1, 0:2]
        np_ink = np_ink[1:, :]

        features = {}
        features["ink"] = tf.train.Feature(float_list=tf.train.FloatList(value=np_ink.flatten()))
        features["shape"] = tf.train.Feature(int64_list=tf.train.Int64List(value=np_ink.shape))
        f = tf.train.Features(feature=features)
        ex = tf.train.Example(features=f)
        return ex

    if stoke_data is None:
        print("Error: Stroke data cannot be none")
        return

    example = parse_line(stoke_data)

    #Remove the file if it already exists
    if tf.gfile.Exists(tfrecord_file):
        tf.gfile.Remove(tfrecord_file)

    writer = tf.python_io.TFRecordWriter(tfrecord_file)
    for i in range(batch_size):
        writer.write(example.SerializeToString())
    writer.flush()
    writer.close()
    print ('wrote',tfrecord_file)

def get_classes():
  classes = []
  with tf.gfile.GFile(FLAGS.classes_file, "r") as f:
    classes = [x.rstrip() for x in f]
  return classes

def main(unused_args):
  print("%s: I Starting application" % (datetime.now()))
  print("FLAGS",FLAGS)
  estimator, train_spec, eval_spec = create_estimator_and_specs(
      run_config=tf.estimator.RunConfig(
          model_dir=FLAGS.model_dir,
          save_checkpoints_secs=300,
          save_summary_steps=100))
  print("estimator",estimator,"train_spec",train_spec,"eval_spec",eval_spec) 
  tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)

  if FLAGS.predict_for_data != None:
      print("%s: I Starting prediction" % (datetime.now()))
      class_names = get_classes()
      create_tfrecord_for_prediction(FLAGS.batch_size, FLAGS.predict_for_data, FLAGS.predict_temp_file)
      predict_results = estimator.predict(input_fn=get_input_fn(
          mode=tf.estimator.ModeKeys.PREDICT,
          tfrecord_pattern=FLAGS.predict_temp_file,
          batch_size=FLAGS.batch_size))

      #predict_results = estimator.predict(input_fn=predict_input_fn)
      for idx, prediction in enumerate(predict_results):
          index = prediction["class_index"]  # Get the predicted class (index)
          probability = prediction["probabilities"][index]
          class_name = class_names[index]
          print("%s: Predicted Class is: %s with a probability of %f" % (datetime.now(), class_name, probability))
          break #We care for only the first prediction, rest are all duplicates just to meet the batch size


if __name__ == "__main__":
  parser = argparse.ArgumentParser()
  parser.register("type", "bool", lambda v: v.lower() == "true")
  parser.add_argument(
      "--training_data",
      type=str,
      default="",
      help="Path to training data (tf.Example in TFRecord format)")
  parser.add_argument(
      "--eval_data",
      type=str,
      default="",
      help="Path to evaluation data (tf.Example in TFRecord format)")
  parser.add_argument(
      "--classes_file",
      type=str,
      default="",
      help="Path to a file with the classes - one class per line")
  parser.add_argument(
      "--num_layers",
      type=int,
      default=3,
      help="Number of recurrent neural network layers.")
  parser.add_argument(
      "--num_nodes",
      type=int,
      default=128,
      help="Number of node per recurrent network layer.")
  parser.add_argument(
      "--num_conv",
      type=str,
      default="[48, 64, 96]",
      help="Number of conv layers along with number of filters per layer.")
  parser.add_argument(
      "--conv_len",
      type=str,
      default="[5, 5, 3]",
      help="Length of the convolution filters.")
  parser.add_argument(
      "--cell_type",
      type=str,
      default="lstm",
      help="Cell type used for rnn layers: cudnn_lstm, lstm or block_lstm.")
  parser.add_argument(
      "--batch_norm",
      type="bool",
      default="False",
      help="Whether to enable batch normalization or not.")
  parser.add_argument(
      "--learning_rate",
      type=float,
      default=0.0001,
      help="Learning rate used for training.")
  parser.add_argument(
      "--gradient_clipping_norm",
      type=float,
      default=9.0,
      help="Gradient clipping norm used during training.")
  parser.add_argument(
      "--dropout",
      type=float,
      default=0.3,
      help="Dropout used for convolutions and bidi lstm layers.")
  parser.add_argument(
      "--steps",
      type=int,
      default=100000,
      help="Number of training steps.")
  parser.add_argument(
      "--batch_size",
      type=int,
      default=8,
      help="Batch size to use for training/evaluation.")
  parser.add_argument(
      "--model_dir",
      type=str,
      default="",
      help="Path for storing the model checkpoints.")
  parser.add_argument(
      "--self_test",
      type=bool,
      default="False",
      help="Whether to enable batch normalization or not.")
  parser.add_argument(
      "--predict_for_data",
      type=str,
      default="[[[73,66,46,23,12,11,22,48,58,67,70,65],[11,6,2,10,23,33,48,56,54,41,22,10]],[[66,85,71],[9,3,26]],[[24,1,2,8],[6,1,10,19]],[[64,88,134,176,180,184,184,174,111,63,47],[34,29,28,35,39,58,91,94,86,71,62]],[[64,61,62],[74,83,102]],[[83,84,87],[78,102,107]],[[157,159,164],[96,108,116]],[[175,182],[91,115]],[[182,186,198,209,223,234,251,255],[51,36,29,30,38,39,20,8]],[[157,136,128,133,139],[35,47,57,35,29]],[[104,94,84,84,89],[40,52,70,30,26]],[[111,105,105,109,121],[30,59,68,72,34]],[[159,153,153],[41,54,65]]]",
      help=".ndjson single line .drawing (e.g. just the strokes, no labels)")
  parser.add_argument(
      "--predict_temp_file",
      type=str,
      default="./predict_temp.tfrecord",
      help="path to a temporary tfrecord that will be created from the .ndjson drawing data")

  FLAGS, unparsed = parser.parse_known_args()
  tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)

I've ran the above like so:

python classify.py --classes_file=rnn_tutorial_data/training.tfrecord.classes --model_dir=model_gpu_all/ --training_data=./rnn_tutorial_data/training.tfrecord-?????-of-????? --eval_data=./rnn_tutorial_data/eval.tfrecord-?????-of-????? --predict_for_data="[[[73,66,46,23,12,11,22,48,58,67,70,65],[11,6,2,10,23,33,48,56,54,41,22,10]],[[66,85,71],[9,3,26]],[[24,1,2,8],[6,1,10,19]],[[64,88,134,176,180,184,184,174,111,63,47],[34,29,28,35,39,58,91,94,86,71,62]],[[64,61,62],[74,83,102]],[[83,84,87],[78,102,107]],[[157,159,164],[96,108,116]],[[175,182],[91,115]],[[182,186,198,209,223,234,251,255],[51,36,29,30,38,39,20,8]],[[157,136,128,133,139],[35,47,57,35,29]],[[104,94,84,84,89],[40,52,70,30,26]],[[111,105,105,109,121],[30,59,68,72,34]],[[159,153,153],[41,54,65]]]" --predict_temp_file=./predict_temp.tfrecord --cell_type=cudnn_lstm

and finally got a prediction:

Predicted Class is: cow with a probability of 0.533384

Not a great one (tutorial warns on dataset size and accuracy), but it's a prediction, yaaay! Full execution took 31 seconds with this example.

Farooq Khan · Accepted Answer · 2018-04-21T11:27:40.953

python train_model.py \ --training_data=rnn_tutorial_data/training.tfrecord-?????-of-????? \ --eval_data=rnn_tutorial_data/eval.tfrecord-?????-of-????? \ --classes_file=rnn_tutorial_data/training.tfrecord.classes

AFAIK using the above command works as well, it will simply read all the files in the folder where you have download the data files one by one.

create_tfrecord_for_prediction is certainly not my own creation, this code was mostly picked from another file from tensorflow guys create_dataset.py

Below I have pasted the almost all of the new code i added including the my modifications to the main() function

def create_tfrecord_for_prediction(batch_size, stoke_data, tfrecord_file):
    def parse_line(stoke_data):
        """Parse provided stroke data and ink (as np array) and classname."""
        inkarray = json.loads(stoke_data)
        stroke_lengths = [len(stroke[0]) for stroke in inkarray]
        total_points = sum(stroke_lengths)
        np_ink = np.zeros((total_points, 3), dtype=np.float32)
        current_t = 0
        for stroke in inkarray:
            if len(stroke[0]) != len(stroke[1]):
                print("Inconsistent number of x and y coordinates.")
                return None
            for i in [0, 1]:
                np_ink[current_t:(current_t + len(stroke[0])), i] = stroke[i]
            current_t += len(stroke[0])
            np_ink[current_t - 1, 2] = 1  # stroke_end
        # Preprocessing.
        # 1. Size normalization.
        lower = np.min(np_ink[:, 0:2], axis=0)
        upper = np.max(np_ink[:, 0:2], axis=0)
        scale = upper - lower
        scale[scale == 0] = 1
        np_ink[:, 0:2] = (np_ink[:, 0:2] - lower) / scale
        # 2. Compute deltas.
        #np_ink = np_ink[1:, 0:2] - np_ink[0:-1, 0:2]
        #np_ink = np_ink[1:, :]
        np_ink[1:, 0:2] -= np_ink[0:-1, 0:2]
        np_ink = np_ink[1:, :]

        features = {}
        features["ink"] = tf.train.Feature(float_list=tf.train.FloatList(value=np_ink.flatten()))
        features["shape"] = tf.train.Feature(int64_list=tf.train.Int64List(value=np_ink.shape))
        f = tf.train.Features(feature=features)
        ex = tf.train.Example(features=f)
        return ex

    if stoke_data is None:
        print("Error: Stroke data cannot be none")
        return

    example = parse_line(stoke_data)

    #Remove the file if it already exists
    if tf.gfile.Exists(tfrecord_file):
        tf.gfile.Remove(tfrecord_file)

    writer = tf.python_io.TFRecordWriter(tfrecord_file)
    for i in range(batch_size):
        writer.write(example.SerializeToString())
    writer.flush()
    writer.close()

def get_classes():
  classes = []
  with tf.gfile.GFile(FLAGS.classes_file, "r") as f:
    classes = [x.rstrip() for x in f]
  return classes

def main(unused_args):
  print("%s: I Starting application" % (datetime.now()))

  estimator, train_spec, eval_spec = create_estimator_and_specs(
      run_config=tf.estimator.RunConfig(
          model_dir=FLAGS.model_dir,
          save_checkpoints_secs=300,
          save_summary_steps=100))
  tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)

  if FLAGS.predict_for_data != None:
      print("%s: I Starting prediction" % (datetime.now()))
      class_names = get_classes()
      create_tfrecord_for_prediction(FLAGS.batch_size, FLAGS.predict_for_data, FLAGS.predict_temp_file)
      predict_results = estimator.predict(input_fn=get_input_fn(
          mode=tf.estimator.ModeKeys.PREDICT,
          tfrecord_pattern=FLAGS.predict_temp_file,
          batch_size=FLAGS.batch_size))

      #predict_results = estimator.predict(input_fn=predict_input_fn)
      for idx, prediction in enumerate(predict_results):
          index = prediction["class_index"]  # Get the predicted class (index)
          probability = prediction["probabilities"][index]
          class_name = class_names[index]
          print("%s: Predicted Class is: %s with a probability of %f" % (datetime.now(), class_name, probability))
          break #We care for only the first prediction, rest are all duplicates just to meet the batch size

FLAGS.predict_for_data this is the command line argument that holds the strokes data FLAGS.predict_temp_file is just a name of file i use to create the temporary input data tfrecord file

Note1 : Along with this I also modified some code in get_input_fn() you can find this code change in this PR: https://github.com/tensorflow/models/pull/3440 (has not been merged yet)

Note2: I also had to modify model_fn() and add the below few lines my additions are after the comment #Compute current predictions

  # Build the model.
  inks, lengths, labels = _get_input_tensors(features, labels)
  convolved, lengths = _add_conv_layers(inks, lengths)
  final_state = _add_rnn_layers(convolved, lengths)
  logits = _add_fc_layers(final_state)

  # Compute current predictions.
  predictions = tf.argmax(logits, axis=1)

  if mode == tf.estimator.ModeKeys.PREDICT:
      preds = {
          "class_index": predictions,
          #"class_index": predictions[:, tf.newaxis],
          "probabilities": tf.nn.softmax(logits),
          "logits": logits
      }
      #preds = {"logits": logits, "predictions": predictions}

      return tf.estimator.EstimatorSpec(mode, predictions=preds)
      # Add the loss.
  cross_entropy = tf.reduce_mean(
      tf.nn.sparse_softmax_cross_entropy_with_logits(
          labels=labels, logits=logits))

  # Add the optimizer.
  train_op = tf.contrib.layers.optimize_loss(
      loss=cross_entropy,
      global_step=tf.train.get_global_step(),
      learning_rate=params.learning_rate,
      optimizer="Adam",
      # some gradient clipping stabilizes training in the beginning.
      clip_gradients=params.gradient_clipping_norm,
      summaries=["learning_rate", "loss", "gradients", "gradient_norm"])

  return tf.estimator.EstimatorSpec(
      mode=mode,
      predictions={"logits": logits, "predictions": predictions},
      loss=cross_entropy,
      train_op=train_op,
      eval_metric_ops={"accuracy": tf.metrics.accuracy(labels, predictions)})

The only thing left is then to figure out generating strokes data. For this you can take one of the existing tfrecord file read it and extract a stroke from that read operation or you could write some javascript webpage to generate strokes

Hi Farooq, I've finally found the time to try and put your helpful notes together into a single file to run, but even with this wealth of details I still haven't managed to run a classification. I've updated my question above to include the full code and how I tried to run it. Would you be able to the full code you had running with the command line arguments passed ? I must be missing something and I'm having a hard time spotting it. Thank you — George Profenza, Apr 21 '18 at 11:03
As the error says you have to return a EstimatorSpec, donot comment out the rest of the code you need that too. Insert the code for prediction between the lines `logits = _add_fc_layers(final_state)` and line `# Add the loss` — Farooq Khan, Apr 21 '18 at 11:17
Thank you so much! Instant gratification! That worked! I've posted the full code in the question (I needed to import json,numpy and add the extra flags). The predictions don't look great, but the tutorial does warn: "When training the model for 1M steps you can expect to get an accuracy of approximately of approximately 70% on the top-1 candidate", so for 100K it would be less. At least now I can play with the training data size, steps etc.! I will be able to award the bounty in 22 hours (according the site) :) Thanks again! — George Profenza, Apr 21 '18 at 12:49
Did you manage to use the `create_dataset.py` script ? I'm running into some [issues](https://stackoverflow.com/questions/50020595/how-to-create-a-dataset-for-the-quickdraw-rnn-tensorflow-tutorial-resourceexhau) — George Profenza, Apr 27 '18 at 11:55
I did not use create_dataset.py so i did not hit this issue. I will try it out to see if I encounter the problem you are facing, but thats going to be only 3 days from now. — Farooq Khan, Apr 27 '18 at 16:03
Just wanted to update you: I've trained the model with a million steps instead of the default 100000 and accuracy looks better. Did you get a chance to try create_dataset.py ? — George Profenza, May 03 '18 at 11:00
Is the code outdated or something ? I use tensorflow's article to train my model upto 700k steps and have checkpoint files. I used same command to train the model but when using the above prediction code, it throws me a error ```InvalidArgumentError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:``` — vongolashu, May 31 '19 at 21:05
```No OpKernel was registered to support Op 'CudnnRNNCanonicalToParams' used by node cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams (defined at predict_tf.py:211) with these attrs: [T=DT_FLOAT, input_mode="linear_input", direction="bidirectional", rnn_mode="lstm", seed2=0, seed=0, dropout=0.3, num_params=48] Registered devices: [CPU, XLA_CPU] Registered kernels: [[node cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams (defined at predict_tf.py:211) ]]``` — vongolashu, May 31 '19 at 21:05

How to classify a QuickDraw doodle using TensorFlow's sketch RNN tutorial?

1 Answers1

Linked