I'm training an image classification model as per: https://www.tensorflow.org/tutorials/image_recognition

I aim to extract the learned weight values (based on: Extracting weights values from a tensorflow model checkpoint) and execute the model using linear algebra operations only.
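To illustrate, I had something like this rough sketch in mind (the checkpoint path and variable name below are just placeholders):

import tensorflow as tf

# Rough sketch: read raw weight values out of a saved checkpoint.
# The checkpoint path and variable name are placeholders.
reader = tf.train.NewCheckpointReader('/tmp/model.ckpt')
print(reader.get_variable_to_shape_map())     # all variable names and shapes
weights = reader.get_tensor('conv1/weights')  # a plain numpy array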

The function run_inference_on_image(image) (source: https://github.com/tensorflow/models/blob/master/tutorials/image/imagenet/classify_image.py) classifies an image, but the linear algebra operations used to classify it do not appear to be viewable. Is it possible to execute the model using the various matrix transformations that I assume are taking place 'under the hood' in run_inference_on_image?

blue-sky

1 Answer

If you look closely at run_inference_on_image and the whole classify_image.py script, you'll see that it doesn't define a model. It is just a runner script that loads a pre-trained model from disk (see create_graph) and executes it according to certain conventions (run_inference_on_image looks for the tensor named softmax:0).

The tutorial states the same:

classify_image.py downloads the trained model from tensorflow.org when the program is run for the first time.
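
To make that convention concrete, here is a minimal sketch of what create_graph and run_inference_on_image boil down to (the file and tensor names are taken from the script's defaults):

import tensorflow as tf

# Minimal sketch of create_graph: load the serialized GraphDef from disk
with tf.gfile.FastGFile('classify_image_graph_def.pb', 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name='')

# Minimal sketch of run_inference_on_image: feed raw JPEG bytes
# and evaluate the tensor named softmax:0
with tf.Session() as sess:
    image_data = tf.gfile.FastGFile('cropped_panda.jpg', 'rb').read()
    softmax = sess.graph.get_tensor_by_name('softmax:0')
    predictions = sess.run(softmax, {'DecodeJpeg/contents:0': image_data})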

So the exact answer to your question in fact depends on what model you actually decide to run (e.g., you can supply your own model). I'll focus on the default choice of this script, namely the Inception model (see the DATA_URL constant). By the way, there is a newer pre-trained Inception v3 model that you can use as well (GitHub issue).

Side note: the exact source code of this implementation is not published, but we can take a look at the latest implementation of the same network in tf slim. The naming within the graph is a bit different, but the model is practically the same.


The whole model in one picture looks something like this: essentially, it's a long sequence of inception modules, each consisting of convolutional layers with various filters. The v3 variant of the inception module is:

[figure: inception module v3]

Here each a x b box means a convolutional layer with filter size [a, b]. It looks intimidating, but if you follow the history of its development over the years, it starts to make sense.

The picture above translates into the following code (for n=7):

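  # Excerpt from the inception_v3 definition in tf slim: `net` is the
  # incoming feature map, `depth` scales channel counts, and `end_point`
  # names this block; all three come from the surrounding function.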
  with tf.variable_scope(end_point):
    with tf.variable_scope('Branch_0'):
      branch_0 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
    with tf.variable_scope('Branch_1'):
      branch_1 = slim.conv2d(net, depth(128), [1, 1], scope='Conv2d_0a_1x1')
      branch_1 = slim.conv2d(branch_1, depth(128), [1, 7],
                             scope='Conv2d_0b_1x7')
      branch_1 = slim.conv2d(branch_1, depth(192), [7, 1],
                             scope='Conv2d_0c_7x1')
    with tf.variable_scope('Branch_2'):
      branch_2 = slim.conv2d(net, depth(128), [1, 1], scope='Conv2d_0a_1x1')
      branch_2 = slim.conv2d(branch_2, depth(128), [7, 1],
                             scope='Conv2d_0b_7x1')
      branch_2 = slim.conv2d(branch_2, depth(128), [1, 7],
                             scope='Conv2d_0c_1x7')
      branch_2 = slim.conv2d(branch_2, depth(128), [7, 1],
                             scope='Conv2d_0d_7x1')
      branch_2 = slim.conv2d(branch_2, depth(192), [1, 7],
                             scope='Conv2d_0e_1x7')
    with tf.variable_scope('Branch_3'):
      branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
      branch_3 = slim.conv2d(branch_3, depth(192), [1, 1],
                             scope='Conv2d_0b_1x1')
    net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])

As for your suggestion of "linear algebra operations", note that a convolutional layer is different from a linear (fully connected) layer (see the CS231n tutorial for details), though there exist efficient GPU implementations that boil down to matrix multiplications.
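
To illustrate that reduction, here is a minimal numpy sketch (not TensorFlow's actual implementation) of the im2col trick that turns a stride-1 'valid' convolution into a single matrix multiplication:

import numpy as np

def conv2d_as_matmul(x, w):
    """x: [H, W] input, w: [kH, kW] filter; 'valid' padding, stride 1."""
    kH, kW = w.shape
    H, W = x.shape
    oH, oW = H - kH + 1, W - kW + 1
    # im2col: unroll every kH x kW patch of x into one row of a matrix
    cols = np.empty((oH * oW, kH * kW))
    for i in range(oH):
        for j in range(oW):
            cols[i * oW + j] = x[i:i + kH, j:j + kW].ravel()
    # the convolution is now a single matrix-vector product
    return (cols @ w.ravel()).reshape(oH, oW)

x = np.random.rand(5, 5)
w = np.random.rand(3, 3)
out = conv2d_as_matmul(x, w)  # shape (3, 3)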

As you can see, reproducing the same model from scratch using only low-level operations would require a lot of code (the full source code in tf slim is about 600 lines, and it actually consists of high-level abstractions). If you want to retrain it yourself from the pre-trained state, it would be simpler to import the already-built model like this:

import tensorflow as tf
from tensorflow.contrib.slim.python.slim.nets.inception_v3 import inception_v3

# Placeholder shapes: Inception v3 is typically run on 299x299 RGB inputs;
# the slim ImageNet checkpoint uses 1001 classes
batch_size, height, width, num_classes = 16, 299, 299, 1001
inputs = tf.random_uniform((batch_size, height, width, 3))
logits, end_points = inception_v3(inputs, num_classes)
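
From there, restoring the pre-trained weights is a standard tf.train.Saver call; a sketch, assuming the checkpoint has been downloaded locally (the path below is a placeholder):

saver = tf.train.Saver()
with tf.Session() as sess:
    # 'inception_v3.ckpt' is a placeholder for the downloaded checkpoint
    saver.restore(sess, 'inception_v3.ckpt')
    print(sess.run(logits).shape)  # (batch_size, num_classes)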
Maxim
  • "The exact source code of this implementation is not published" Are you referring to "Inception v3 model" ? is this not the src code for Inception ? https://github.com/tensorflow/models/tree/master/research/inception is it a high level abstraction ? – blue-sky Mar 02 '18 at 12:04
  • I inspected the tensor names from the proto-buf files in that tar-ball and they appear to be different from that definition. Tensorflow code is updated too frequently to hope that it matches the 2-year-old weights. As for high-level code: the whole TF Slim library is a wrapper over main TF that operates in terms of layers, not tensors. The actual ops are even deeper. – Maxim Mar 02 '18 at 12:15
  • thanks, so if I want to execute an image classification model on a device that does not support python (such as some VR headsets), my options are to write my own model that can then be executed on any device that supports linear algebra, or to use an already built model such as Inception and expose its classification function via a service? – blue-sky Mar 02 '18 at 13:48
  • Tensorflow can be executed on various platforms, because the underlying code is all C++ (example: https://stackoverflow.com/q/47178371/712995). On a mobile device, most models can be run with tensorflow lite (https://www.tensorflow.org/mobile/tflite/), Inception v3 in particular. There are bindings for Android. So I think you should be fine: convolution is a basic op, just like matmul, and must be supported by all devices. – Maxim Mar 02 '18 at 14:19