If you look closely at run_inference_on_image and the whole classify_image.py script, it doesn't define a model. It is just a runner script that loads a pre-trained model from disk (see create_graph) and executes it according to certain conventions (run_inference_on_image looks for the tensor named softmax:0).
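In other words, the convention boils down to roughly this (a simplified sketch of what create_graph and run_inference_on_image do; the file names are the ones the script downloads into its model directory, so adjust the paths for your setup):

import tensorflow as tf

# Load the frozen GraphDef that classify_image.py downloaded to disk.
with tf.gfile.FastGFile('classify_image_graph_def.pb', 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name='')

with tf.Session() as sess:
    # The convention: fetch the output tensor by its well-known name...
    softmax_tensor = sess.graph.get_tensor_by_name('softmax:0')
    # ...and feed raw JPEG bytes into the graph's decoding input.
    image_data = tf.gfile.FastGFile('cropped_panda.jpg', 'rb').read()
    predictions = sess.run(softmax_tensor,
                           {'DecodeJpeg/contents:0': image_data})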
The tutorial states the same: "classify_image.py downloads the trained model from tensorflow.org when the program is run for the first time."
So the exact answer to your question in fact depends on which model you actually decide to run (e.g., you can supply your own model). I'll focus on the default choice of this script, namely the Inception model (see the DATA_URL constant). By the way, there is also a newer pre-trained Inception v3 model that you can use (see the corresponding GitHub issue).
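For reference, at the time of writing that constant in classify_image.py points to the 2015 Inception archive:

DATA_URL = 'http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz'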
Side note: the exact source code of this implementation is not published, but we can take a look at the latest implementation of the same network in tf slim. The naming within the graph is a bit different, but the model is practically the same.
The whole model in one picture looks something like this: essentially, a long sequence of inception modules, each consisting of convolutional layers with various filter sizes. The v3 variant of the inception module is:

Here each a x b box means a convolutional layer with filter size [a, b]. It looks intimidating, but if you follow the history of its development over the years, it starts to make sense.
The picture above translates into the following code (for n=7):
with tf.variable_scope(end_point):
  with tf.variable_scope('Branch_0'):
    branch_0 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
  with tf.variable_scope('Branch_1'):
    branch_1 = slim.conv2d(net, depth(128), [1, 1], scope='Conv2d_0a_1x1')
    branch_1 = slim.conv2d(branch_1, depth(128), [1, 7],
                           scope='Conv2d_0b_1x7')
    branch_1 = slim.conv2d(branch_1, depth(192), [7, 1],
                           scope='Conv2d_0c_7x1')
  with tf.variable_scope('Branch_2'):
    branch_2 = slim.conv2d(net, depth(128), [1, 1], scope='Conv2d_0a_1x1')
    branch_2 = slim.conv2d(branch_2, depth(128), [7, 1],
                           scope='Conv2d_0b_7x1')
    branch_2 = slim.conv2d(branch_2, depth(128), [1, 7],
                           scope='Conv2d_0c_1x7')
    branch_2 = slim.conv2d(branch_2, depth(128), [7, 1],
                           scope='Conv2d_0d_7x1')
    branch_2 = slim.conv2d(branch_2, depth(192), [1, 7],
                           scope='Conv2d_0e_1x7')
  with tf.variable_scope('Branch_3'):
    branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
    branch_3 = slim.conv2d(branch_3, depth(192), [1, 1],
                           scope='Conv2d_0b_1x1')
  net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
As for your suggestion of "linear algebra operations": note that a convolutional layer is different from a linear (fully connected) layer (see the CS231n tutorial for details), though there exist efficient GPU implementations that boil down to matrix multiplications.
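To make that last point concrete, here is a toy illustration of the "im2col" trick (my own sketch, not TensorFlow code): unfold every receptive field of the input into a row of a matrix, and the convolution becomes a single matrix multiplication.

import numpy as np

def conv2d_as_matmul(x, w):
    # x: [H, W] single-channel image; w: [kh, kw] filter; stride 1, no padding.
    H, W = x.shape
    kh, kw = w.shape
    oh, ow = H - kh + 1, W - kw + 1
    # Each row of `patches` is one receptive field, flattened ("im2col").
    patches = np.stack([x[i:i + kh, j:j + kw].ravel()
                        for i in range(oh) for j in range(ow)])
    # The convolution is now a single matrix-vector product.
    return (patches @ w.ravel()).reshape(oh, ow)

x = np.arange(16, dtype=float).reshape(4, 4)
w = np.ones((3, 3))  # all-ones filter: each output is the sum over a 3x3 patch
assert np.allclose(conv2d_as_matmul(x, w),
                   [[x[i:i + 3, j:j + 3].sum() for j in range(2)]
                    for i in range(2)])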
As you can see, reproducing the same model from scratch using only low-level operations would require a lot of code (the full source in tf slim is about 600 lines, and it already builds on high-level abstractions). If you want to retrain it yourself starting from the pre-trained state, it is simpler to import the already-built model like this:
import tensorflow as tf
from tensorflow.contrib.slim.python.slim.nets.inception_v3 import inception_v3
...
# A dummy input of shape [batch_size, height, width, 3]; Inception v3
# expects 299x299 RGB images by default.
inputs = tf.random_uniform((batch_size, height, width, 3))
logits, end_points = inception_v3(inputs, num_classes)
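To continue from the published weights rather than from random initialization, a sketch along these lines should work (assuming you have downloaded the inception_v3.ckpt checkpoint from the slim model zoo and that num_classes matches it, 1001 for the ImageNet release):

from tensorflow.contrib.slim.python.slim.nets.inception_v3 import inception_v3_arg_scope
slim = tf.contrib.slim

# Build the model under its canonical arg scope so that batch norm and
# regularization settings match those the checkpoint was trained with.
with slim.arg_scope(inception_v3_arg_scope()):
    logits, end_points = inception_v3(inputs, num_classes=1001, is_training=True)

# Restore the pre-trained variables, then fine-tune as usual.
saver = tf.train.Saver(slim.get_model_variables('InceptionV3'))
with tf.Session() as sess:
    saver.restore(sess, 'inception_v3.ckpt')  # path/checkpoint name is an assumption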