I am testing the machine learning waters and used TensorFlow's Inception model, retraining the network to classify my own objects.

Initially, I ran my predictions on locally stored images and found that it took 2-5 seconds to unpersist the graph from a file and about as long again to run the actual prediction.

I then adapted my code to take the camera feed from OpenCV, but with the times noted above, video lag is inevitable.

A time hit was expected during the initial graph load, which is why initialSetup() is run beforehand, but 2-5 seconds is just absurd. For my current application, real-time classification, I feel this is not the best way of loading it. Is there another way of doing this? I know that for mobile versions TensorFlow recommends trimming down the graph. Would slimming it down be the way to go here? In case it matters, my graph is currently 87.4MB.

Along with this, is there a way of speeding up the prediction process?

```python
import os
import cv2
import timeit
import numpy as np
import tensorflow as tf

camera = cv2.VideoCapture(0)

# Load the label file, stripping trailing whitespace (including the newline)
label_lines = [line.rstrip() for line
               in tf.gfile.GFile('retrained_labels.txt')]

def grabVideoFeed():
    grabbed, frame = camera.read()
    return frame if grabbed else None

def initialSetup():
    os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
    start_time = timeit.default_timer()

    # This takes 2-5 seconds to run
    # Unpersists graph from file
    with tf.gfile.FastGFile('retrained_graph.pb', 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
        tf.import_graph_def(graph_def, name='')

    print('Took {} seconds to unpersist the graph'.format(timeit.default_timer() - start_time))

def classify(image_data):
    print('********* Session Start *********')

    with tf.Session() as sess:
        start_time = timeit.default_timer()

        # Look up the output (softmax) tensor by name
        softmax_tensor = sess.graph.get_tensor_by_name('final_result:0')

        print('Tensor {}'.format(softmax_tensor))

        print('Took {} seconds to look up the output tensor'.format(timeit.default_timer() - start_time))

        start_time = timeit.default_timer()

        # This takes 2-5 seconds as well
        # Feed image_data as input to the graph and get the predictions
        predictions = sess.run(softmax_tensor, {'Mul:0': image_data})

        print('Took {} seconds to perform prediction'.format(timeit.default_timer() - start_time))

        start_time = timeit.default_timer()

        # Sort to show labels of first prediction in order of confidence
        top_k = predictions[0].argsort()[::-1]

        print('Took {} seconds to sort the predictions'.format(timeit.default_timer() - start_time))

        for node_id in top_k:
            human_string = label_lines[node_id]
            score = predictions[0][node_id]
            print('%s (score = %.5f)' % (human_string, score))

        print('********* Session Ended *********')

initialSetup()

while True:
    frame = grabVideoFeed()

    if frame is None:
        raise RuntimeError('Issue grabbing the frame')

    frame = cv2.resize(frame, (299, 299), interpolation=cv2.INTER_CUBIC)

    # adhere to TF graph input structure
    numpy_frame = np.asarray(frame)
    numpy_frame = cv2.normalize(numpy_frame.astype('float'), None, -0.5, .5, cv2.NORM_MINMAX)
    numpy_final = np.expand_dims(numpy_frame, axis=0)

    classify(numpy_final)

    cv2.imshow('Main', frame)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

camera.release()
cv2.destroyAllWindows()
```

EDIT 1

After debugging my code, I realized that session creation is both a resource- and time-consuming operation.

In the prior code, a new session was created for every frame from the OpenCV feed, on top of running the prediction. Wrapping the OpenCV operations inside a single session provides a massive time improvement, but the initial run still carries a large overhead: the first prediction takes 2-3 seconds. Afterwards, each prediction takes around 0.5s, which still makes the camera feed laggy.

```python
import os
import cv2
import timeit
import numpy as np
import tensorflow as tf

camera = cv2.VideoCapture(0)

# Load the label file, stripping trailing whitespace (including the newline)
label_lines = [line.rstrip() for line
               in tf.gfile.GFile('retrained_labels.txt')]

def grabVideoFeed():
    grabbed, frame = camera.read()
    return frame if grabbed else None

def initialSetup():
    os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
    start_time = timeit.default_timer()

    # This takes 2-5 seconds to run
    # Unpersists graph from file
    with tf.gfile.FastGFile('retrained_graph.pb', 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
        tf.import_graph_def(graph_def, name='')

    print('Took {} seconds to unpersist the graph'.format(timeit.default_timer() - start_time))

initialSetup()

# A single session is now shared by every frame
with tf.Session() as sess:
    start_time = timeit.default_timer()

    # Look up the output (softmax) tensor by name
    softmax_tensor = sess.graph.get_tensor_by_name('final_result:0')

    print('Took {} seconds to look up the output tensor'.format(timeit.default_timer() - start_time))

    while True:
        frame = grabVideoFeed()

        if frame is None:
            raise RuntimeError('Issue grabbing the frame')

        frame = cv2.resize(frame, (299, 299), interpolation=cv2.INTER_CUBIC)

        cv2.imshow('Main', frame)

        # adhere to TF graph input structure
        numpy_frame = np.asarray(frame)
        numpy_frame = cv2.normalize(numpy_frame.astype('float'), None, -0.5, .5, cv2.NORM_MINMAX)
        numpy_final = np.expand_dims(numpy_frame, axis=0)

        start_time = timeit.default_timer()

        # This takes 2-3 seconds on the first run and around 0.5s afterwards
        predictions = sess.run(softmax_tensor, {'Mul:0': numpy_final})

        print('Took {} seconds to perform prediction'.format(timeit.default_timer() - start_time))

        start_time = timeit.default_timer()

        # Sort to show labels of first prediction in order of confidence
        top_k = predictions[0].argsort()[::-1]

        print('Took {} seconds to sort the predictions'.format(timeit.default_timer() - start_time))

        for node_id in top_k:
            human_string = label_lines[node_id]
            score = predictions[0][node_id]
            print('%s (score = %.5f)' % (human_string, score))

        print('********* Frame Done *********')

        if cv2.waitKey(1) & 0xFF == ord('q'):
            # The with-block closes the session on exit
            break

camera.release()
cv2.destroyAllWindows()
```

EDIT 2

After fiddling around, I stumbled upon graph quantization and graph transformation; these were the results.

Original Graph: 87.4MB

Quantized Graph: 87.5MB

Transformed Graph: 87.1MB

Eight Bit Calculation: 22MB, but I ran into an error when trying to use it.
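
These sizes are roughly what one would expect if the file is dominated by 32-bit float weights: a step that leaves the weights stored as floats barely changes the size, whereas storing each weight in 8 bits instead of 32 should shrink the file about fourfold. The quick check below uses only the figures reported here; that the weights dominate the file size is an assumption.

```python
# Sizes in MB as reported above.
original_mb = 87.4
eight_bit_mb = 22.0

# Assumption: nearly all of the file is float32 weights (4 bytes each),
# so 8-bit storage (1 byte each) should need about a quarter of the space.
expected_eight_bit_mb = original_mb * 1 / 4

print('expected 8-bit size: {:.2f} MB'.format(expected_eight_bit_mb))
print('observed reduction: {:.2f}x'.format(original_mb / eight_bit_mb))
```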

eshirima
  • See: https://medium.com/towards-data-science/building-a-real-time-object-recognition-app-with-tensorflow-and-opencv-b7a2b4ebdc32 – Ruut Jul 05 '17 at 20:40
  • @Ruut I read this post on June 23rd, the day after it came out, actually. He uses multi-threading to speed up the I/O operations. I've been meaning to try this out on my project, but the actual prediction still takes 0.4-0.8 seconds, and I think this is because of my huge model. I'm still looking for ways to make it smaller – eshirima Jul 05 '17 at 20:45
  • Hello, I'm trying to achieve almost the same thing as you wish to: Real-time Tensorflow Classification via OpenCV `Videocapture`. Predictions take a constant 0.4 seconds. I installed Tensorflow with GPU support. May I ask whether `(1)` you are on CPU/GPU build? `(2)` Do you happen to have a solution to achieve an even faster speed? Appreciate your advice. Thank you. – Keith OYS Jul 21 '17 at 01:47
  • Update: Using `quantize_graph.py` to further optimize my model allowed me to cut 0.1 seconds off the inference time. It's now at a constant 0.3 seconds. – Keith OYS Jul 21 '17 at 02:20
  • @KeithOYS (1) I wasn't using a GPU, because I was going to port my model over to the phone. (2) I still get the error referenced above when trying to run my quantized graph. I/O is currently being run on the main thread, so consider moving it over to a separate one. [Example](http://www.pyimagesearch.com/2015/12/21/increasing-webcam-fps-with-python-and-opencv/). This will help speed things up. – eshirima Jul 21 '17 at 12:34
  • @KeithOYS With regard to speeding up the actual model, consider retraining it for [mobilenet](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/docs_src/tutorials/image_retraining.md#other-model-architectures) instead – eshirima Jul 21 '17 at 12:36
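
The suggestion in the comments above, moving camera I/O off the main thread, could look roughly like the sketch below. It is an illustration, not the linked article's code: `ThreadedCamera` and `FakeCapture` are hypothetical names, and the only assumption about the capture object is that it has a `read()` method returning `(grabbed, frame)`, as `cv2.VideoCapture` does.

```python
import threading
import time


class ThreadedCamera(object):
    """Keeps reading frames in a background thread; read() returns the latest."""

    def __init__(self, capture):
        self.capture = capture
        self.lock = threading.Lock()
        self.frame = None
        self.running = False
        self.thread = None

    def start(self):
        self.running = True
        self.thread = threading.Thread(target=self._update)
        self.thread.daemon = True
        self.thread.start()
        return self

    def _update(self):
        # Runs in the background thread until stop() is called.
        while self.running:
            grabbed, frame = self.capture.read()
            if grabbed:
                with self.lock:
                    self.frame = frame

    def read(self):
        # Non-blocking: returns None until the first frame has arrived.
        with self.lock:
            return self.frame

    def stop(self):
        self.running = False
        if self.thread is not None:
            self.thread.join()


class FakeCapture(object):
    """Hypothetical stand-in for cv2.VideoCapture(0), for demonstration only."""

    def __init__(self):
        self.count = 0

    def read(self):
        time.sleep(0.001)  # simulate waiting for the next camera frame
        self.count += 1
        return True, self.count


cam = ThreadedCamera(FakeCapture()).start()
time.sleep(0.05)      # the main thread is free to run predictions meanwhile
latest = cam.read()   # most recent frame, without waiting on the camera
cam.stop()
print('latest frame number: {}'.format(latest))
```

With a real camera, `ThreadedCamera(cv2.VideoCapture(0)).start()` would take the place of `camera`, and `grabVideoFeed()` would simply return its `read()` result. This only hides the camera latency; the time spent in the prediction itself is unaffected.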

1 Answer

I recently added the option to train the smaller Mobilenet models using TensorFlow for Poets: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/docs_src/tutorials/image_retraining.md#other-model-architectures

This may help speed up your classification, at the cost of some accuracy.

Pete Warden
  • Thank u.. I'll give it a spin and let u know.. One more thing, I was messing around with the object detection api and managed to train it locally on my own dataset. Since you're a Google Engineer, I wanted your take on my [approach](https://stackoverflow.com/questions/44973184/train-tensorflow-object-detection-on-own-dataset) and what I could've done to improve it next time around. Thanks a lot for your help! – eshirima Jul 11 '17 at 02:04
  • did u make it realtime? Im facing the same problem even in desktop –  Jul 31 '17 at 09:00
  • @HaraHaraMahadevaki I haven't gotten around to this in a while. I'd recommend re-training it on mobile-net. – eshirima Jul 31 '17 at 15:55
  • can u please give me the link for doing so? –  Aug 01 '17 at 01:29
  • by the way, try using this approach. it gives considerable performance boost http://www.pyimagesearch.com/2015/12/21/increasing-webcam-fps-with-python-and-opencv/ –  Aug 01 '17 at 01:32
  • @HaraHaraMahadevaki I've already stumbled into that blog.. I used it for my object detection instead. [Here's](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/docs_src/tutorials/image_retraining.md#other-model-architectures) the link to retrain on mobilenet. Please [tag](https://meta.stackexchange.com/a/43020) me for me to see your response – eshirima Aug 01 '17 at 13:15