I am testing the machine learning waters and used the TF Inception model to retrain the network to classify my desired objects.
Initially, I ran my predictions on locally stored images and realized that it took anywhere between 2-5 seconds to unpersist the graph from file and around the same time to run the actual predictions.
Thereafter, I adapted my code to incorporate the camera feed from OpenCV, but with the above-noted times, video lag is inevitable.
A time hit was expected during the initial graph load, which is why initialSetup() is run beforehand, but 2-5 seconds is just absurd.
I feel like with my current application, real-time classification, this is not the best way of loading the graph. Is there another way of doing this? I know that for the mobile versions TF recommends trimming down the graph. Would slimming it down be the way to go here? In case it matters, my graph is currently 87.4MB.
Along with this, is there a way of speeding up the prediction process?
import os
import cv2
import timeit
import numpy as np
import tensorflow as tf

camera = cv2.VideoCapture(0)

# Loads label file, strips off carriage return
label_lines = [line.rstrip() for line
               in tf.gfile.GFile('retrained_labels.txt')]

def grabVideoFeed():
    grabbed, frame = camera.read()
    return frame if grabbed else None

def initialSetup():
    os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
    start_time = timeit.default_timer()

    # This takes 2-5 seconds to run
    # Unpersists graph from file
    with tf.gfile.FastGFile('retrained_graph.pb', 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
        tf.import_graph_def(graph_def, name='')

    print('Took {} seconds to unpersist the graph'.format(timeit.default_timer() - start_time))

def classify(image_data):
    print('********* Session Start *********')

    # A new session is created for every frame (see EDIT 1)
    with tf.Session() as sess:
        start_time = timeit.default_timer()

        # Feed the image_data as input to the graph and get first prediction
        softmax_tensor = sess.graph.get_tensor_by_name('final_result:0')
        print('Tensor', softmax_tensor)
        print('Took {} seconds to feed data to graph'.format(timeit.default_timer() - start_time))

        start_time = timeit.default_timer()

        # This takes 2-5 seconds as well
        predictions = sess.run(softmax_tensor, {'Mul:0': image_data})
        print('Took {} seconds to perform prediction'.format(timeit.default_timer() - start_time))

        start_time = timeit.default_timer()

        # Sort to show labels of first prediction in order of confidence
        top_k = predictions[0].argsort()[-len(predictions[0]):][::-1]
        print('Took {} seconds to sort the predictions'.format(timeit.default_timer() - start_time))

        for node_id in top_k:
            human_string = label_lines[node_id]
            score = predictions[0][node_id]
            print('%s (score = %.5f)' % (human_string, score))

        print('********* Session Ended *********')

initialSetup()

while True:
    frame = grabVideoFeed()

    if frame is None:
        raise SystemError('Issue grabbing the frame')

    # Inception v3 expects 299 x 299 inputs
    frame = cv2.resize(frame, (299, 299), interpolation=cv2.INTER_CUBIC)

    # Adhere to TF graph input structure: float values in [-0.5, 0.5]
    # with a leading batch dimension
    numpy_frame = np.asarray(frame)
    numpy_frame = cv2.normalize(numpy_frame.astype('float'), None, -0.5, .5, cv2.NORM_MINMAX)
    numpy_final = np.expand_dims(numpy_frame, axis=0)

    classify(numpy_final)

    cv2.imshow('Main', frame)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

camera.release()
cv2.destroyAllWindows()
EDIT 1
After debugging my code, I realized that session creation is both a resource- and time-consuming operation.
In the prior code, a new session was created for each OpenCV frame on top of running the predictions. Wrapping the OpenCV operations inside a single session provides a massive time improvement, but a large overhead remains on the initial run; the first prediction takes 2-3 seconds. After that, each prediction takes around 0.5 seconds, which still leaves the camera feed laggy.
import os
import cv2
import timeit
import numpy as np
import tensorflow as tf

camera = cv2.VideoCapture(0)

# Loads label file, strips off carriage return
label_lines = [line.rstrip() for line
               in tf.gfile.GFile('retrained_labels.txt')]

def grabVideoFeed():
    grabbed, frame = camera.read()
    return frame if grabbed else None

def initialSetup():
    os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
    start_time = timeit.default_timer()

    # This takes 2-5 seconds to run
    # Unpersists graph from file
    with tf.gfile.FastGFile('retrained_graph.pb', 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
        tf.import_graph_def(graph_def, name='')

    print('Took {} seconds to unpersist the graph'.format(timeit.default_timer() - start_time))

initialSetup()

# A single session is created once and reused for every frame
with tf.Session() as sess:
    start_time = timeit.default_timer()

    # Feed the image_data as input to the graph and get first prediction
    softmax_tensor = sess.graph.get_tensor_by_name('final_result:0')
    print('Took {} seconds to feed data to graph'.format(timeit.default_timer() - start_time))

    while True:
        frame = grabVideoFeed()

        if frame is None:
            raise SystemError('Issue grabbing the frame')

        frame = cv2.resize(frame, (299, 299), interpolation=cv2.INTER_CUBIC)
        cv2.imshow('Main', frame)

        # Adhere to TF graph input structure
        numpy_frame = np.asarray(frame)
        numpy_frame = cv2.normalize(numpy_frame.astype('float'), None, -0.5, .5, cv2.NORM_MINMAX)
        numpy_final = np.expand_dims(numpy_frame, axis=0)

        start_time = timeit.default_timer()

        # Takes 2-3 seconds on the first run, around 0.5 seconds afterwards
        predictions = sess.run(softmax_tensor, {'Mul:0': numpy_final})
        print('Took {} seconds to perform prediction'.format(timeit.default_timer() - start_time))

        start_time = timeit.default_timer()

        # Sort to show labels of first prediction in order of confidence
        top_k = predictions[0].argsort()[-len(predictions[0]):][::-1]
        print('Took {} seconds to sort the predictions'.format(timeit.default_timer() - start_time))

        for node_id in top_k:
            human_string = label_lines[node_id]
            score = predictions[0][node_id]
            print('%s (score = %.5f)' % (human_string, score))

        print('********* Session Ended *********')

        if cv2.waitKey(1) & 0xFF == ord('q'):
            sess.close()
            break

camera.release()
cv2.destroyAllWindows()
EDIT 2
After fiddling around, I stumbled onto graph quantization and graph transformation, and these were the results:
Original Graph: 87.4MB
Quantized Graph: 87.5MB
Transformed Graph: 87.1MB
Eight-Bit Calculation: 22MB, but I ran into this upon use.
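For reference, these graphs came out of TensorFlow's Graph Transform Tool. Below is a sketch of one such pass using the TF 1.x Python wrapper; the transform list shown is an assumption on my part (quantize_weights stores the weights as eight-bit values, which matches the roughly 4x size drop), and 'Mul' / 'final_result' match the tensors fed above:

import tensorflow as tf
from tensorflow.tools.graph_transforms import TransformGraph

# Load the frozen retrained graph
with tf.gfile.FastGFile('retrained_graph.pb', 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

# Strip training-only nodes, fold constants and batch norms, then store
# the weights as eight-bit values; quantize_weights shrinks the file
# while the actual computation stays in float
transforms = ['strip_unused_nodes',
              'fold_constants(ignore_errors=true)',
              'fold_batch_norms',
              'quantize_weights']
optimized_def = TransformGraph(graph_def, ['Mul'], ['final_result'], transforms)

with tf.gfile.FastGFile('optimized_graph.pb', 'wb') as f:
    f.write(optimized_def.SerializeToString())

Loading optimized_graph.pb in initialSetup() in place of retrained_graph.pb should cut the unpersist time roughly in proportion to the file size.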