
I'm working on a reinforcement learning model implemented with Keras and TensorFlow. I have to make frequent calls to model.predict() on single inputs.

While testing inference on a simple pretrained model, I noticed that Keras' model.predict is WAY slower than just using NumPy on the stored weights. Why is it so slow, and how can I speed it up? Using pure NumPy is not viable for complex models.

import timeit
import numpy as np
from tensorflow.python.keras.models import Sequential
from tensorflow.python.keras.layers import Dense

w = np.array([[-1., 1., 0., 0.], [0., 0., -1., 1.]]).T
b = np.array([ 15., -15., -21., 21.])

model = Sequential()
model.add(Dense(4, input_dim=2, activation='linear'))
model.layers[0].set_weights([w.T, b])
model.compile(loss='mse', optimizer='adam')

state = np.array([-23.5, 17.8])

def predict_very_slow():
    return model.predict(state[np.newaxis])[0]

def predict_slow():
    ws = model.layers[0].get_weights()
    return np.matmul(ws[0].T, state) + ws[1]

def predict_fast():
    return np.matmul(w, state) + b

print(
    timeit.timeit(predict_very_slow, number=10000),
    timeit.timeit(predict_slow, number=10000),
    timeit.timeit(predict_fast, number=10000)
)
# 5.168972805004538 1.6963867129435828 0.021918574168087623
# 5.461319456664639 1.5491559107269515 0.021502970783442876

4 Answers


A little late, but maybe useful for someone:

Replace `model.predict(X)` with `model.predict(X, batch_size=len(X))`

That should do it.
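
Applied to the toy model from the question, it is a one-line change; a minimal sketch (reusing `model` and `state` from the question, not part of the original answer) might look like:

import numpy as np

X = state[np.newaxis]                         # batch containing a single input, shape (1, 2)
# Pass the batch size explicitly, as the answer suggests
out = model.predict(X, batch_size=len(X))[0]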

  • This saves a tonne of time. Only catch is to make sure that `batch_size` is not very large or else tensorflow will throw an `OOM`. – Ruthvik Vaila Dec 14 '19 at 19:24
  • This worked for me when some of the other more complicated solutions like using a compiled/uncompiled model did not. – tim654321 Mar 26 '20 at 10:39

Are you running your Keras model (with TensorFlow backend) in a loop? If so, Keras has a memory leak issue identified here: LINK

In this case you have to import the following:

import keras.backend.tensorflow_backend
import tensorflow as tf

from keras.backend import clear_session

Finally, you have to put the following at the end of every loop iteration, after you're done with your computations:

clear_session()
if keras.backend.tensorflow_backend._SESSION:
    tf.reset_default_graph()
    keras.backend.tensorflow_backend._SESSION.close()
    keras.backend.tensorflow_backend._SESSION = None

This should help you free up memory at the end of every loop and, eventually, make the process faster. I hope this helps.
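
For context, a loop using this cleanup could look roughly like the sketch below (`build_model`, `run_episode`, and `num_episodes` are placeholder names, not part of the original answer):

import keras.backend.tensorflow_backend
import tensorflow as tf
from keras.backend import clear_session

for episode in range(num_episodes):
    model = build_model()        # hypothetical: a fresh model each iteration
    run_episode(model)           # hypothetical: training / predictions for this iteration

    # Cleanup at the end of the iteration, as described above
    clear_session()
    if keras.backend.tensorflow_backend._SESSION:
        tf.reset_default_graph()
        keras.backend.tensorflow_backend._SESSION.close()
        keras.backend.tensorflow_backend._SESSION = None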

  • As I understand it, the memory leak is only a problem if you create models in a loop. I only use a single model object in the test case. – alexander Feb 15 '18 at 00:46
  • @schoeberl Understood. There might be several ways to make a single model faster, and that would depend on how you're setting up the model. A quick search led me to this, which I thought might be helpful: [LINK](https://stackoverflow.com/questions/42184863/how-do-you-make-tensorflow-keras-fast-with-a-tfrecord-dataset) – troymyname00 Feb 16 '18 at 00:58

The memory leak issue still seems to persist in Keras. The following lines of code mentioned in that issue did the trick for me:

from keras import backend as K  # assumed import: K.clear_session() below implies the Keras backend
import gc

model = ....  # build or load your model here
# ... use the model, then clean up:
del model
K.clear_session()
gc.collect()

If you prefer to stay with the network instead of NumPy calculations, you could try OpenVINO. OpenVINO is optimized for Intel hardware, but it should work with any CPU. It optimizes your model by converting it to the Intermediate Representation (IR), performing graph pruning, and fusing some operations into others while preserving accuracy. It then uses vectorization at runtime. The performance gain should be especially visible with larger networks.

It's rather straightforward to convert the Keras model to OpenVINO. The full tutorial on how to do it can be found here. Some snippets below.

Install OpenVINO

The easiest way to do it is with pip. Alternatively, you can use this tool to find the best way in your case.

pip install openvino-dev[tensorflow2]

Save your model as SavedModel

OpenVINO is not able to convert an HDF5 model, so you have to save it as a SavedModel first.

import tensorflow as tf
from custom_layer import CustomLayer  # only needed if your model uses custom layers

# Load the trained Keras (HDF5) model and re-export it in the SavedModel format
model = tf.keras.models.load_model('model.h5', custom_objects={'CustomLayer': CustomLayer})
tf.saved_model.save(model, 'model')

Use Model Optimizer to convert SavedModel model

The Model Optimizer is a command-line tool that comes with the OpenVINO Development Package. It converts the TensorFlow model to IR, the default format for OpenVINO. You can also try FP16 precision, which should give you better performance without a significant accuracy drop (just change the data_type flag). Run in the command line:

mo --saved_model_dir "model" --input_shape "[1, 3, 224, 224]" --data_type FP32 --output_dir "model_ir"
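
For example, the FP16 variant of the same command (same paths as above) only changes the data_type flag:

mo --saved_model_dir "model" --input_shape "[1, 3, 224, 224]" --data_type FP16 --output_dir "model_ir"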

Run the inference

The converted model can be loaded by the runtime and compiled for a specific device, e.g. CPU or GPU (integrated into your CPU, like Intel HD Graphics). If you don't know what the best choice for you is, just use AUTO.

from openvino.runtime import Core  # import missing in the original snippet

# Load the network
ie = Core()
model_ir = ie.read_model(model="model_ir/model.xml")
compiled_model_ir = ie.compile_model(model=model_ir, device_name="CPU")

# Get output layer
output_layer_ir = compiled_model_ir.output(0)

# Run inference on the input image
result = compiled_model_ir([input_image])[output_layer_ir]
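
For the toy model from the question, `input_image` would simply be the state reshaped to a batch, e.g. something like `state[np.newaxis].astype(np.float32)`, assuming the model was converted with `--input_shape "[1, 2]"` rather than the image shape shown above.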

Disclaimer: I work on OpenVINO.
