
Deployed TensorFlow Serving and ran the test for Inception-V3. Works fine.

Now I would like to do batching when serving Inception-V3, e.g. send 10 images for prediction instead of one.

How do I do that? Which files need updating (inception_saved_model.py or inception_client.py)? What do those updates look like? And how are the images passed to the server: as a folder containing images, or how?

I'd appreciate some insight into this issue. Any related code snippet would be extremely helpful.

=================================

Updated inception_client.py

# Copyright 2016 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

#!/usr/bin/env python2.7

"""Send JPEG image to tensorflow_model_server loaded with inception model.
"""

from __future__ import print_function

"""Send JPEG image to tensorflow_model_server loaded with inception model.
"""

from __future__ import print_function

# This is a placeholder for a Google-internal import.

from grpc.beta import implementations
import tensorflow as tf
from tensorflow.python.platform import flags
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2


tf.app.flags.DEFINE_string('server', 'localhost:9000',
                            'PredictionService host:port')
tf.app.flags.DEFINE_string('image', '',
                           'comma-separated paths to images in JPEG format')
FLAGS = tf.app.flags.FLAGS


def main(_):
  host, port = FLAGS.server.split(':')
  channel = implementations.insecure_channel(host, int(port))
  stub = prediction_service_pb2.beta_create_PredictionService_stub(channel)

  # Original single-image request, commented out:
  #
  # with open(FLAGS.image, 'rb') as f:
  #   # See prediction_service.proto for gRPC request/response details.
  #   data = f.read()
  #   request = predict_pb2.PredictRequest()
  #   request.model_spec.name = 'inception'
  #   request.model_spec.signature_name = 'predict_images'
  #   request.inputs['images'].CopyFrom(
  #       tf.contrib.util.make_tensor_proto(data, shape=[1]))
  #   result = stub.Predict(request, 10.0)  # 10 secs timeout
  #   print(result)

  # Build a batch of images from a comma-separated list of file paths.
  request = predict_pb2.PredictRequest()
  request.model_spec.name = 'inception'
  request.model_spec.signature_name = 'predict_images'

  image_data = []
  for image in FLAGS.image.split(','):
    with open(image, 'rb') as f:
      image_data.append(f.read())

  request.inputs['images'].CopyFrom(
      tf.contrib.util.make_tensor_proto(image_data, shape=[len(image_data)]))

  result = stub.Predict(request, 10.0)  # 10 secs timeout
  print(result)


if __name__ == '__main__':
  tf.app.run()
  • Can you check the indentation on the code you pasted? (It's probably an issue with the Stack Overflow formatting, but it could be hiding a bug.) And what is the current error you're getting? – mrry Mar 01 '17 at 22:37
  • Looks like a Stack Overflow formatting issue; will try to fix that. Here is the error: bazel-bin/tensorflow_serving/example/inception_batch_client --server=localhost:9000 --image=/home/gpuadmin/serving/images/boat.jpg,/home/gpuadmin/serving/images/boat.jpg Traceback (most recent call last): File "/home/gpuadmin/serving/bazel-bin/tensorflow_serving/example/inception_batch_client.runfiles/tf_serving/tensorflow_serving/example/inception_batch_client.py", line 63, in with open(image, 'rb') as f: IOError: [Errno 2] No such file or directory: '' – tech_a_break Mar 01 '17 at 22:42
  • The error is being raised because you're trying to read a file that's not found. It seems to be trying to open `''` (the empty string), so maybe `FLAGS.image` doesn't have the right format? Perhaps try printing `FLAGS.image.split(',')` to find out what's going wrong? – mrry Mar 01 '17 at 22:43
  • Thanks @mrry for your help and pointers. Got it working.. – tech_a_break Mar 03 '17 at 21:13
  • @mrry Have a follow-up question: how do I do performance tuning of batching using max_batch_size, batch_timeout_micros, num_batch_threads and other parameters? Tried using these parameters with the query client, but it doesn't work. – tech_a_break Mar 06 '17 at 21:12
  • I don't know. Might be worth asking another question on the [tensorflow-serving] tag; one of the TF Serving team probably knows a good answer to this. – mrry Mar 06 '17 at 21:13
  • Alright. Thanks. – tech_a_break Mar 06 '17 at 21:23

1 Answer


You should be able to compute predictions for a batch of images with a small change to the request construction code in inception_client.py. The following lines in that file create a request with a "batch" containing a single image (note shape=[1], which means "a vector of length 1"):

with open(FLAGS.image, 'rb') as f:
  # See prediction_service.proto for gRPC request/response details.
  data = f.read()
  request = predict_pb2.PredictRequest()
  request.model_spec.name = 'inception'
  request.model_spec.signature_name = 'predict_images'
  request.inputs['images'].CopyFrom(
      tf.contrib.util.make_tensor_proto(data, shape=[1]))
  result = stub.Predict(request, 10.0)  # 10 secs timeout
  print(result)

You can pass more images in the same vector to run predictions on a batch of data. For example, if FLAGS.image were a comma-separated list of filenames:

request = predict_pb2.PredictRequest()
request.model_spec.name = 'inception'
request.model_spec.signature_name = 'predict_images'

# Build a batch of images.
image_data = []
for image in FLAGS.image.split(','):
  with open(image, 'rb') as f:
    image_data.append(f.read())

request.inputs['images'].CopyFrom(
    tf.contrib.util.make_tensor_proto(image_data, shape=[len(image_data)]))

result = stub.Predict(request, 10.0)  # 10 secs timeout
print(result)
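
For example, assuming the modified client is built as inception_batch_client (the target name that appears in the comments; the stock example builds inception_client), it could be invoked with a comma-separated list of image paths:

bazel-bin/tensorflow_serving/example/inception_batch_client \
    --server=localhost:9000 \
    --image=/home/gpuadmin/serving/images/boat.jpg,/home/gpuadmin/serving/images/table.jpg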

  • Thanks @mrry. After making the changes, built the client again. While querying for inference, received this error: request.inputs['images'].CopyFrom( NameError: name 'request' is not defined. – tech_a_break Mar 01 '17 at 18:13
  • Passing 2 images for inference using bazel-bin/tensorflow_serving/example/inception_batch_client --server=localhost:9000 --image=/home/gpuadmin/serving/images/boat.jpg,/home/gpuadmin/serving/images/table.jpg Traceback (most recent call last): File "/home/useradmin/serving/bazel-bin/tensorflow_serving/example/inception_batch_client.runfiles/tf_serving/tensorflow_serving/example/inception_batch_client.py", line 55, in request.inputs['images'].CopyFrom( NameError: name 'request' is not defined – tech_a_break Mar 01 '17 at 18:17
  • Perhaps a typo in the code? The name `request` should be defined by the line `request = predict_pb2.PredictRequest()`. – mrry Mar 01 '17 at 18:18
  • Thanks @mrry. Included request = ... in the with clause. Faced similar "not defined" issues with stub, channel and host, although they are in def main. Is this a scoping issue? Tried to place them under image_data = [] and received the below error – tech_a_break Mar 01 '17 at 18:52
  • I'm having a hard time imagining what your code looks like from the description. Can you post your current code as an edit to the question, along with a full stack trace of the error? – mrry Mar 01 '17 at 18:53
  • Traceback (most recent call last): File "/home/gpuadmin/serving/bazel-bin/tensorflow_serving/example/inception_batch_client.runfiles/tf_serving/tensorflow_serving/example/inception_batch_client.py", line 62, in result = stub.Predict(request, 10.0) # 10 secs timeout File "/usr/local/lib/python2.7/dist-packages/grpc/beta/_client_adaptations.py", line 323, in __call__ self._request_serializer, self._response_deserializer) – tech_a_break Mar 01 '17 at 18:54
  • File "/usr/local/lib/python2.7/dist-packages/grpc/beta/_client_adaptations.py", line 209, in _blocking_unary_unary raise _abortion_error(rpc_error_call) grpc.framework.interfaces.face.face.AbortionError: AbortionError(code=StatusCode.INVALID_ARGUMENT, details="Missing ModelSpec”) – tech_a_break Mar 01 '17 at 18:54
  • Are you setting `request.model_spec` like I did in the answer? Again it's impossible for me to suggest a fix without seeing your code. – mrry Mar 01 '17 at 19:12
  • Code attached in the question. – tech_a_break Mar 01 '17 at 22:36
  • Hey, it's not working for me... I've got a question: do I need to change the way I run tensorflow_model_server? I noticed an --enable_batching parameter but I'm not sure :/ – Rodrigo Laguna Sep 19 '18 at 20:03
  • `--enable_batching` is for server-side batching – JiaHao Xu Oct 29 '21 at 08:15
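
On the server-side batching tuning question raised in the comments: a minimal sketch, assuming tensorflow_model_server is started with --enable_batching plus a --batching_parameters_file pointing at a text-format protobuf such as the one below. The values and file paths are illustrative placeholders to tune for your setup, not recommendations:

# batching_parameters.txt
max_batch_size { value: 32 }
batch_timeout_micros { value: 1000 }
num_batch_threads { value: 4 }
max_enqueued_batches { value: 100 }

tensorflow_model_server --port=9000 --model_name=inception \
    --model_base_path=/path/to/inception-export \
    --enable_batching --batching_parameters_file=/path/to/batching_parameters.txt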