
I have trained a UNet for semantic segmentation based on this library.

I am currently trying to run the network in real time in a ROS node that takes the raw image from a topic, produces the segmentation mask, and publishes it on another topic.

The code I have written so far works in principle, but it is very slow, around 0.5 FPS. I am aiming for at least 15 FPS.

I am not sure whether the callback() function takes so long because the inference itself is simply too slow on my machine (an i5-6500 with tensorflow-cpu, since I don't have an NVIDIA GPU), or, what I think is more likely, because the set_session(sess) call inside the callback slows it down.

How can I set this session outside the callback function so that it doesn't slow my code down? Note that the code below the model inference is only further image processing; the function is still slow without it.

TL;DR: How do I call set_session() outside the callback() function so that it does not slow my code down?
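For reference, here is a minimal standalone timing sketch (no ROS) that I think should tell the two costs apart; the checkpoint name and the 608x416 input size are taken from my node below, and the 20-iteration count is arbitrary:

import time
import numpy as np
import tensorflow as tf
from tensorflow.python.keras.backend import set_session
from keras_segmentation.predict import model_from_checkpoint_path

sess = tf.Session()
graph = tf.get_default_graph()
set_session(sess)

seg = model_from_checkpoint_path("vgg_unet_1")
seg._make_predict_function()

# dummy frame with the same shape the callback feeds to predict()
dummy = np.zeros((1, 416, 608, 3), dtype=np.uint8)

# Variant A: call set_session() before every prediction (what my callback does now)
t0 = time.time()
for _ in range(20):
    with graph.as_default():
        set_session(sess)
        seg.predict(dummy)
print("set_session per call: %.3f s/frame" % ((time.time() - t0) / 20))

# Variant B: session set once above, predict only
t0 = time.time()
for _ in range(20):
    with graph.as_default():
        seg.predict(dummy)
print("predict only:         %.3f s/frame" % ((time.time() - t0) / 20))

And here is the node itself: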

#!/usr/bin/env python

#This code is partly based on an example found at:
#https://github.com/isarlab-department-engineering/ros_dt_lane_follower/blob/master/src/lane_detection.py

import rospy
import numpy as np
import cv2
import math
import os
import tensorflow as tf
from cv_bridge import CvBridge
from sensor_msgs.msg import Image
from keras_segmentation.predict import predict
from keras_segmentation.models.unet import vgg_unet
from keras_segmentation.predict import model_from_checkpoint_path

from tensorflow.python.keras.backend import set_session
from tensorflow.python.keras.models import load_model


sess = tf.Session()
graph = tf.get_default_graph()

# IMPORTANT: models have to be loaded AFTER SETTING THE SESSION for keras! 
# Otherwise, their weights will be unavailable in the threads after the session has been set
set_session(sess)

seg = model_from_checkpoint_path("vgg_unet_1")
seg._make_predict_function()

#definitions and declarations

bridge = CvBridge()

pub_image = rospy.Publisher('/Segmentation_image',Image,queue_size=1)


with graph.as_default():
    set_session(sess)

#callback is executed once for each frame

def callback(data):

    # make OpenCV able to process the image
    image = bridge.imgmsg_to_cv2(data)

    image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)

    image = cv2.resize(image, (608, 416))
    global sess
    global graph
    with graph.as_default():
        set_session(sess)

        # Originally tried the library's predict() helper instead:
        # erg = predict(model=seg, inp=image, out_fname=None)
        erg = seg.predict(np.array([image]))[0]

    print(erg.shape)

    erg = erg.astype(np.uint8)


def lane_detect():
    rospy.init_node('Segmentation',anonymous=True)
    #rospy.Subscriber("/cv_camera/image_raw",Image,callback,queue_size=1,buff_size=2**24)

    rospy.Subscriber("/movie_raw",Image,callback,queue_size=1,buff_size=2**24)
    try:
        rospy.loginfo("Entering ROS Spin")
        rospy.spin()
    except KeyboardInterrupt:
        print("Shutting down")



if __name__ == '__main__':
    try:

        lane_detect()
    except rospy.ROSInterruptException:
        pass

Comments:
    I would strongly recommend trying to separate out better your ros code from your keras/tf code. Then I would try timing your keras/tf code outside of ros if possible. I'd also try timing the ros code (with your img conversions) without calling the keras/tf to time that. If you're not able to force-feed (even the same image) to the keras/tf code at-or-above an acceptable rate, then you're at least bottlenecked by that. Also, you could have a [ros timer](https://wiki.ros.org/rospy/Overview/Time#Timer) that runs the keras/tf code on global img at your desired rate, & report the actual frequency. – JWCS Jun 01 '20 at 17:02
  • Thank you for your feedback, I have tried a few things and now I think the inference by the CPU is indeed slowing things down. I will try the whole code on a machine with a GPU and see if it makes a difference. What do you exactly mean by separating the Image conversions from keras/tf? Make a separate Node for segmentation and then do the rest of the processing in another node? – Canorvantis Jun 04 '20 at 10:18
  • No, I'm talking about the physical code files/modules. Pull out of this file everything that doesn't need ros, and make the interface to that simple, like an initialization step/constructor and a process/transform/compute function. That will allow you to time just the keras/tf. Then you can call that code, from the ros node. The image conversion, aka from sensor_msgs/Image to the correct opencv image/size, is done in the ros node; you should time how fast this can just receive & cvtColor/resize the images, perhaps by publishing an empty msg each time. – JWCS Jun 04 '20 at 13:55
  • The main thing may not be CPU vs GPU; if you've written your code well, or badly, you should be able to tell by just timing it. If there's only one part that's slow, you only need to address that part. _If_ that slow part _could_ be solved by GPU acceleration, you might have to explicitly do stuff for that to happen - which requires you to know which part is the part that is slow. For example, if you throw images at your node at a freq of 100Hz, and it cvtColor/resize, and publish std_msgs/Empty at 30 Hz, (`rostopic hz`) then your ros node is not the problem; it's fast enough. – JWCS Jun 04 '20 at 13:59
  • On the contrary, if you take an image, and in a testing file import your keras/tf class/module, and just try to statistically figure out how long it takes to process a single image of (MxN) size (ex test std. dev. and mean for 10000 runs), then you might find it's the bottle neck, at 5-10Hz. But, knowing that number, you can try different sizes of the image, (M/2 x N/2, M*.75 x N*.75, etc) to see what size meets your processing speed requirement. If it ends up being too small to be useful, then you can either change the hardware, or dig into the keras/tf code to see if you can improve it. – JWCS Jun 04 '20 at 14:03
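
A rough, untested sketch of the rospy.Timer decoupling suggested in the comments above, where the subscriber only stores the latest frame and a timer runs inference at a fixed target rate while logging how long each prediction takes; the node name, the 15 Hz target, and the reuse of /movie_raw and the vgg_unet_1 checkpoint are my assumptions:

import rospy
import numpy as np
import cv2
import tensorflow as tf
from cv_bridge import CvBridge
from sensor_msgs.msg import Image
from tensorflow.python.keras.backend import set_session
from keras_segmentation.predict import model_from_checkpoint_path

sess = tf.Session()
graph = tf.get_default_graph()
set_session(sess)
seg = model_from_checkpoint_path("vgg_unet_1")
seg._make_predict_function()

bridge = CvBridge()
latest_image = None

def image_cb(msg):
    # only keep the most recent frame; conversion and resize are cheap compared to inference
    global latest_image
    img = bridge.imgmsg_to_cv2(msg)
    img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
    latest_image = cv2.resize(img, (608, 416))

def timer_cb(event):
    # rospy.Timer callbacks run in their own thread, so the graph context is still needed here
    if latest_image is None:
        return
    start = rospy.get_time()
    with graph.as_default():
        erg = seg.predict(np.array([latest_image]))[0]
    rospy.loginfo("inference took %.3f s", rospy.get_time() - start)
    # erg could be converted and published here as in the node above

if __name__ == '__main__':
    rospy.init_node('segmentation_timer_test', anonymous=True)
    rospy.Subscriber("/movie_raw", Image, image_cb, queue_size=1, buff_size=2**24)
    rospy.Timer(rospy.Duration(1.0 / 15.0), timer_cb)  # 15 Hz target rate
    rospy.spin()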

0 Answers