
My goal is to run a learned TensorFlow model in real time to control a vehicle. Our vehicle system uses ROS (Robot Operating System), which is closely tied to OpenCV, so I receive an OpenCV Mat containing the image of interest from ROS.

    cv::Mat cameraImg;

I would like to create a TensorFlow Tensor directly from the data in this OpenCV matrix to avoid the expense of copying the matrix line by line. Using the answer to this question I have managed to get the forward pass of the network working with the following code:

    cameraImg.convertTo(cameraImg, CV_32FC3);

    Tensor inputImg(DT_FLOAT, TensorShape({1, inputheight, inputwidth, 3}));
    auto inputImageMapped = inputImg.tensor<float, 4>();
    auto start = std::chrono::system_clock::now();
    // Copy all the data over, swapping OpenCV's BGR order to RGB
    for (int y = 0; y < inputheight; ++y) {
        const float* source_row = cameraImg.ptr<float>(y);
        for (int x = 0; x < inputwidth; ++x) {
            const float* source_pixel = source_row + (x * 3);
            inputImageMapped(0, y, x, 0) = source_pixel[2];  // R
            inputImageMapped(0, y, x, 1) = source_pixel[1];  // G
            inputImageMapped(0, y, x, 2) = source_pixel[0];  // B
        }
    }
    auto end = std::chrono::system_clock::now();

However, using this method the copy to the tensor takes between 80ms and 130ms, while the entire forward pass (for a 10-layer convolutional network) only takes 25ms.

Looking at the TensorFlow documentation, it appears there is a Tensor constructor that takes an allocator. However, I have not been able to find any TensorFlow or Eigen documentation relating to this functionality, or to the Eigen Map class as it relates to Tensors.

Does anyone have any insight into how this code can be sped up, ideally by re-using my OpenCV memory?

EDIT: I have successfully implemented what @mrry suggested, and can re-use the memory allocated by OpenCV. I have opened GitHub issue 8033 requesting that this be added to the TensorFlow source tree. My method isn't that pretty, but it works.

It is still very difficult to compile an external library and link it against libtensorflow.so. Potentially the TensorFlow CMake build will help with this; I have not yet tried it.

Paul

2 Answers


I know this is an old thread, but there is a zero-copy solution to your problem using the existing C++ API. I updated your GitHub issue with my solution: tensorflow/issues/8033

For the record, I copy my solution here:

// allocate a Tensor
Tensor inputImg(DT_FLOAT, TensorShape({1,inputHeight,inputWidth,3}));

// get pointer to memory for that Tensor
float *p = inputImg.flat<float>().data();
// create a "fake" cv::Mat from it 
cv::Mat cameraImg(inputHeight, inputWidth, CV_32FC3, p);

// use it here as a destination
cv::Mat imagePixels = ...; // get data from your video pipeline
imagePixels.convertTo(cameraImg, CV_32FC3);

Hope this helps

  • 2
    Any way to do this for a tensor which has a batch of images? E.g. `Tensor inputImg(DT_FLOAT, TensorShape({4,inputHeight,inputWidth,3}));` Since tensors aren't subscriptable, I'd imagine you would create tensors for each image, load the `cv::Mat` into it, and combine them somehow to form `inputImg`. If so, is there a way to combine tensors? – Jim Nov 30 '17 at 13:56

The TensorFlow C API (as opposed to the C++ API) exports the TF_NewTensor() function, which allows you to create a tensor from a pointer and a length, and you can pass the resulting object to the TF_Run() function.

Currently this is the only public API for creating a TensorFlow tensor from a pre-allocated buffer. There is no supported way to cast a `TF_Tensor*` to a `tensorflow::Tensor`, but if you look at the implementation there is a private API with friend access that can do this. If you experiment with this and can show an appreciable speedup, we'd consider a feature request for adding it to the public API.
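A sketch of the C-API route, under my own assumptions (the helper name is hypothetical; the no-op deallocator is used because the `cv::Mat` owns the buffer, which therefore must outlive the tensor, and the Mat must be continuous `CV_32FC3`):

```cpp
#include <tensorflow/c/c_api.h>
#include <opencv2/core.hpp>

// No-op deallocator: OpenCV owns the memory, so TensorFlow must not free it.
static void NoOpDeallocator(void* /*data*/, size_t /*len*/, void* /*arg*/) {}

// Wrap an existing continuous CV_32FC3 Mat in a TF_Tensor without copying.
TF_Tensor* TensorFromMat(cv::Mat& img) {
    const int64_t dims[4] = {1, img.rows, img.cols, 3};
    const size_t len = img.total() * img.elemSize();  // bytes in the Mat
    return TF_NewTensor(TF_FLOAT, dims, 4, img.data, len,
                        &NoOpDeallocator, nullptr);
}
```

The resulting `TF_Tensor*` can then be passed to `TF_Run()` as an input.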

mrry
  • Awesome, thanks for the quick response! I will try to use the C API to get a tensor into my code. I currently use the C++ API to load my graph from disk, so I guess I will have to try to use the private friend API. I will let you know as soon as I can get results. – Paul Sep 08 '16 at 15:39
  • I am trying to get this to work with my external library but having no luck compiling it. I can get the compiler happy by declaring the TensorCApi class and the TF_Tensor class in my source. However, when I run it, I get an error: `/opt/ros/indigo/lib/nodelet/nodelet: symbol lookup error: /home/paul/catkin_ws/devel/lib//libDeepModel.so: undefined symbol: TF_NewTensor`. When I tried to define the class in the tensorflow source (in `c_api.h`), it seems it is not being compiled. I am using the `//tensorflow:libtensorflow.so` build target; does this build target not include the C API? – Paul Oct 04 '16 at 23:29
  • 1
    Did you ever get this working? If so, do you have any code snippets you can share? I am also looking to interface opencv from ROS and need something like this. I was amazed when I started looking into this that the only way to get an opencv Mat into a Tensor was a horribly slow copy loop. – dcofer Jan 08 '17 at 11:43
  • @Paul, I know this is quite some time later, but aren't you supposed to compile //tensorflow:libtensorflow_c.so instead? Also, did you succeed in doing the above? I basically need to do the same, and was very disappointed that the C++ API cannot do this – Allan Nørgaard Jan 17 '17 at 14:49
  • @allan, I did get this working. I edited my initial question to point to an explanation. I also opened a github issue to hopefully get this included in the tensorflow source. – Paul Mar 02 '17 at 22:50
  • @Paul, Could you also get it working in C++? Can you please share your code? – mohaghighat Mar 03 '17 at 22:34
  • @Paul We ended up opening a github issue too, and were told that it is currently possible in the C API only. But using pure memcpy turned out to be fast enough for our C++ implementation (~60fps HD images) – Allan Nørgaard Mar 06 '17 at 12:08
  • @MBA see above ;) The memcpy can eat raw amounts of data in no time; I bet it might be enough for your problem as well, although it is not the preferred method – Allan Nørgaard Mar 06 '17 at 12:09
  • @MBA I had to declare the `TF_Tensor` type in my code: `struct TF_Tensor { TF_DataType dtype; TensorShape shape; TensorBuffer* buffer; };` and add several additional tensorflow directories in my cmake file to get it to compile. I don't have a public version of the code yet, and I don't think there is any officially supported way of doing this out of source build with the C++ interface. – Paul Mar 06 '17 at 18:50
  • @AllanNørgaard Thanks for the tip with memcopy. Sounds like a viable solution as well. – Paul Mar 06 '17 at 18:54
  • @AllanNørgaard Could you link the GitHub issue you created? I'd love to look at the memcpy solution. – Jim Nov 30 '17 at 13:46