I acquired an Nvidia Jetson TK1 a few weeks ago and I'm trying to use the CPU and GPU at the same time, hence the use of the Stream class. A simple test shows that it does not do what I expected, so I'm probably using it wrong, or perhaps missing a compiler option.
I checked this link for answers before posting this question: how to use gpu::Stream in OpenCV?
Here is my code:
#include <stdio.h>
#include <iostream>
#include "opencv2/core/core.hpp"
#include "opencv2/features2d/features2d.hpp"
#include "opencv2/highgui/highgui.hpp"
#include "opencv2/imgproc/imgproc.hpp"
#include "opencv2/objdetect/objdetect.hpp"
#include "opencv2/gpu/gpu.hpp"
#include <time.h>
using namespace cv;
using namespace std;
using namespace gpu;
int main(int argc, char** argv)
{
    unsigned long AAtime = 0, BBtime = 0;
    gpu::setDevice(0);
    gpu::FeatureSet(FEATURE_SET_COMPUTE_30);
    Mat host_src = imread(argv[1], 0);
    GpuMat gpu_src, gpu_dst;
    Stream stream;
    gpu_src.upload(host_src);
    AAtime = getTickCount();
    blur(gpu_src, gpu_dst, Size(5,5), Point(-1,-1), stream);
    // CPU function
    int k = 0;
    for (unsigned long long int j = 0; j < 10; j++)
        for (unsigned long long int i = 0; i < 10000000; i++)
            k += rand();
    stream.waitForCompletion();
    Mat host_dst;
    BBtime = getTickCount();
    cout << (BBtime - AAtime) / getTickFrequency() << endl;
    gpu_dst.download(host_dst);
    return 0;
}
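To make the expectation behind this test concrete: if the two workloads really overlap, the elapsed time should be close to the longer of the two rather than their sum. Here is a minimal sketch in plain C++ (no OpenCV involved), where `std::async` stands in for the GPU stream and the hypothetical helper `busy_for` stands in for both workloads:

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Hypothetical stand-in for a fixed amount of work (GPU filter or CPU loop).
void busy_for(std::chrono::milliseconds d) {
    assert(d.count() >= 0);
    std::this_thread::sleep_for(d);
}

// Run both tasks back to back: elapsed time is roughly a + b.
double run_sequential_ms(std::chrono::milliseconds a, std::chrono::milliseconds b) {
    auto t0 = std::chrono::steady_clock::now();
    busy_for(a);
    busy_for(b);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

// Start the first task asynchronously, run the second on the calling thread,
// then wait for the first: elapsed time is roughly max(a, b).
double run_overlapped_ms(std::chrono::milliseconds a, std::chrono::milliseconds b) {
    auto t0 = std::chrono::steady_clock::now();
    std::future<void> pending = std::async(std::launch::async, busy_for, a);
    busy_for(b);
    pending.get();  // plays the role of stream.waitForCompletion()
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

With two 100 ms tasks, `run_sequential_ms` should report about 200 ms and `run_overlapped_ms` about 100 ms. My OpenCV test behaves like the sequential version.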
With the timer function I saw that the overall time is CPU + GPU, not the longer of the two, so they do not run in parallel. I tried using CudaMem as jet47 showed, but when I display the result I only see stripes instead of my image:
CudaMem host_src_pl(Size(900, 1200), CV_8UC1, CudaMem::ALLOC_PAGE_LOCKED); // My image is 1200 by 900
CudaMem host_dst_pl;
Mat host_src= imread(argv[1],0);
host_src = host_src_pl;
//rest of the code
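One possible reading of the stripes, offered as a guess: `cv::Mat` assignment is shallow, so `host_src = host_src_pl;` would make `host_src` refer to the page-locked buffer, which was never filled, and drop the pixels that `imread` loaded. The sketch below models this in plain C++ with a hypothetical `Image` struct (a simplified stand-in, not OpenCV's real class) to contrast rebinding a header with copying pixels into an existing buffer:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical, simplified model of a reference-counted image header.
struct Image {
    std::shared_ptr<std::vector<unsigned char>> data;

    Image(std::size_t n, unsigned char fill)
        : data(std::make_shared<std::vector<unsigned char>>(n, fill)) {}

    // Deep copy: write this image's pixels into dst's existing buffer.
    void copyTo(Image& dst) const {
        assert(dst.data->size() == data->size());
        *dst.data = *data;
    }
};

// Mirrors `host_src = host_src_pl;`: the header is rebound, no pixels move.
unsigned char first_pixel_after_assign() {
    Image pinned(4, 0);    // stands in for the page-locked buffer (never filled)
    Image loaded(4, 200);  // stands in for the result of imread
    loaded = pinned;       // shallow: loaded now shares pinned's buffer
    return (*loaded.data)[0];
}

// The opposite direction: copy the loaded pixels into the existing buffer.
unsigned char first_pixel_after_copy() {
    Image pinned(4, 0);
    Image loaded(4, 200);
    loaded.copyTo(pinned);
    return (*pinned.data)[0];
}
```

In this model the assignment leaves the "image" showing the unfilled buffer (0), while the copy puts the loaded pixel value (200) into it. In OpenCV terms, the analogous change would presumably be to copy the loaded image into a `Mat` header of the page-locked buffer with `copyTo` instead of assigning in the other direction; this is untested on the TK1.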
To compile, I used this command:
g++ -Ofast -mfpu=neon -funsafe-math-optimizations -fabi-version=8 -Wabi -std=c++11 -march=armv7-a testStream.cpp -fopenmp -lopencv_core -lopencv_imgproc -lopencv_highgui -lopencv_calib3d -lopencv_contrib -lopencv_features2d -lopencv_flann -lopencv_gpu -lopencv_legacy -lopencv_ml -lopencv_objdetect -lopencv_photo -lopencv_stitching -lopencv_superres -lopencv_video -lopencv_videostab -o gpuStream
Some of these options might be redundant; I tried without them and the result is the same.
What am I missing? Thanks for your answers :)