
I have some data, an array of 400k rows, with each row having 200 elements.

I'm trying to use this data to do knn matching using FLANN. Using the pyflann library, which is just a set of bindings to the C++ code, I get outstanding performance. It takes very little time to build the index, and then once the index is built the speed and accuracy of the knn search is fantastic.

This is the code used to build and query the index in Python:


import time

import pyflann

flann = pyflann.FLANN()

# data is a NumPy array of shape (400000, 200), dtype=np.float32; rows are L2-normalised
flann.build_index(data, algorithm="kmeans", branching=32, iterations=7, checks=16)

ind = 100
k = 5
vector = data[ind, :].reshape(1, 200)
t0 = time.time()
D, I = flann.nn_index(vector, k, algorithm="kmeans", branching=32, iterations=7, checks=16)
t1 = time.time()
t = t1 - t0

Typically the index is built in around 20 seconds, and the results D, I are calculated in less than 0.001 seconds on my machine.

I'm trying to achieve the same performance using C++. In particular, I am using the version of FLANN that is bundled with OpenCV.

I have copied the default parameter values from pyflann into the index parameters and tried to set up kmeans, but I get terrible performance: I normally have to kill the indexing process after a few minutes.
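To put a number on the C++ side in the same way as the Python t1 - t0 measurement, a small timing helper along these lines could wrap the index build, for example on a few thousand rows first and then on larger subsets. This is only a sketch: seconds_taken is a hypothetical helper name, not part of FLANN or OpenCV, and it uses nothing beyond the C++11 standard library.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: runs a callable once and returns the elapsed
// wall-clock time in seconds, measured with a monotonic clock.
template <typename F>
double seconds_taken(F&& f) {
    const std::chrono::steady_clock::time_point t0 = std::chrono::steady_clock::now();
    std::forward<F>(f)();
    const std::chrono::steady_clock::time_point t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}
```

Something like seconds_taken([&]() { index->buildIndex(); }) would then give a figure that is directly comparable with the roughly 20 seconds seen from pyflann.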


cvflann::IndexParams getParams(){
    cvflann::IndexParams indexParams;
    indexParams["algorithm"] = cvflann::FLANN_INDEX_KMEANS;
    //branching=32, iterations=7, checks=16
    indexParams["branching"] = 32;
    indexParams["iterations"] = 7;
    indexParams["checks"] = 16;

    indexParams["eps"] = 0.0;
    indexParams["cb_index"] = 0.5;
    indexParams["trees"] = 1;
    indexParams["leaf_max_size"] = 4;
    indexParams["centers_init"] = cvflann::FLANN_CENTERS_RANDOM;
    indexParams["target_precision"] = 0.9;
    indexParams["build_weight"] = 0.01;
    indexParams["memory_weight"] = 0.0;
    indexParams["sample_fraction"] = 0.1;
    indexParams["log_level"] =  "warning";
    indexParams["random_seed"] = -1;
    return indexParams;
}

cvflann::Matrix<float> dataset(this->data[0], 400000, 200);
cvflann::IndexParams ip = getParams();
index = new cvflann::Index<cvflann::L2<float> >(dataset, ip);
index->buildIndex(); //hangs
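
One thing that may be worth ruling out is memory layout. cvflann::Matrix wraps a raw pointer without copying and treats it as a single contiguous row-major block of rows * cols elements. The question does not say how this->data is stored. If it happens to be a container of separately allocated rows (assumed here to be a std::vector<std::vector<float> >, which is purely a guess), then the pointer to the first row does not cover the other rows. A minimal sketch of packing such rows into one buffer, with flatten_rows as a hypothetical helper name:

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: copies equally sized rows into one contiguous
// row-major buffer, so element (r, c) ends up at flat[r * cols + c].
std::vector<float> flatten_rows(const std::vector<std::vector<float> >& rows) {
    if (rows.empty()) {
        return std::vector<float>();
    }
    const std::size_t cols = rows[0].size();
    std::vector<float> flat;
    flat.reserve(rows.size() * cols);
    for (std::size_t r = 0; r < rows.size(); ++r) {
        if (rows[r].size() != cols) {
            throw std::invalid_argument("all rows must have the same length");
        }
        // Append this row directly after the previous one.
        flat.insert(flat.end(), rows[r].begin(), rows[r].end());
    }
    return flat;
}
```

The buffer's data() pointer, together with the row and column counts, could then be passed to the matrix constructor. The buffer has to stay alive for as long as the index is used, because the matrix only stores the pointer.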


Is there something I'm missing? Why would there be such a difference in performance here? Any advice would be gratefully received.

Edit:

Compiler:


g++ --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.13.sdk/usr/include/c++/4.2.1
Apple LLVM version 9.1.0 (clang-902.0.39.2)
Target: x86_64-apple-darwin17.3.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

There are no flags used other than those provided by pkg-config --cflags opencv and pkg-config --libs opencv, along with -std=c++11.

OpenCV version number is 3.4.1.

James
  • Classic set of questions: compiler?, flags?, optimization?, debug or release? – R2RT Jun 11 '18 at 19:25
  • Try with `-O3` in `g++` flags to enable optimization. Also, `-DNDEBUG` might help – R2RT Jun 11 '18 at 19:36
  • I tried the compiler flags but it didn't make any difference. I think it's more likely something to do with my configuration setup and then choice of objects I use. Thanks for the suggestion though – James Jun 12 '18 at 08:22
  • Have you taken a look at the `cores`-parameter? For pyflann that's set at 0 by default, which means it performs the search on all available cores. – Tim Hilt Jan 16 '21 at 14:19
