I have some data: an array of 400k rows, each with 200 elements.
I'm trying to use this data for kNN matching with FLANN. With the pyflann library, which is just a set of Python bindings to the C++ code, I get outstanding performance: building the index takes very little time, and once it is built the kNN search is both fast and accurate.
This is the code I use to build and query the index in Python:
import time
import pyflann

flann = pyflann.FLANN()
# data is np.array of shape (400000, 200), dtype=np.float32, rows are L2-normed
flann.build_index(data, algorithm="kmeans", branching=32, iterations=7, checks=16)

ind = 100
k = 5
vector = data[ind, :].reshape(1, 200)  # a single query row

t0 = time.time()
# nn_index returns (indices, distances)
I, D = flann.nn_index(vector, k, algorithm="kmeans", branching=32, iterations=7, checks=16)
t1 = time.time()
t = t1 - t0  # query time in seconds
Typically the index is built in around 20 seconds, and the query (which returns I and D) takes less than 0.001 seconds on my machine.
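To compare the C++ side against these numbers, it may help to have an exact reference result. Below is a minimal sketch (not part of FLANN or OpenCV) of a brute-force k-nearest-neighbour search over a contiguous row-major float buffer, using squared L2 distance, with the query timed via std::chrono in the same way the Python snippet uses time.time(). The names Neighbor, bruteForceKnn and timedKnn are hypothetical. On 400k x 200 rows it will be far slower than an index, but its output can be used to check the accuracy of the approximate search.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <utility>
#include <vector>

// One neighbour: row index and squared L2 distance to the query.
using Neighbor = std::pair<std::size_t, float>;

// Hypothetical helper: exact kNN of one query against every row of a
// contiguous row-major (rows x cols) float buffer, using squared L2
// distance. Results are sorted by increasing distance.
std::vector<Neighbor> bruteForceKnn(const std::vector<float>& data,
                                    std::size_t rows, std::size_t cols,
                                    const float* query, std::size_t k) {
    assert(data.size() == rows * cols);
    std::vector<Neighbor> all(rows);
    for (std::size_t r = 0; r < rows; ++r) {
        const float* row = data.data() + r * cols;
        float d = 0.0f;
        for (std::size_t c = 0; c < cols; ++c) {
            const float diff = row[c] - query[c];
            d += diff * diff;
        }
        all[r] = Neighbor(r, d);
    }
    k = std::min(k, rows);
    // Only the k smallest distances need to end up sorted.
    std::partial_sort(all.begin(), all.begin() + static_cast<std::ptrdiff_t>(k),
                      all.end(), [](const Neighbor& a, const Neighbor& b) {
                          return a.second < b.second;
                      });
    all.resize(k);
    return all;
}

// Times a single query with std::chrono, analogous to the time.time()
// calls in the Python snippet. Elapsed seconds are written to *seconds.
std::vector<Neighbor> timedKnn(const std::vector<float>& data,
                               std::size_t rows, std::size_t cols,
                               std::size_t queryRow, std::size_t k,
                               double* seconds) {
    const float* query = data.data() + queryRow * cols;  // like data[ind, :]
    const auto t0 = std::chrono::steady_clock::now();
    std::vector<Neighbor> nn = bruteForceKnn(data, rows, cols, query, k);
    const auto t1 = std::chrono::steady_clock::now();
    *seconds = std::chrono::duration<double>(t1 - t0).count();
    return nn;
}
```

For example, calling timedKnn(data, 400000, 200, 100, 5, &secs) on the same buffer that is handed to FLANN gives the exact 5 neighbours of row 100, which can then be compared with the indices returned by the index.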
I'm trying to achieve the same performance in C++, specifically with the version of FLANN that is bundled with OpenCV.
I have copied the default parameter values from pyflann into the index parameters and set up a k-means index, but performance is terrible: I usually have to kill the indexing process after a few minutes.
#include <opencv2/flann.hpp>  // cvflann::IndexParams, cvflann::Index, cvflann::L2

cvflann::IndexParams getParams() {
    cvflann::IndexParams indexParams;
    indexParams["algorithm"] = cvflann::FLANN_INDEX_KMEANS;
    // same values as the pyflann call: branching=32, iterations=7, checks=16
    indexParams["branching"] = 32;
    indexParams["iterations"] = 7;
    indexParams["checks"] = 16;
    // remaining values copied from the pyflann defaults
    indexParams["eps"] = 0.0;
    indexParams["cb_index"] = 0.5;
    indexParams["trees"] = 1;
    indexParams["leaf_max_size"] = 4;
    indexParams["centers_init"] = cvflann::FLANN_CENTERS_RANDOM;
    indexParams["target_precision"] = 0.9;
    indexParams["build_weight"] = 0.01;
    indexParams["memory_weight"] = 0.0;
    indexParams["sample_fraction"] = 0.1;
    indexParams["log_level"] = "warning";
    indexParams["random_seed"] = -1;
    return indexParams;
}
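// wraps (does not copy) 400000 x 200 floats starting at this->data[0]
cvflann::Matrix<float> dataset(this->data[0], 400000, 200);
cvflann::IndexParams ip = getParams();
// index: a cvflann::Index<cvflann::L2<float> >* declared elsewhere
index = new cvflann::Index<cvflann::L2<float> >(dataset, ip);
index->buildIndex(); // hangs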
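One detail that may be worth double-checking: cvflann::Matrix<float>(ptr, rows, cols) does not copy anything; it only wraps a pointer to rows*cols floats laid out row after row (400000 x 200 floats is 80 million values, about 320 MB). The declaration of this->data isn't shown, so this is an assumption: if it is an array of separately allocated row pointers, this->data[0] points only at the first row's storage. Below is a minimal sketch of one contiguous layout (a single std::vector<float>), which also applies the row-wise L2 normalisation mentioned in the Python comment. RowMajorData and l2NormalizeRows are hypothetical names, not OpenCV API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container: all rows stored back to back in one buffer,
// which is the layout a (pointer, rows, cols) matrix view expects.
struct RowMajorData {
    std::size_t rows;
    std::size_t cols;
    std::vector<float> values;  // rows * cols floats; row r starts at r * cols

    RowMajorData(std::size_t r, std::size_t c)
        : rows(r), cols(c), values(r * c, 0.0f) {}

    float* row(std::size_t r) {
        assert(r < rows);
        return values.data() + r * cols;
    }

    // Pointer to the first element, e.g. for cvflann::Matrix<float>(ptr(), rows, cols).
    float* ptr() { return values.data(); }
};

// Scales every row to unit Euclidean length; all-zero rows are left as-is.
void l2NormalizeRows(RowMajorData& m) {
    for (std::size_t r = 0; r < m.rows; ++r) {
        float* x = m.row(r);
        double sumSq = 0.0;
        for (std::size_t c = 0; c < m.cols; ++c) {
            sumSq += static_cast<double>(x[c]) * x[c];
        }
        if (sumSq > 0.0) {
            const float inv = static_cast<float>(1.0 / std::sqrt(sumSq));
            for (std::size_t c = 0; c < m.cols; ++c) {
                x[c] *= inv;
            }
        }
    }
}
```

With this layout, row r is always at ptr() + r * cols, so a single pointer plus the row and column counts describes the whole dataset.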
Is there something I'm missing? Why is there such a large performance difference between the two? Any advice would be gratefully received.
Edit:
Compiler:
g++ --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.13.sdk/usr/include/c++/4.2.1
Apple LLVM version 9.1.0 (clang-902.0.39.2)
Target: x86_64-apple-darwin17.3.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
There are no compiler flags other than those provided by pkg-config --cflags opencv and pkg-config --libs opencv, along with -std=c++11.
OpenCV version number is 3.4.1.