Building a k-d tree using MapReduce?

Question

I am trying to build k-d trees (independent of each other) for image features. I have already extracted the image features; suppose each feature vector contains 1000 float values.

I want to use MapReduce to distribute the images among the nodes of the cluster according to their classification (e.g. cat, dog, guns), i.e. each node will hold a group of similar images, and then build a k-d tree of the images on each node. I am confused about how the tree can be built.

So how can I build the k-d tree using MapReduce? Each node will contain its own tree, right? What logic could be used to distribute the images? While building the k-d tree, on what basis should I add image feature vectors to the tree (i.e. as a left or right child)?

Any help is appreciated. Thanks in advance.

Not sure if I follow exactly what you are describing. Are you sure you need to use MapReduce/Hadoop for this task? If you want very granular control of what data goes to what node, some sort of sharding mechanism might be better. Furthermore, you'd have to make your k-d tree queryable and serve data out of HDFS, which doesn't sound fun. – Girish Rao, Jun 11 '12 at 16:46
Hey, thanks. What I have to do is this: I have a large set of images and I have extracted their features. Each vector contains 1000s of float values. How should I build the k-d tree from these values so that it helps me during searching? I need to distribute the sets of similar images among the nodes using MapReduce. How could I achieve that? – Amnesiac, Jun 11 '12 at 16:52
Are you classifying the images based on similarity *before* the MapReduce component? So you know before the MapReduce component which images should be grouped together. – Girish Rao, Jun 11 '12 at 16:58
Yes, the images are classified. I just need to distribute the images among the nodes and build the k-d tree. – Amnesiac, Jun 11 '12 at 17:00
@GirishRao But how do I build the tree on each node? What logic could be used to decide between the left and right child? – Amnesiac, Jun 11 '12 at 17:07
I'm not 100% sure, but I feel like MapReduce is overkill. Why couldn't you use a kdtree library (like http://code.google.com/p/kdtree/) and run your inserts against it? If you've already classified each image, assign each image a score. Then use the kdtree library to construct the tree based on this score feature. The library will do the work for you. – Girish Rao, Jun 11 '12 at 17:12
@GirishRao & Adrian: thanks for the help. I have already checked out the wiki, and my question is much clearer to me now. I have modified my question a little in this one: http://stackoverflow.com/questions/11009714/building-distributed-kd-tree-using-map-reduce – Amnesiac, Jun 13 '12 at 07:04
There is a paper that implements a k-d tree with MapReduce: https://arxiv.org/pdf/1512.06389.pdf – , Feb 07 '19 at 17:11
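
For the distribution step discussed in these comments (images already classified, one group per node), here is a minimal sketch that simulates the map, shuffle, and reduce phases in plain Python. It is not Hadoop code: the function names `map_phase`, `shuffle`, and `reduce_phase` and the sample records are hypothetical, and `sorted` only stands in for a real index builder such as a k-d tree constructor.

```python
from collections import defaultdict

def map_phase(records):
    """Mapper: emit (class_label, feature_vector) so the label becomes the shuffle key."""
    for label, vector in records:
        yield label, vector

def shuffle(pairs):
    """Shuffle: group values by key, as a MapReduce framework does before the reducers run."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, build_index):
    """Reducer: build one index per class from all vectors that share a label."""
    return {label: build_index(vectors) for label, vectors in groups.items()}

# Hypothetical sample data: (label, feature vector) pairs with tiny 2-d vectors.
records = [
    ("cat", (0.1, 0.9)),
    ("dog", (0.8, 0.2)),
    ("cat", (0.2, 0.7)),
    ("gun", (0.5, 0.5)),
]

# `sorted` is only a placeholder for a real index builder (assumption).
indexes = reduce_phase(shuffle(map_phase(records)), build_index=sorted)
print(sorted(indexes))  # one index per class label
```

Because the class label is the key, every vector of one class ends up in the same reducer call, which is where a per-class index would be built.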

rolve · Accepted Answer · 2012-06-11T23:01:25.860

2

I don't think that a k-d tree is the right data structure for your data. Here's what Wikipedia says about it:

k-d trees are not suitable for efficiently finding the nearest neighbour in high dimensional spaces. As a general rule, if the dimensionality is k, the number of points in the data, N, should be N >> 2^k. Otherwise, when k-d trees are used with high-dimensional data, most of the points in the tree will be evaluated and the efficiency is no better than exhaustive search, and approximate nearest-neighbour methods should be used instead.

Your feature vectors have dimensionality 1000, which means that you would need far more than 2^1000 ≈ 10^301 images, which is quite unlikely.
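
To make the size of that number concrete, here is a quick check of the rule of thumb for k = 1000. The variable names are only illustrative.

```python
# Rule of thumb quoted above: a k-d tree needs N >> 2^k points for k dimensions.
k = 1000                       # dimensionality of each feature vector (from the question)
min_points = 2 ** k            # exact value; Python integers have arbitrary precision
digits = len(str(min_points))  # number of decimal digits in 2^1000

print(f"2^{k} has {digits} decimal digits")  # prints: 2^1000 has 302 decimal digits
```

A number with 302 digits is on the order of 10^301, so no realistic image collection comes anywhere near it.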

I suggest that you look at Locality-sensitive hashing, which is one of the mentioned approximate nearest-neighbor searches for high-dimensional data.
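
As a rough illustration of the idea, here is a minimal sketch of one common LSH family, random hyperplane hashing for cosine similarity. This is an assumption on my part: the lecture slides may present a different scheme, and all function names, the 8-bit signature length, and the random test data are hypothetical.

```python
import random

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian normal vectors, one per hyperplane."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def lsh_signature(vec, hyperplanes):
    """One bit per hyperplane: 1 if vec lies on its non-negative side, else 0."""
    return tuple(
        1 if sum(h_i * v_i for h_i, v_i in zip(h, vec)) >= 0 else 0
        for h in hyperplanes
    )

def build_buckets(vectors, hyperplanes):
    """Group vector indices by signature; a bucket is the candidate set for a query."""
    buckets = {}
    for idx, vec in enumerate(vectors):
        buckets.setdefault(lsh_signature(vec, hyperplanes), []).append(idx)
    return buckets

# Hypothetical data: 50 random 1000-dimensional feature vectors.
rng = random.Random(1)
dim = 1000
vectors = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(50)]
planes = make_hyperplanes(dim, n_bits=8)
buckets = build_buckets(vectors, planes)

# A positively scaled copy points in the same direction, so it gets the same signature.
query = [2.0 * x for x in vectors[0]]
candidates = buckets[lsh_signature(query, planes)]
print(0 in candidates)
```

Only the vectors in the query's bucket need to be compared exactly, which is what makes the search approximate but fast. In practice several hash tables are usually combined to reduce missed neighbours.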

Since Wikipedia is not always the best place to learn something complicated, I suggest you take a look at the respective lecture slides of the Data Mining course of ETH Zurich instead. It just so happens that I am taking this course in the current semester.

edited Jun 11 '12 at 23:01

answered Jun 11 '12 at 22:26


@rolve Thanks, that's very helpful. But just for information, how do you build a k-d tree in general for k dimensions? How are nodes added to the tree? When adding a node, on what basis is the comparison made? – Amnesiac Jun 12 '12 at 04:05
Have you checked the Wikipedia page (http://en.wikipedia.org/wiki/K-d_tree)? There are code examples and textual descriptions that should be easy to follow. – rolve Jun 12 '12 at 07:17
@rolve Hey, check out this question; I have modified my question a little here: http://stackoverflow.com/questions/11009714/building-distributed-kd-tree-using-map-reduce – Amnesiac Jun 13 '12 at 07:02
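
On the follow-up question in the comments about how points are placed in a k-d tree: a minimal sketch of the textbook median-split construction, in the spirit of the Wikipedia description, could look like this. The names `Node`, `build_kdtree`, and `nearest` are hypothetical, and the 2-d sample points are only for illustration. As explained in the answer, this is not expected to be efficient for 1000-dimensional vectors.

```python
from collections import namedtuple

Node = namedtuple("Node", ["point", "axis", "left", "right"])

def build_kdtree(points, depth=0):
    """Split on the median along one axis; the axis cycles with the depth."""
    if not points:
        return None
    k = len(points[0])                  # dimensionality of the points
    axis = depth % k                    # coordinate compared at this level
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return Node(
        point=points[mid],
        axis=axis,
        left=build_kdtree(points[:mid], depth + 1),       # coordinate <= median
        right=build_kdtree(points[mid + 1:], depth + 1),  # coordinate >= median
    )

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(node, target, best=None):
    """Exact nearest-neighbour search that prunes subtrees which cannot win."""
    if node is None:
        return best
    if best is None or sq_dist(node.point, target) < sq_dist(best, target):
        best = node.point
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if diff ** 2 < sq_dist(best, target):
        best = nearest(far, target, best)
    return best

points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(points)
print(tree.point)              # (7, 2), the median along the first axis
print(nearest(tree, (9, 2)))   # (8, 1)
```

So the left/right decision at each level compares a single coordinate: the one selected by `depth % k`. Points below the median on that coordinate go left, the others go right.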
