Algorithm balanced K-D tree with O(kn log n)

Question

I tried to implement a balanced K-D tree in O(kn log n). I used K presorted arrays (one sorted array per axis) to get O(kn log n), and the median to get a balanced tree.

The problem I faced is that the median value chosen at some level, for example the median for the x axis, may be chosen again at a subsequent level, for example for the y axis.

I tried to solve this by dividing the y-sorted array into two arrays, using the chosen x value as a pivot, but this does not yield a balanced tree.

Any idea how to get a balanced K-D tree in O(kn log n)?

EDIT

Quoted from wiki https://en.wikipedia.org/wiki/K-d_tree

Alternative algorithms for building a balanced k-d tree presort the data prior to building the tree. They then maintain the order of the presort during tree construction and hence eliminate the costly step of finding the median at each level of subdivision. Two such algorithms build a balanced k-d tree to sort triangles in order to improve the execution time of ray tracing for three-dimensional computer graphics. These algorithms presort n triangles prior to building the k-d tree, then build the tree in O(n log n) time in the best case.[5][6] An algorithm that builds a balanced k-d tree to sort points has a worst-case complexity of O(kn log n).[7] This algorithm presorts n points in each of k dimensions using an O(n log n) sort such as Heapsort or Mergesort prior to building the tree. It then maintains the order of these k presorts during tree construction and thereby avoids finding the median at each level of subdivision.

Could anyone provide the algorithm described above?

EDIT

I came up with a way, but it doesn't work if the median value on a specific axis is duplicated.

For example

x1 = [ (0, 7), (1, 3), (3, 0), (3, 1), (6, 2) ]
y1 = [ (3, 0), (3, 1), (6, 2), (1, 3), (0, 7) ]

The median on the x axis is 3. So when we split the y array into y11 and y12, we have to use > and < to distribute its elements left and right, with the pivot as the delimiter.

There is no guarantee that either of them is correct if the median value on a specific axis is duplicated.

Consider the partition on the x axis (the x1 array itself poses no problem). Completing the first partition step of the example above:

median = (3, 0)
pivot = 3   // the median of the x axis
y11 = [], y12 = []
for (i = 0; i < y1.size; i++)
  if (y1[i].getX() < pivot)
    y11.add(y1[i])
  else if (y1[i].getX() > pivot)
    y12.add(y1[i])

This results in y11 = [ (1, 3), (0, 7) ] and y12 = [ (6, 2) ]: the point (3, 1), which shares the median's x value, ends up in neither list.

Any idea how to handle such a case? Or is there another presorting k-d tree algorithm with O(kn log n)?

This may not occur. After the first subdivision step, the left and right subtrees are processed separately and do not contain the initial median. Presorting on `y` is useless unless you are able to partition the set sorted on `x` without losing the `y` order. — , Feb 05 '16 at 14:20
(Please check tag _balanced-data-distributor_: is this about Microsoft SSIS BDD?) — greybeard, Feb 08 '16 at 11:54
(This may be just explicating/extending [Anony-Mousse's answer](http://stackoverflow.com/a/35229291/3789665).) I might try keeping the data in two(/k)(times 2 to the power of the current split/tree level) doubly linked lists after the initial sort(s): splitting in one coordinate/dimension _should_ not perturb any other. — greybeard, Feb 08 '16 at 12:03
Could you please give a simple example, that handles a duplicate median value as well please. — Aladdin, Feb 08 '16 at 12:20
(Commenters do not get notified unless you explicitly address them using @ (see help) - I just happened to check for more comments.) Will try, might post on CR instead of SO (would place a link). — greybeard, Feb 08 '16 at 13:19
(My attempt _does_ get messy after the easy parts (pre-sorting & setting up lists) - I appreciate that the bounty is a HUGE cut into your reputation score. You didn't indicate implementation languages welcome (if any) - Java OK?) — greybeard, Feb 08 '16 at 14:17
I guess I'm still not quite getting what you are asking: I'll try to address the points top to bottom, more than one comment or not. `median value at some level ,for example the median for x axis, maybe chosen again at another subsequent level` - _how_ would that be a problem? `Any idea how to get K-D balanced tree with O(kn log n)?` - the usual - for balanced as in _depth will be O(log n)_. Perfectly balanced? Not really, what would be the advantage? (ToBeContinued) — greybeard, Feb 08 '16 at 20:09
(I find one thing in the quote from Wikipedia worded particularly unfortunately: _avoids finding the median at each level of subdivision_. It avoids investing more than constant time.) `Anyone could provide such algorithm provided above` (_described_ above?) This seems to imply "the usual" doesn't (build a balanced tree, in particular). `came up with a way but it doesn't work if there is any duplicate value` & `there is no guarantee one of [> and <] is correct [in presence of duplicates]` - this needs a definition of _correct_ or one of _incorrect_. (TBC) — greybeard, Feb 08 '16 at 20:35
My take: correct is any element such that no other element makes the smaller part end up bigger. Strictly, this involves trying both in presence of duplicates. `Consider the partition on x axis, and there is no problem on x1 array [at] completion [of] first step partition` & `Any idea how to handle such case?` If there is no problem, sit back and relax. Really - choosing < as the operator (or 2 as the pivotal x-value) would yield y11 = [(1, 3), (0, 7)], y12 = [(3, 0), (3, 1), (6, 2)], but _both_ would yield height O(log n). — greybeard, Feb 08 '16 at 20:46

Has QUIT--Anony-Mousse · Answer 1 · 2016-02-08T21:52:32.027

When splitting the data, you need to retain the sort order.

E.g. using data (x,y) we build

x1 = [ (0, 7), (1, 3), (3, 0), (4, 2), (6, 1) ]
y1 = [ (3, 0), (6, 1), (4, 2), (1, 3), (0, 7) ]

If we split at x now, we need to filter both sets by the record at x=3,y=0.

I.e. split both lists, removing (3,0), all items with x<3 go to the first list each, all with x>3 go to the second (order unchanged):

x1 -> filter to  x11 = [ (0, 7), (1, 3) ]  x12 = [ (4, 2), (6, 1) ]
y1 -> filter to  y11 = [ (1, 3), (0, 7) ]  y12 = [ (6, 1), (4, 2) ]

The point is to filter each sorted list by the x values while keeping its sort order (so this costs O(n*k) on each of the O(log n) levels). If you used only x1 and reconstructed y11 and y12 from it, you would need to sort again. The result is necessarily the same as if you sorted once by x and once by y, except that we did not sort again, only selected.
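A minimal sketch of that filtering step, using the example data above. The names `StableFilterSketch`, `filterByX` and `show` are made up for illustration, points are plain `int[]` pairs, and sending records that share the pivot's x value to the right-hand list is an assumption of this sketch (the example has no such ties).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class StableFilterSketch {
    /**
     * Splits one presorted list around pivot by x, keeping the list's order.
     * Returns {left, right}; the pivot record itself is dropped.
     * Assumption of this sketch: records with the pivot's x value go right.
     */
    static List<List<int[]>> filterByX(List<int[]> sorted, int[] pivot) {
        List<int[]> left = new ArrayList<>();
        List<int[]> right = new ArrayList<>();
        for (int[] p : sorted) {
            if (p == pivot) continue;   // the same object appears in every presorted list
            (p[0] < pivot[0] ? left : right).add(p);
        }
        return Arrays.asList(left, right);
    }

    /** Formats a list of points, e.g. "[0, 7][1, 3]". */
    static String show(List<int[]> points) {
        StringBuilder sb = new StringBuilder();
        for (int[] p : points) sb.append(Arrays.toString(p));
        return sb.toString();
    }

    public static void main(String[] args) {
        int[][] pts = { {0, 7}, {1, 3}, {3, 0}, {4, 2}, {6, 1} };
        List<int[]> byX = new ArrayList<>(Arrays.asList(pts));
        List<int[]> byY = new ArrayList<>(Arrays.asList(pts));
        byX.sort((a, b) -> Integer.compare(a[0], b[0]));   // presort once per axis
        byY.sort((a, b) -> Integer.compare(a[1], b[1]));
        int[] pivot = byX.get(byX.size() / 2);             // (3, 0)
        for (List<int[]> presorted : Arrays.asList(byX, byY)) {
            List<List<int[]>> halves = filterByX(presorted, pivot);
            System.out.println(show(halves.get(0)) + " | " + show(halves.get(1)));
        }
        // prints:
        // [0, 7][1, 3] | [4, 2][6, 1]
        // [1, 3][0, 7] | [6, 1][4, 2]
    }
}
```

Each list is scanned once and no sorting happens after the initial presort, which is where the O(n*k) per level comes from.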

I do not think this is much better in practice. Sorting is cheaper than the extra memory.

score 1 · Answer 2 · edited May 23 '17 at 11:59

To elaborate on my comment (and Anony-Mousse's answer, probably):

The key idea with pre-sorting in constructing KD-trees is to keep the order during the split. The overhead looks quite high, so a comparative benchmark against re-sorting (and k-select) seems in order.
Some proof-of-principle Java source code:

package net.*.coder.greybeard.sandbox;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** finger exercise pre-sorting & split for KD-tree construction
 *  (re. https://stackoverflow.com/q/35225509/3789665) */
public class KDPreSort {
 /** K-dimensional key, dimensions fixed
  *   by number of coordinates in construction */
    static class KKey {
        public static KKey[] NONE = {};
        final Comparable[]coordinates;
        public KKey(Comparable ...coordinates) {
            this.coordinates = coordinates;
        }
    /** @return {@code Comparator<KKey>} for coordinate {@code n}*/
        static Comparator<KKey> comparator(int n) { // could be cached
            return new Comparator<KDPreSort.KKey>() { @Override
                    public int compare(KKey l, KKey r) {
                        return l.coordinates[n]
                            .compareTo(r.coordinates[n]);
                    }
                };
        }
        @Override
        public String toString() {
            StringBuilder sb = new StringBuilder(
                Arrays.deepToString(coordinates));
            sb.setCharAt(0, '(');
            sb.setCharAt(sb.length()-1, ')');
            return sb.toString();
        }
    }

 // static boolean trimLists = true; // introduced when ArrayList was used in interface

/** @return two arrays of {@code KKey}s: comparing smaller than
 *    or equal to {@code pivot} (according to {@code comp}),
 *    and greater than pivot -
 *    in the same order as in {@code keys}. */
    static KKey[][] split(KKey[] keys, KKey pivot, Comparator<KKey> comp) {
        int length = keys.length;
        ArrayList<KKey>
            se = new ArrayList<>(length),
            g = new ArrayList<>(length);
        for (KKey k: keys) {
        // pick List to add to
            List<KKey>d = comp.compare(k, pivot) <= 0 ? se : g;
            d.add(k);
        }
//      if (trimLists) { se.trimToSize(); g.trimToSize(); }
        return new KKey[][] { se.toArray(KKey.NONE), g.toArray(KKey.NONE) };
    }
 /** @return two arrays of <em>k</em> arrays of {@code KKey}s:
  *  comparing smaller than or equal to {@code pivot}
  *   (according to {@code comp}), and greater than pivot,
  *  in the same order as in {@code keysByCoordinate}. */
    static KKey[][][]
        splits(KKey[][] keysByCoordinate, KKey pivot, Comparator<KKey> comp) {
        final int length = keysByCoordinate.length;
        KKey[][]
            se = new KKey[length][],
            g = new KKey[length][],
            splits;
        for (int i = 0 ; i < length ; i++) {
            splits = split(keysByCoordinate[i], pivot, comp);
            se[i] = splits[0];
            g[i] = splits[1];
        }
        return new KKey[][][] { se, g };
    }
 // demo
    public static void main(String[] args) {
    // from https://stackoverflow.com/q/17021379/3789665
        Integer [][]coPairs = {// {0, 7}, {1, 3}, {3, 0}, {3, 1}, {6, 2},
                {12, 21}, {13, 27}, {19, 5}, {39, 5}, {49, 63}, {43, 45}, {41, 22}, {27, 7}, {20, 12}, {32, 11}, {24, 56},
            };
        KKey[] someKeys = new KKey[coPairs.length];
        for (int i = 0; i < coPairs.length; i++) {
            someKeys[i] = new KKey(coPairs[i]);
        }
    //presort
        Arrays.sort(someKeys, KKey.comparator(0));
        List<KKey> x = new ArrayList<>(Arrays.asList(someKeys));
        System.out.println("by x: " + x);
        KKey pivot = someKeys[someKeys.length/2];
        Arrays.sort(someKeys, KKey.comparator(1));
        System.out.println("by y: " + Arrays.deepToString(someKeys));
    // split by x
        KKey[][] allOrdered = new KKey[][] { x.toArray(KKey.NONE), someKeys },
            xSplits[] = splits(allOrdered, pivot, KKey.comparator(0));
        for (KKey[][] c: xSplits)
            System.out.println("split by x of " + pivot + ": "
                + Arrays.deepToString(c));
    // split "higher x" by y
        pivot = xSplits[1][1][xSplits[1][1].length/2];
        KKey[][] ySplits[] = splits(xSplits[1], pivot, KKey.comparator(1));
        for (KKey[][] c: ySplits)
            System.out.println("split by y of " + pivot + ": "
                + Arrays.deepToString(c));
    }
}

(Didn't find a suitable answer/implementation on SE while not investing too much energy. The output was unconvincing with your example; with the longer one, I had to re-format it to believe it.
The code looks ugly, in all likelihood because it is: if so inclined, re-read about the licence of code posted on SE, and visit Code Review.) (Consider that there's voting, accepting, and awarding bounties, and re-visit Anony-Mousse's answer.)
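For what it's worth, here is a minimal sketch of how the complete recursive construction might look once the order-preserving split is applied on every level. It is not the code above and not a tuned implementation: `PresortedKdBuildSketch`, `Node`, `superKey`, `build`, `size` and `height` are hypothetical names, and points are plain `int[]` arrays. To cope with a duplicated median value, the comparator breaks ties by looking at the remaining coordinates cyclically. That tie-break is an assumption of this sketch rather than something taken from either answer, and it only works if all points are distinct.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class PresortedKdBuildSketch {
    static final class Node {
        final int[] point;
        final Node left, right;
        Node(int[] point, Node left, Node right) {
            this.point = point; this.left = left; this.right = right;
        }
    }

    /** Compares on axis first, then on the other coordinates cyclically,
     *  so two distinct points never compare as equal (sketch assumption). */
    static Comparator<int[]> superKey(int axis) {
        return (a, b) -> {
            for (int i = 0; i < a.length; i++) {
                int d = (axis + i) % a.length;
                int c = Integer.compare(a[d], b[d]);
                if (c != 0) return c;
            }
            return 0;
        };
    }

    /** Presorts once per dimension, then builds recursively. Assumes distinct points. */
    static Node build(int[][] points) {
        if (points.length == 0) return null;
        int k = points[0].length;
        List<List<int[]>> sorted = new ArrayList<>();
        for (int d = 0; d < k; d++) {
            List<int[]> byD = new ArrayList<>(Arrays.asList(points));
            byD.sort(superKey(d));
            sorted.add(byD);
        }
        return build(sorted, 0);
    }

    /** sorted.get(d) holds the same points, ordered by superKey(d). */
    static Node build(List<List<int[]>> sorted, int depth) {
        int n = sorted.get(0).size();
        if (n == 0) return null;
        int axis = depth % sorted.size();
        int[] median = sorted.get(axis).get(n / 2);   // constant-time median lookup
        Comparator<int[]> cmp = superKey(axis);
        List<List<int[]>> lo = new ArrayList<>(), hi = new ArrayList<>();
        for (List<int[]> list : sorted) {             // split every list, order kept
            List<int[]> l = new ArrayList<>(), h = new ArrayList<>();
            for (int[] p : list) {
                int c = cmp.compare(p, median);
                if (c < 0) l.add(p);
                else if (c > 0) h.add(p);             // c == 0 is the median itself
            }
            lo.add(l);
            hi.add(h);
        }
        return new Node(median, build(lo, depth + 1), build(hi, depth + 1));
    }

    static int size(Node t) { return t == null ? 0 : 1 + size(t.left) + size(t.right); }

    static int height(Node t) {
        return t == null ? 0 : 1 + Math.max(height(t.left), height(t.right));
    }

    public static void main(String[] args) {
        // the question's example, with x = 3 occurring twice
        int[][] pts = { {0, 7}, {1, 3}, {3, 0}, {3, 1}, {6, 2} };
        Node root = build(pts);
        System.out.println("size=" + size(root) + " height=" + height(root));
        // prints: size=5 height=3
    }
}
```

With the tie-break, (3, 1) compares greater than the median (3, 0) on x and lands in the right subtree instead of being lost. Every level scans each of the k lists once, so no re-sorting happens after the presort.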
