0

Given two points P,Q and a delta, I defined the equivalence relation ~=, where P ~= Q if EuclideanDistance(P,Q) <= delta. Now, given a set S of n points, in the example S = (A, B, C, D, E, F) and n = 6 (the fact points are actually endpoints of segments is negligible), is there an algorithm that has complexity better than O(n^2) in the average case to find a partition of the set (the representative element of the subsets is unimportant)?

Attempts to find theoretical definitions of this problem were unsuccessful so far: k-means clustering, nearest neighbor search and others seems to me different problems. The picture shows what I need to do in my application.

Any hint? Thanks

alt text

EDIT: while the actual problem (cluster near points given some kind of invariant) should be solvable in better better than O(n^2) in the average case, there's a serious flaw in my problem definition: =~ is not a equivalence relation because of the simple fact it doesn't respect the transitive property. I think this is the main reason this problem is not easy to solve and need advanced techiques. Will post very soon my actual solution: should work when near points all satisfy the =~ as defined. Can fail when poles apart points doesn't respect the relation but they are in relation with the center of gravity of clustered points. It works well with my input data space, may not with yours. Do anyone know a full formal tratment of this problem (with solution)?

ceztko
  • 14,736
  • 5
  • 58
  • 73
  • Will there be any conflicts, i.e. could there be points `G` and `H` that are within `delta` of `A` and `F`? – Jonas Dec 09 '10 at 18:28
  • Not in the example, but there may be. At certain conditions, the proposed solutions should work. Unfortunately the problem is much more complicated than I originally thought. Essentially, I need to cluster/classify/index *near* points. How to formally write a condition that summarize the non formal concept *"near points"*? Not easy to do, and unless this is done I can't think of better solutions, and even knowing it I haven't time left for this problem. In the mean time you could like the free code in this page. :) – ceztko Dec 09 '10 at 22:55

5 Answers5

1

One way to restate the problem is as follows: given a set of n 2D points, for each point p find the set of points that are contained with the circle of diameter delta centred at p.

A naive linear search gives the O(n^2) algorithm you allude to.

It seems to me that this is the best one can do in the worst case. When all points in the set are contained within a circle of diameter <= delta, each of n queries would have to return O(n) points, giving an O(n^2) overall complexity.

However, one should be able to do better on more reasonable datasets. Take a look at this (esp. the section on space partitioning) and KD-trees. The latter should give you a sub-O(n^2) algorithm in reasonable cases.

There might be a different way of looking at the problem, one that would give better complexity; I can't think of anything off the top of my head.

NPE
  • 486,780
  • 108
  • 951
  • 1,012
  • Hmmm...ok, maybe I'm getting the idea. 1) I need to construct a kd-tree with a proper heuristic 2) I need to traverse the tree looking for neighbors. Are you sure that traversing a kd-tree I'll be able to get **all** neighbors of a point within a delta in O(logN) in the average case? Thanks – ceztko Dec 05 '10 at 23:54
  • I'm making progresses implementing the KdTree. Will set your answer as the solution when I'm done. In the mean time I like your answer :) – ceztko Dec 07 '10 at 17:25
  • While I coded a solution that works in my case, I discovered my formalization of the problem was wrong (=~ is **not** a equivalence relation). This was true even with my previous naive algorithm. However, now I have O(NlogN) algorithm in average case, much better :). Thanks for hints. – ceztko Dec 09 '10 at 22:45
0

Definitely a problem for Quadtree.

You could also try sorting on each coordonate and playing with these two lists (sorting is n*log(n), and you can check only the points that satisfies dx <= delta && dy <= delta. Also, you could put them in a sorted list with two levels of pointers: one for parsing on OX and another for OY.

ruslik
  • 14,714
  • 1
  • 39
  • 40
  • Quadtree seems used in spatial indexing (that, if I get it right, should be the theoretical definition of my problem). I tried second approach time ago without success. In both cases, it seems I lack knowledge background and/or pseudocode to code 100% working solution myself in a real world application. Will try kd-tree and nns as suggested by aix for now, anyway thanks! :) – ceztko Dec 06 '10 at 00:06
0

For each point, calculate the distance D(n) from the origin, this is an O(n) operation.

Use a O(n^2) algorithm to find matches where D(a-b) < delta, skipping D(a)-D(b) > delta.

The result, on average, must be better than O(n^2) due to the (hopefully large) number skipped.

smirkingman
  • 6,167
  • 4
  • 34
  • 47
  • I think this is a Θ(n^2) algorithm (Theta). I mean this will have always n^2 iterations, saving only a fraction of constant time. Or I got it wrong? Anyway, I'm coming with a C# kd-tree implementation very soon. – ceztko Dec 07 '10 at 16:42
  • Well, try it and see. My bet is that there will be **many** eliminated, reducing you O(n^2) component significantly – smirkingman Dec 07 '10 at 20:16
0

This is a C# KdTree implementation that should solve the "Find all neighbors of a point P within a delta". It makes heavy use of functional programming techniques (yes, I love Python). It's tested but I still have doubts doubts in understanding _TreeFindNearest(). The code (or pseudo code) to solve the problem "Partition a set of n points given a ~= relation in better than O(n^2) in the average case" is posted in another answer.

/*
Stripped C# 2.0 port of ``kdtree'', a library for working with kd-trees.
Copyright (C) 2007-2009 John Tsiombikas <nuclear@siggraph.org>
Copyright (C) 2010 Francesco Pretto <ceztko@gmail.com>

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this
   list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
   this list of conditions and the following disclaimer in the documentation
   and/or other materials provided with the distribution.
3. The name of the author may not be used to endorse or promote products
   derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO
EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT
OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY
OF SUCH DAMAGE.
*/

using System;
using System.Collections.Generic;
using System.Text;

namespace ITR.Data.NET
{
    public class KdTree<T>
    {
        #region Fields

        private Node _Root;
        private int _Count;
        private int _Dimension;
        private CoordinateGetter<T>[] _GetCoordinate;

        #endregion // Fields

        #region Constructors

        public KdTree(params CoordinateGetter<T>[] coordinateGetters)
        {
            _Dimension = coordinateGetters.Length;
            _GetCoordinate = coordinateGetters;
        }

        #endregion // Constructors

        #region Public methods

        public void Insert(T location)
        {
            _TreeInsert(ref _Root, 0, location);
            _Count++;
        }

        public void InsertAll(IEnumerable<T> locations)
        {
            foreach (T location in locations)
                Insert(location);
        }

        public IEnumerable<T> FindNeighborsRange(T location, double range)
        {
            return _TreeFindNeighborsRange(_Root, 0, location, range);
        }

        #endregion // Public methods

        #region Tree traversal

        private void _TreeInsert(ref Node current, int currentPlane, T location)
        {
            if (current == null)
            {
                current = new Node(location);
                return;
            }

            int nextPlane = (currentPlane + 1) % _Dimension;

            if (_GetCoordinate[currentPlane](location) <
                    _GetCoordinate[currentPlane](current.Location))
                _TreeInsert(ref current._Left, nextPlane, location);
            else
                _TreeInsert(ref current._Right, nextPlane, location);
        }

        private IEnumerable<T> _TreeFindNeighborsRange(Node current, int currentPlane,
            T referenceLocation, double range)
        {
            if (current == null)
                yield break;

            double squaredDistance = 0;
            for (int it = 0; it < _Dimension; it++)
            {
                double referenceCoordinate = _GetCoordinate[it](referenceLocation);
                double currentCoordinate = _GetCoordinate[it](current.Location);
                squaredDistance +=
                    (referenceCoordinate - currentCoordinate)
                    * (referenceCoordinate - currentCoordinate);
            }

            if (squaredDistance <= range * range)
                yield return current.Location;

            double coordinateRelativeDistance =
                _GetCoordinate[currentPlane](referenceLocation)
                    - _GetCoordinate[currentPlane](current.Location);
            Direction nextDirection = coordinateRelativeDistance <= 0.0
                ? Direction.LEFT : Direction.RIGHT;
            int nextPlane = (currentPlane + 1) % _Dimension;
            IEnumerable<T> subTreeNeighbors =
                _TreeFindNeighborsRange(current[nextDirection], nextPlane,
                    referenceLocation, range);
            foreach (T location in subTreeNeighbors)
                yield return location;

            if (Math.Abs(coordinateRelativeDistance) <= range)
            {
                subTreeNeighbors =
                    _TreeFindNeighborsRange(current.GetOtherChild(nextDirection),
                        nextPlane, referenceLocation, range);
                foreach (T location in subTreeNeighbors)
                    yield return location;
            }
        }

        #endregion // Tree traversal

        #region Node class

        public class Node
        {
            #region Fields

            private T _Location;
            internal Node _Left;
            internal Node _Right;

            #endregion // Fields

            #region Constructors

            internal Node(T nodeValue)
            {
                _Location = nodeValue;
                _Left = null;
                _Right = null;
            }

            #endregion // Contructors

            #region Children Indexers

            public Node this[Direction direction]
            {
                get { return direction == Direction.LEFT ? _Left : Right; }
            }

            public Node GetOtherChild(Direction direction)
            {
                return direction == Direction.LEFT ? _Right : _Left;
            }

            #endregion // Children Indexers

            #region Properties

            public T Location
            {
                get { return _Location; }
            }

            public Node Left
            {
                get { return _Left; }
            }

            public Node Right
            {
                get { return _Right; }
            }

            #endregion // Properties
        }

        #endregion // Node class

        #region Properties

        public int Count
        {
            get { return _Count; }
            set { _Count = value; }
        }

        public Node Root
        {
            get { return _Root; }
            set { _Root = value; }
        }

        #endregion // Properties
    }

    #region Enums, delegates

    public enum Direction
    {
        LEFT = 0,
        RIGHT
    }

    public delegate double CoordinateGetter<T>(T location);

    #endregion // Enums, delegates
}
ceztko
  • 14,736
  • 5
  • 58
  • 73
0

The following C# method, together with KdTree class, Join() (enumerate all collections passed as argument) and Shuffled() (returns a shuffled version of the passed collection) methods solve the problem of my question. There may be some flawed cases (read EDITs in the question) when referenceVectors are the same vectors as vectorsToRelocate, as I do in my problem.

public static Dictionary<Vector2D, Vector2D> FindRelocationMap(
    IEnumerable<Vector2D> referenceVectors,
    IEnumerable<Vector2D> vectorsToRelocate)
{
    Dictionary<Vector2D, Vector2D> ret = new Dictionary<Vector2D, Vector2D>();

    // Preliminary filling
    IEnumerable<Vector2D> allVectors =
        Utils.Join(referenceVectors, vectorsToRelocate);
    foreach (Vector2D vector in allVectors)
        ret[vector] = vector;

    KdTree<Vector2D> kdTree = new KdTree<Vector2D>(
        delegate(Vector2D vector) { return vector.X; },
        delegate(Vector2D vector) { return vector.Y; });
    kdTree.InsertAll(Utils.Shuffled(ret.Keys));

    HashSet<Vector2D> relocatedVectors = new HashSet<Vector2D>();
    foreach (Vector2D vector in referenceVectors)
    {
        if (relocatedVectors.Contains(vector))
            continue;

        relocatedVectors.Add(vector);

        IEnumerable<Vector2D> neighbors =
            kdTree.FindNeighborsRange(vector, Tolerances.EUCLID_DIST_TOLERANCE);

        foreach (Vector2D neighbor in neighbors)
        {
            ret[neighbor] = vector;
            relocatedVectors.Add(neighbor);
        }
    }

    return ret;
}
ceztko
  • 14,736
  • 5
  • 58
  • 73