I am looking at implementing quadtree and R-tree data structures to test some ideas about the distribution of two-dimensional points. How do these structures handle duplicate points, and what techniques exist for handling them?

Stefan Orr

2 Answers

Most likely you can ignore/delete duplicate points.

Micromega
  • I can't ignore or delete the points. I want to learn about the distribution of all points, so knowing that there are duplicates is important. However, I'm not sure how to handle the partitioning in this case. I think I will have to make a point class that also holds a count, increment that count whenever a duplicate point is inserted, and skip the partitioning step for duplicates. – Stefan Orr Sep 16 '15 at 00:05

Quadtrees require care. A naive implementation keeps splitting a node until it holds at most a maximum number of elements m (by default m = 1). With m+1 duplicates it would then recurse forever, because identical points always fall into the same quadrant. Thus, you need to detect and handle duplicate points.
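
One way to do this is to store a multiplicity per distinct point and split only when the number of distinct points in a leaf exceeds m. The following is a minimal sketch of that idea, not a reference implementation: the class name `QuadTree`, the parameters `capacity` and `max_depth`, and the helpers `count` and `total` are all made up for illustration. The `max_depth` guard is an extra assumption to stop runaway splitting on points that are distinct but extremely close.

```python
from collections import Counter


class QuadTree:
    """Point quadtree node covering the box [x0, x1) x [y0, y1)."""

    def __init__(self, x0, y0, x1, y1, capacity=1, max_depth=32, depth=0):
        self.box = (x0, y0, x1, y1)
        self.capacity = capacity    # max number of DISTINCT points per leaf
        self.max_depth = max_depth  # safety stop for near-identical points
        self.depth = depth
        self.points = Counter()     # (x, y) -> multiplicity, used by leaves
        self.children = None        # list of 4 subtrees once the node is split

    def _needs_split(self):
        return len(self.points) > self.capacity and self.depth < self.max_depth

    def _child_for(self, x, y):
        x0, y0, x1, y1 = self.box
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        return self.children[(1 if x >= mx else 0) + (2 if y >= my else 0)]

    def insert(self, x, y):
        if self.children is not None:
            self._child_for(x, y).insert(x, y)
            return
        # A duplicate only bumps the count; it never adds a distinct key.
        self.points[(x, y)] += 1
        if self._needs_split():
            self._split()

    def _split(self):
        x0, y0, x1, y1 = self.box
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        kw = dict(capacity=self.capacity, max_depth=self.max_depth,
                  depth=self.depth + 1)
        # Child order matches the index computed in _child_for.
        self.children = [
            QuadTree(x0, y0, mx, my, **kw),
            QuadTree(mx, y0, x1, my, **kw),
            QuadTree(x0, my, mx, y1, **kw),
            QuadTree(mx, my, x1, y1, **kw),
        ]
        old, self.points = self.points, Counter()
        for (px, py), n in old.items():
            self._child_for(px, py).points[(px, py)] += n
        for child in self.children:
            if child._needs_split():
                child._split()

    def count(self, x, y):
        """Multiplicity of the exact point (x, y); 0 if absent."""
        node = self
        while node.children is not None:
            node = node._child_for(x, y)
        return node.points[(x, y)]

    def total(self):
        """Number of inserted points, duplicates included."""
        if self.children is None:
            return sum(self.points.values())
        return sum(child.total() for child in self.children)
```

For example, inserting `(1.0, 1.0)` five times and `(3.0, 3.0)` once into `QuadTree(0, 0, 4, 4, capacity=1)` splits the root a single time, and the duplicates remain available through `count`, so the distribution information is preserved.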

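R-trees are nicer. It is perfectly valid to have overlapping pages. Thus, even when a page consisting only of duplicates overflows, you can split it: a split only has to distribute the entries over two pages that each meet the minimum fill (for example, two roughly equal-sized halves), and the resulting bounding boxes are allowed to coincide.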
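
To illustrate why duplicates are harmless here, the sketch below splits an overflowing leaf by sorting its points and cutting the list in the middle. This is a deliberately simple stand-in, not one of the published R-tree split heuristics, and the names `mbr`, `split_leaf`, and `min_fill` are hypothetical.

```python
def mbr(points):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of 2-D points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def split_leaf(points, min_fill):
    """Split an overflowing leaf into two pages of roughly equal size.

    Returns ((mbr_left, left_points), (mbr_right, right_points)).
    """
    if len(points) < 2 * min_fill:
        raise ValueError("not enough entries to fill two pages")
    ordered = sorted(points)   # sort by x, then y
    mid = len(ordered) // 2    # mid >= min_fill because of the check above
    left, right = ordered[:mid], ordered[mid:]
    return (mbr(left), left), (mbr(right), right)
```

Calling `split_leaf([(1.0, 1.0)] * 5, min_fill=2)` yields pages of two and three entries whose bounding rectangles are both the degenerate box `(1.0, 1.0, 1.0, 1.0)`. The two pages overlap completely, which an R-tree permits; a query for that point simply has to visit both.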

Has QUIT--Anony-Mousse
  • Yes, an R-tree should work out of the box for duplicate points (e.g. the Boost.Geometry R-tree allows duplicates). The "classic" balanced KD-tree (storing node points in a heap-like array and using the points' coordinates as the dividing planes' coordinates) should also work out of the box for duplicate points. – Adam Wulkiewicz Oct 21 '15 at 19:45