I am trying to compute the average cell size for the following set of points, shown in the picture (grid). The picture was generated using gnuplot:

gnuplot> plot "debug.dat" using 1:2

The points are almost aligned on a rectangular grid, but not quite. There seems to be a bias (jitter?) of, say, 10-15% along either X or Y. How can one efficiently compute a partition into tiles such that there is virtually only one point per tile? The tile size would be expressed as (tilex, tiley). I say "virtually" because the 10-15% bias may have moved a point into an adjacent tile.

Just for reference, I have manually sorted (hopefully correct) and extracted the first 10 points:

 -133920,33480
 -132480,33476
 -131044,33472
 -129602,33467
 -128162,33463
 -139679,34576
 -138239,34572
 -136799,34568
 -135359,34564
 -133925,34562

Just for clarification, a valid tile as per the above description would be (1435,1060), but I am really looking for a quick automated way.

malat
  • `1. Find the Delaunay triangulation. 2. Remove the diagonal lines.` What remains is essentially what you want, or at least will be rather helpful. It takes O(N log N). – Nuclearman Dec 13 '14 at 01:04
  • What diagonal lines? – NaCl Dec 13 '14 at 12:58
  • Triangulation of an approximate grid creates lines that are approximately horizontal, vertical and diagonal. The horizontal and vertical ones are clearly useful in this case, but the diagonal ones are probably not and thus are best removed. The result is a grid graph that lets you easily find the next closest point directly to the left, right, up or down of a given point. This data structure should be sufficient to do whatever is needed in O(N). For example, you can find all points in a column by walking up and down from a point or a row by walking left and right. – Nuclearman Dec 13 '14 at 17:07
  • Do tiles have to be rectangular? – mleko Dec 14 '14 at 08:38
  • @mleko yes tiles have to be rectangular, that's the whole point. – malat Dec 15 '14 at 09:14
  • @Nuclearman Delaunay triangulation is an O(n^2) operation, as per http://cs.stackexchange.com/questions/2400/brute-force-delaunay-triangulation-algorithm-complexity – malat Dec 15 '14 at 09:17
  • That's for a specific approach, which *only* uses edge flips, and as noted in the first sentence from the link you gave, it's considered a brute force way of doing it. There are more efficient ways, as noted by the [Wikipedia article](http://en.wikipedia.org/wiki/Delaunay_triangulation#Algorithms). Wikipedia is a rather good starting point for learning about specific algorithms/data structures. – Nuclearman Dec 15 '14 at 17:40
  • A lot of questions. Do the tiles need to touch each other (more like a grid)? Drawing a couple of tiles on top of the image would help. If so, then I'll write you the Java :) – bvdb Dec 17 '14 at 21:17
  • @bvdb: Yes, they need to touch each other. – NaCl Dec 18 '14 at 22:38

1 Answer

Let's do this for X coordinate only:

1) sort the X coordinates

2) look at the deltas between consecutive sorted X coordinates. These deltas fall into two categories: large ones that correspond to the space between two columns, and small ones that correspond to the jitter between crosses within the same column. Your goal is to find a threshold that separates the long spaces from the short ones. One way (I think) is to pick the threshold that splits the deltas into two groups whose means are the furthest apart

3) once you have the threshold, separate the points into columns. A new column starts wherever the delta to the previous point exceeds the threshold

4) calculate average position of each detected column

5) take the deltas between consecutive column positions. The problem is that a stray point may break your columns and produce a wrong delta, so use the median of these deltas to get the strays out.

6) You should now have a robust estimate of your gridX

Example, using your data, looking at axis X:

-133920 -132480 -131044 -129602 -128162 -139679 -138239 -136799 -135359 -133925

Sort the values, take the deltas between consecutive values, then sort the deltas:

5 1434 1436 1440 1440 1440 1440 1440 1442

Here you can see that there is a very obvious gap between the small delta (5) and the large ones (1434 and up), so any threshold between 5 and 1434 separates them. 1434 is the smallest delta that counts as a space between columns here
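A minimal sketch of the threshold search from step 2, assuming the "two groups whose means are the furthest apart" criterion is taken literally. The name `split_threshold` is hypothetical, and returning the midpoint between the two groups is a choice made here, not something the criterion itself dictates.

```python
from statistics import mean


def split_threshold(deltas):
    """Return a cutoff separating short gaps from long ones.

    Tries every split of the sorted deltas into a lower and an upper
    group, keeps the split whose group means are furthest apart, and
    returns the midpoint between the largest short gap and the
    smallest long gap.
    """
    ds = sorted(deltas)
    if len(ds) < 2:
        raise ValueError("need at least two deltas")
    best_k = max(range(1, len(ds)),
                 key=lambda k: mean(ds[k:]) - mean(ds[:k]))
    return (ds[best_k - 1] + ds[best_k]) / 2


# Deltas between consecutive sorted X coordinates from the question.
deltas = [1440, 1440, 1440, 1434, 5, 1440, 1436, 1442, 1440]
threshold = split_threshold(deltas)
print(threshold)  # 719.5
```

Note that this criterion always produces a split, even when no two points share a column, and a single extreme delta can dominate it, so the result is worth a sanity check on real data.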

Split the points into columns:

-139679|-138239|-136799|-135359|-133925 -133920|-132480|-131044|-129602|-128162
       1440   1440    1440    1434      5    1440    1436    1442    1440

Almost every point ends up alone in its column; only -133925 and -133920 share one.

The average grid line positions are:

-139679 -138239 -136799 -135359 -133922.5 -132480 -131044 -129602 -128162

Sorted deltas:

1436.0 1436.5 1440.0 1440.0 1440.0 1440.0 1442.0 1442.5

Median:

1440

Which is the correct answer for your SMALL data set, IMHO.

Roman Zenka