8

In a 2D pixel array, I need an efficient algorithm that will select p% of pixels that are the most spread out.

This can be done adaptively by selecting points, then repeatedly adjusting the positions of points that are too close together. But this isn't efficient since it would require many iterations and distance calculations.

It doesn't have to be perfect, it just needs to avoid point clusters as much as can be done efficiently.

user20493
  • 5,704
  • 7
  • 34
  • 31
  • 1
    This question is interesting in that it's fairly straightforward to conceptualize the problem, and remarkably difficult to come up with an answer (that will finish in our lifetime). – Beska Aug 19 '09 at 19:55
  • As I said, it doesn't have to be perfect. I'm thinking of using "prebuilt building blocks", n x n regions with preselected points according to the p%, and covering the pixel array with these. – user20493 Aug 19 '09 at 20:08
  • Yep...I was thinking of that...but it occurred to me that you might end up with some odd artifacts because of that. – Beska Aug 19 '09 at 20:21
  • You've responded to several people that their solutions, because they're floating point or whatever, may be too slow...which is a perfectly valid concern...but if speed is really the critical issue here, you may want to add that to your original question, so that people know how to focus their efforts. Also any additional information you can think of...size restrictions, limits, etc...these things may help. – Beska Aug 19 '09 at 20:46
  • Minor concern: if you take an iterative approach, you'll end up with a lot of points at the boundary, which may not be what you want. To alleviate this, use periodic boundary conditions in your distance calculations. That is, (0.1,0.0) and (0.9,0.0) are separated by distance 0.2, not 0.8, because the world wraps around; likewise in the vertical direction. – dmckee --- ex-moderator kitten Aug 19 '09 at 20:47
  • When you say *2D pixel array* , do you mean a set of 2D points at random geometric locations? Or do you mean a rectangular collection of pixels whose geometric position is reflected by its indices in the array (like screen coordinates)? – Darryl Aug 19 '09 at 20:52
  • Unless you want to tell us *why* you desire this, it is unlikely that we can offer the "best" solution. For some uses the pre-built cells approach is hunky-dory. – dmckee --- ex-moderator kitten Aug 19 '09 at 20:53
  • You can get a quick approximation by placing the points at regular intervals. If you want more randomness, you can jitter the points by randomly moving them a "small" amount, where "small" is relative to the regular spacing. This approach was (is?) commonly used in stochastic sampling ray tracers. – Adrian McCarthy Aug 19 '09 at 22:32
  • "...but if speed is really the critical issue here, you may want to add that to your original question, so that people know how to focus their efforts." I mentioned efficiency three times in the four sentences of the question. – user20493 Aug 20 '09 at 13:15
  • 1
    Just saying "efficiency" is ambiguous. We don't know if you're trying to optimize for speed or for memory usage. Is your hope to get some kind of single-pass algorithm? I have my doubts you could do that and get any results that are remotely good. You're probably going to need to go with an iterative solution, and try to make the iterations efficient. Have you tried implementing a solution yet that you've found is too slow? It might give a frame of reference to know what you've tried so far and in what ways and degrees you're finding it deficient. – matthock Aug 20 '09 at 13:27
  • "When you say 2D pixel array , do you mean a set of 2D points at random geometric locations? Or do you mean a rectangular collection of pixels..." A rectangular collection of pixels. – user20493 Aug 20 '09 at 13:50
  • "Just saying Efficiency is ambiguous..." The iterative approach was described as not efficient because of the calculations required. Space wasn't mentioned. – user20493 Aug 20 '09 at 13:54
  • "Minor concern: if you take an iterative approach, you'll end up with a lot of points at the boundary, which may not be what you want..." Good point. – user20493 Aug 20 '09 at 14:32

10 Answers

1

You want a Poisson Disk distribution, but it's tricky. Doing a search turns up lots of academic papers about how to do it efficiently: http://people.csail.mit.edu/thouis/JonesPoissonPreprint.pdf
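
For reference, the naive "dart throwing" version of a Poisson disk distribution is easy to state, even though the papers are about doing it faster: keep proposing random points and reject any that land within some radius of an already-accepted point. A minimal sketch, with made-up radius and counts (the struct and function names are mine):

#include <cstdlib>
#include <vector>

struct Pt { int x, y; };

// Naive dart throwing: accept a random point only if it is at least
// r away from every point accepted so far. Slow (O(n) per dart), but
// it shows the Poisson-disk property the papers speed up.
std::vector<Pt> DartThrow(int width, int height, int count, int r)
{
  std::vector<Pt> pts;
  int attempts = 0, max_attempts = 100000;  // give up eventually
  while ((int)pts.size() < count && attempts++ < max_attempts)
  {
    Pt c = { rand() % width, rand() % height };
    bool ok = true;
    for (const Pt& p : pts)
    {
      int dx = c.x - p.x, dy = c.y - p.y;
      if (dx*dx + dy*dy < r*r) { ok = false; break; }
    }
    if (ok) pts.push_back(c);
  }
  return pts;
}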

Ned Batchelder
  • 364,293
  • 75
  • 561
  • 662
1

Thanks to everyone for the answers!

The best solution appears to be using "prebuilt building blocks": n x n arrays with already-selected cells, and covering the pixel array with these.

For example, a 4 x 4 array with 12.5% coverage would be:

0 0 1 0
0 0 0 0
1 0 0 0
0 0 0 0

With 6.25% coverage:

0 0 0 0
0 1 0 0
0 0 0 0
0 0 0 0

To get a % coverage between these, just alternate between these blocks according to a running tally of the overall actual % coverage so far. To cover a width that's not a multiple of 4, use some 3 x 3 blocks. To cover a larger area more efficiently, just use larger blocks.

This covers the whole array efficiently with no distance calculations or floating-point arithmetic.
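
A minimal sketch of the tiling itself, following the 4x4 examples above (block contents, dimensions, and the 10% target are made-up; names are mine): walk the array in 4x4 steps, keep a running tally of points placed, and stamp the denser block whenever the coverage so far is below target.

#include <cstdio>
#include <vector>

int main()
{
  // The two 4x4 blocks from above: 12.5% (2 points) and 6.25% (1 point).
  const int block2[4][4] = { {0,0,1,0}, {0,0,0,0}, {1,0,0,0}, {0,0,0,0} };
  const int block1[4][4] = { {0,0,0,0}, {0,1,0,0}, {0,0,0,0}, {0,0,0,0} };

  const int width = 40, height = 12;  // made-up, multiples of 4 for brevity
  const double target = 0.10;         // want 10% coverage
  std::vector<std::vector<int> > grid(height, std::vector<int>(width, 0));

  int placed = 0, covered = 0;
  for (int by = 0; by < height; by += 4)
    for (int bx = 0; bx < width; bx += 4)
    {
      // Running tally: if coverage so far is below target, stamp the
      // denser block, otherwise the sparser one.
      bool dense = (covered == 0) || (placed < target*covered);
      const int (*blk)[4] = dense ? block2 : block1;
      for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
        {
          grid[by + y][bx + x] = blk[y][x];
          placed += blk[y][x];
        }
      covered += 16;
    }

  printf("actual coverage: %.2f%%\n", 100.0*placed/covered);
  return 0;
}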

user20493
  • 5,704
  • 7
  • 34
  • 31
1

The "most spread out" selection of pixels is the set whose Delaunay triangulation consists of equilateral triangles. The set of points which leads to this triangulation is found by splitting the pixel array into a set of boxes, where each box is sqrt(3) longer than it is wide. Each box contributes 5 pixels to the final pixel set (one at each corner, plus a center node at the center of the box). The trick is to find how many rows and columns of boxes will give you this 1:sqrt(3) ratio. Without going through the derivation, here's how you get that:

#include <cmath>
#include <iostream>
#include <vector>

struct PixelIndices
{
  int row, col;
  PixelIndices(int r, int c) : row(r), col(c) {}
};

std::vector<PixelIndices> PickPixels(int width, int height, float percent)
{
  int total_pixels = width*height;
  int desired_pixel_count = (int)(total_pixels*percent);

  // Split the region up into "boxes" with 4 corner nodes and a center node.
  // Each box is sqrt(3) times taller than it is wide.

  // Calculate how many columns of boxes
  float a = 1.155f*height/(float)width;
  float b = 0.577f*height/(float)width + 1;
  float c = 1.0f - desired_pixel_count;
  int num_columns = (int)((-b + std::sqrt(b*b - 4*a*c))/(2*a));
  if (num_columns < 1) num_columns = 1;

  // Now calculate how many rows
  int num_rows = (int)(0.577f*height*num_columns/(float)width);
  if (num_rows < 1) num_rows = 1;

  // Total number of pixels: a (num_rows+1) x (num_columns+1) grid of
  // corners, plus one center per box
  int actual_pixel_count = 2*num_rows*num_columns + num_rows + num_columns + 1;

  std::cout << "  Total pixels: " << total_pixels << std::endl;
  std::cout << "       Percent: " << percent << std::endl;
  std::cout << "Desired pixels: " << desired_pixel_count << std::endl;
  std::cout << " Actual pixels: " << actual_pixel_count << std::endl;
  std::cout << "   Number Rows: " << num_rows << std::endl;
  std::cout << "Number Columns: " << num_columns << std::endl;

  // Pre-allocate space for the pixels
  std::vector<PixelIndices> results;
  results.reserve(actual_pixel_count);

  for (int row = 0; row <= num_rows; row++)
  {
    int row_index = row*(height - 1)/num_rows;

    // Corners along the top edge of this row of boxes
    for (int col = 0; col <= num_columns; col++)
    {
      int col_index = col*(width - 1)/num_columns;
      results.push_back(PixelIndices(row_index, col_index));
    }

    // Center of each box in this row (no centers after the last row)
    if (row != num_rows)
    {
      for (int col = 0; col < num_columns; col++)
      {
        int center_row = (2*row + 1)*(height - 1)/(2*num_rows);
        int center_col = (2*col + 1)*(width - 1)/(2*num_columns);
        results.push_back(PixelIndices(center_row, center_col));
      }
    }
  }

  return results;
}

Instead of using integer division to find the indices, you could speed this up by computing the spacing between adjacent points in a row/column once and accumulating that offset.

Darryl
  • 5,907
  • 1
  • 25
  • 36
1

You might try Wang tiles:
http://en.wikipedia.org/wiki/Wang_tile
(See the pages linked there for Cohen's paper and Kopf's paper. I'm a new user, so I can't post all the links.)

These tie together the prebuilt-tiles idea and the evenly-distributed requirement usually solved with Poisson-disk patterns. Wang tiles can avoid the periodicity artifacts that are almost surely an issue with more direct use of prebuilt tiles.
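
A toy sketch of the placement rule, just to show the mechanics (the tile set, its placeholder points, and all names are mine; a real set would use points optimized across matching tiles as in Cohen's paper): scan through the grid, at each cell picking a random tile whose west edge matches the east edge of the left neighbor and whose north edge matches the south edge of the tile above. A complete 2-color tile set guarantees a match always exists.

#include <cstdio>
#include <cstdlib>
#include <utility>
#include <vector>

// A Wang tile: edge colors (north, east, south, west) plus the sample
// points this tile contributes, as offsets within the tile.
struct WangTile
{
  int n, e, s, w;
  std::vector<std::pair<int, int> > points;
};

int main()
{
  const int TILE = 8, COLS = 5, ROWS = 4;

  // Complete 2-color tile set (all 16 edge combinations), so a matching
  // tile always exists during scanline placement.
  std::vector<WangTile> tiles;
  for (int c = 0; c < 16; c++)
  {
    WangTile t;
    t.n = (c >> 3) & 1; t.e = (c >> 2) & 1; t.s = (c >> 1) & 1; t.w = c & 1;
    t.points.push_back(std::make_pair((c*3) % TILE, (c*5) % TILE));  // placeholder point
    tiles.push_back(t);
  }

  std::vector<std::vector<int> > chosen(ROWS, std::vector<int>(COLS));
  for (int r = 0; r < ROWS; r++)
    for (int c = 0; c < COLS; c++)
    {
      // Rejection-sample a tile consistent with the left and top edges.
      for (;;)
      {
        int i = rand() % tiles.size();
        if (c > 0 && tiles[i].w != tiles[chosen[r][c-1]].e) continue;
        if (r > 0 && tiles[i].n != tiles[chosen[r-1][c]].s) continue;
        chosen[r][c] = i;
        break;
      }
      for (std::size_t k = 0; k < tiles[chosen[r][c]].points.size(); k++)
        printf("%d %d\n", c*TILE + tiles[chosen[r][c]].points[k].first,
                          r*TILE + tiles[chosen[r][c]].points[k].second);
    }
  return 0;
}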

1

Quite old, but worth a dig, because the existing answers missed an important method and focused on optimal solutions, which you are not interested in. The method I suggest may or may not suit your needs, however.

You can use quasi random sequences, which are designed for such problems. Most widespread are Sobol sequences, for which you can find canned packages for virtually any language. They are blazingly fast: only bitwise arithmetic.

They most likely will produce some clusters, but this can be avoided by selecting the "seed" to use for the x and y dimensions beforehand, and checking the result by eye.

It depends on what you want to do with the points: if visual "spread-out-ness" is important, this may not be what you want. If you want points which "fill the plane" almost evenly, they do the job perfectly. They are especially useful for averaging something over an image quickly, since they require fewer points than "normal" random generation. Experiment with different dimensions and see.

See also this link for examples, experiments, and pictures.
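
To make the idea concrete without pulling in a canned Sobol package, here is a sketch using the Halton sequence (bases 2 and 3), a simpler quasi-random sequence with the same even-filling character; the strip dimensions and the 5% figure are made-up, and skipping the first few terms is a common trick to reduce startup correlation between the two dimensions.

#include <cstdio>

// Radical inverse of n in the given base: mirror the digits of n about
// the radix point to get a low-discrepancy value in [0, 1).
double RadicalInverse(unsigned n, unsigned base)
{
  double inv = 1.0/base, frac = inv, result = 0.0;
  while (n > 0)
  {
    result += (n % base)*frac;
    n /= base;
    frac *= inv;
  }
  return result;
}

int main()
{
  const int width = 1000, height = 30;  // a long narrow strip
  const double percent = 0.05;          // made-up p = 5%
  const int count = (int)(width*height*percent);

  for (int i = 0; i < count; i++)
  {
    // Bases 2 and 3 for x and y; offsetting the index skips the most
    // correlated leading terms of the two sequences.
    int x = (int)(RadicalInverse(i + 20, 2)*width);
    int y = (int)(RadicalInverse(i + 20, 3)*height);
    printf("%d %d\n", x, y);
  }
  return 0;
}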

Alexandre C.
  • 55,948
  • 11
  • 128
  • 197
0

How about this:

  1. Compute the sum of the distances from each point to every other point. So point A has a summed distance of dist(A, B) + dist(A, C) + dist(A, D) + ...
  2. Sort these summed distances.
  3. Remove points that have the smallest distance sums until you reach your desired percentage.

This may be accurate enough, but if not, you could always replace step 3 with:

"Remove the point that has the smallest sum, and if you need to remove more points to reach your desired percentage, then go back to step 1."

Wait. Now I'm wondering. Are you trying to find the points that are most spread out from a given set of points...or trying, from a given array, to find the points that would be the most spread out? That's totally different...and still really hard.

Beska
  • 12,445
  • 14
  • 77
  • 112
  • Thanks for the answer, but I see two potential problems: 1. Calculation of each point-point distance has an order-n-squared running time, which isn't efficient, and 2. The distance calculations involve floating-point arithmetic, which is relatively slow compared to integer operations. – user20493 Aug 19 '09 at 20:06
  • To answer your question, I'm trying to find the p% points that would be the most spread out, or at least not clustered. – user20493 Aug 19 '09 at 20:11
  • Oh yeah, no question...if you're looking at a large data set, you're asking for some serious computation time. – Beska Aug 19 '09 at 20:19
  • I'm still not 100% sure what you're asking for...do you have a populated array, that has some "pixels" set in it, and by removing some of these pixels you're trying to maximize the spread? Or are you trying to find the most spread out formation of X pixels in an array of size Y,Z? (This answer was to the former possibility...my other answer is to the latter possibility). – Beska Aug 19 '09 at 20:52
0

How about calculating a "density" value for each pixel to start with, based on its proximity to all the other pixels? Then repeatedly remove the most "dense" pixel until you're below p% remaining in the list.

You'd need to do the distance calculation to determine density between any given two points at most twice. First time would be when you build the original list - each pixel would need to be compared with each other pixel. Second would be when you remove a pixel from the list - you'd have to calculate the removed one against each one remaining in the list. This is to account for density values changing as each pixel is removed - for example, 2 pixels directly next to each other would have a very very high value, but once one is removed, the remaining one might have a very low value.

Some quick pseudocode (note that in this example, higher density areas have a low number)

For I = 0 to MaxPixel
    For J = I+1 to MaxPixel
        D = DistanceBetween(Pixels[I], Pixels[J])
        PixelDensity[I] += D
        PixelDensity[J] += D

While PixelDensity.Count > TargetCount
    RemovePixel = IndexOfSmallest(PixelDensity)
    ForEach I in PixelDensity
        PixelDensity[I] -= DistanceBetween(Pixels[I], Pixels[RemovePixel])
    PixelDensity.Remove(RemovePixel)

If memory is less of a concern than calculation time, you can also store the distance between any two points in a simple 2D array. Also, instead of the raw distance, it might be helpful to weight the distance exponentially - that would avoid a case like two points almost on top of each other but far away from everything else, where both make it through.

matthock
  • 629
  • 1
  • 4
  • 15
  • It sounds like it might work, but the iterations and floating-point calculations might make it slow. – user20493 Aug 19 '09 at 20:23
  • Yeah, I'm not really sure what size data set you're dealing with. Is there possibly a fixed precision 'filter' type operation you can run when deciding what points to compare? For example, you can say "If these points are over Z distance apart on either the X or Y axis, they're far enough apart to ignore and not do the floating point calculations". Could save you some computation time with next to no effect on accuracy. – matthock Aug 19 '09 at 20:32
0

An iterative mine-sweeper flood-fill approach would be straightforward to visualise.

  1. For each cell, find the two nearest already-selected points and record the product of those two distances.
  2. The cells with the highest products are the ones furthest from the existing points; a brute-force sketch follows.
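
One brute-force reading of this, as a sketch (the names, the corner seeding, and the greedy loop are my assumptions; the answer doesn't spell out how the scores drive selection): greedily add the highest-scoring cell, rescoring after each addition.

#include <cmath>
#include <vector>

struct Pt { int x, y; };

// Greedily add points, each time scoring every cell by the product of
// its distances to the two nearest already-selected points and taking
// the highest-scoring cell. Seeded with two corners so the "two
// nearest" score is defined from the start.
std::vector<Pt> GreedyProductSelect(int width, int height, int count)
{
  std::vector<Pt> sel = { {0, 0}, {width - 1, height - 1} };
  while ((int)sel.size() < count)
  {
    Pt best = { 0, 0 };
    double bestScore = -1;
    for (int y = 0; y < height; y++)
      for (int x = 0; x < width; x++)
      {
        // Find the two smallest distances from (x, y) to selected points.
        double d1 = 1e300, d2 = 1e300;
        for (const Pt& p : sel)
        {
          double dx = x - p.x, dy = y - p.y;
          double d = std::sqrt(dx*dx + dy*dy);
          if (d < d1) { d2 = d1; d1 = d; }
          else if (d < d2) { d2 = d; }
        }
        if (d1*d2 > bestScore) { bestScore = d1*d2; best.x = x; best.y = y; }
      }
    sel.push_back(best);
  }
  return sel;
}
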
Will
  • 73,905
  • 40
  • 169
  • 246
0

Ooh! How about this!

(In a very hand-wavey way, since I don't know whether your matrix is square or whatever...I'll assume it is.)

Say you have a 1000x1000 array that you want to place 47 points into (I'm picking 47 so that it's an awkward number that won't fit "nicely").

You take the ceil(sqrt(47))...that will give you a value (7). So we make a 7x7 square, fill it with 47 pixels (some are blank), and imagine placing that in the corner of the array.

Now, translate each of those pixels to a new location, based on where they are in the small (7x7) array to the large array (1000x1000). A simple equation should do this for you...for the X coordinate, for example:

xBigArrayIndex = xSmallArrayIndex * 1000 / 7;

Then your pixels will be super spread out! And it's nice and fast.

The only downside is that this only works perfectly when your square is ideally spaced to begin with...if you fill it naively (starting at the top left, going across, etc.) you will end up with a slightly sub-ideal spread...since the translated pixels won't quite reach the bottom right of the large array. But maybe this is good enough? And if not, perhaps it's a smaller subset of the problem that is easier to deal with?
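
Here's the whole scheme as a sketch with the same made-up numbers, using the naive top-left fill, so it exhibits exactly the sub-ideal spread mentioned above:

#include <cmath>
#include <cstdio>

int main()
{
  // Place 47 points by filling a 7x7 grid and scaling it up to 1000x1000.
  const int big = 1000, count = 47;
  const int small_n = (int)std::ceil(std::sqrt((double)count));  // 7

  int placed = 0;
  for (int ys = 0; ys < small_n && placed < count; ys++)
    for (int xs = 0; xs < small_n && placed < count; xs++, placed++)
    {
      // Translate small-array indices to big-array indices, as in the
      // xBigArrayIndex formula above.
      int xBig = xs*big/small_n;
      int yBig = ys*big/small_n;
      printf("%d %d\n", xBig, yBig);
    }
  return 0;
}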

Beska
  • 12,445
  • 14
  • 77
  • 112
  • It's efficient. But for higher densities and non-square arrays, the points would form clusters. The array I'm using is in the shape of a long narrow strip, roughly 30 pixels wide (this can change) and thousands of pixels long. (This is the best answer so far...) – user20493 Aug 19 '09 at 20:56
0

You can use a convex hull algorithm: exclude the points the algorithm finds on the hull, and repeat until you get down to your p% criterion, or

execute the convex hull algorithm's steps, checking the points on and inside the hull, until 100% - p% of them meet the criterion.

Some convex hull demos are here: http://www.cs.unc.edu/~snoeyink/demos/

and here is some more info: http://softsurfer.com/Archive/algorithm_0109/algorithm_0109.htm
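
A sketch of the first variant (hull peeling), assuming Andrew's monotone chain for the hull itself; all names are mine, and whether the surviving interior points are the spread you want is exactly the question this answer leaves open.

#include <algorithm>
#include <cstddef>
#include <vector>

struct Pt { int x, y; };

static long long Cross(const Pt& o, const Pt& a, const Pt& b)
{
  return (long long)(a.x - o.x)*(b.y - o.y)
       - (long long)(a.y - o.y)*(b.x - o.x);
}

// Andrew's monotone chain: returns the hull vertices in CCW order.
std::vector<Pt> ConvexHull(std::vector<Pt> p)
{
  std::sort(p.begin(), p.end(), [](const Pt& a, const Pt& b)
            { return a.x < b.x || (a.x == b.x && a.y < b.y); });
  if (p.size() < 3) return p;
  std::vector<Pt> h(2*p.size());
  int k = 0;
  for (std::size_t i = 0; i < p.size(); i++)                // lower hull
  {
    while (k >= 2 && Cross(h[k-2], h[k-1], p[i]) <= 0) k--;
    h[k++] = p[i];
  }
  for (int i = (int)p.size() - 2, t = k + 1; i >= 0; i--)   // upper hull
  {
    while (k >= t && Cross(h[k-2], h[k-1], p[i]) <= 0) k--;
    h[k++] = p[i];
  }
  h.resize(k - 1);
  return h;
}

// Peel hull layers (removing a whole layer at a time, so it may
// overshoot) until at most `target` points remain.
std::vector<Pt> PeelHulls(std::vector<Pt> pts, std::size_t target)
{
  while (pts.size() > target)
  {
    std::vector<Pt> hull = ConvexHull(pts);
    if (hull.size() >= pts.size()) break;  // nothing left to peel
    std::vector<Pt> inner;
    for (const Pt& q : pts)
    {
      bool onHull = false;
      for (const Pt& h : hull)
        if (h.x == q.x && h.y == q.y) { onHull = true; break; }
      if (!onHull) inner.push_back(q);
    }
    pts.swap(inner);
  }
  return pts;
}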

MoreThanChaos
  • 2,054
  • 5
  • 20
  • 40