I have an array (M x N) of air pressure data (gridded model data). There are also two arrays (also M x N) of latitudes and longitudes. To build a GeoJSON of isobars (lines of equal pressure) I need to find clusters of pressure values with a given step (1 Pa, 0.5 Pa). In general I was thinking of solving it like this:

  1. Build a list of objects: [{ lat, lon, pressure },..] to keep lat and lon data linked to a pressure;
  2. Sort objects by pressure;
  3. For each object in list: compare its pressure value and move to a dedicated list;
  4. Create GeoJSON features.

But step 3 is not yet clear to me: how do I find the clusters in a smart way? Which algorithm should I look for? Can I do that with the scipy.cluster package?

bolkhovsky
  • Is your range of isobar grid fixed? Then it's as easy as `isobar_bucket_no = trunc(pressure / 0.5)`, where `0.5` is your grid step. You don't even need sorting. If you need to calculate the range dynamically, find min and max pressure, then find an appropriate grid step so that the number of isobars is reasonable. – 9000 Sep 08 '14 at 16:50

1 Answer

I don't think you are looking for clustering at all.

Apparently the isobar ranges are given. So split your data set on them; you do not need to sort for this - just find the minimum and maximum to get all buckets, then select data according to each bucket separately. This breaks the problem down nicely into smaller chunks.
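A minimal sketch of that bucketing, assuming the three grids are plain nested lists of equal shape. The function name `bucket_by_pressure` and the sample values are made up for illustration; `math.floor` is used here, which matches `trunc` for positive pressures.

```python
import math
from collections import defaultdict


def bucket_by_pressure(pressure, lats, lons, step=0.5):
    """Group grid points by bucket index floor(pressure / step).

    Hypothetical helper: returns {bucket_index: [(lat, lon, pressure), ...]}.
    """
    buckets = defaultdict(list)
    for p_row, lat_row, lon_row in zip(pressure, lats, lons):
        for p, lat, lon in zip(p_row, lat_row, lon_row):
            buckets[math.floor(p / step)].append((lat, lon, p))
    return dict(buckets)


# Tiny made-up 2 x 3 grid, for illustration only.
pressure = [[1000.1, 1000.4, 1000.6],
            [1000.2, 1000.7, 1001.2]]
lats = [[50.0, 50.0, 50.0],
        [51.0, 51.0, 51.0]]
lons = [[10.0, 11.0, 12.0],
        [10.0, 11.0, 12.0]]

buckets = bucket_by_pressure(pressure, lats, lons, step=0.5)

# Each bucket can now be processed on its own, in pressure order.
for index in sorted(buckets):
    print(index * 0.5, len(buckets[index]))
```

No sorting of the points is needed: one pass assigns every point to its bucket, and the minimum and maximum bucket indices give the full range of isobar levels.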

I guess your problem is largely a visualization one. You want to display areas of similar pressure instead of points, right?

Instead of looking at statistical methods such as least-squares optimization (k-means), which require you to predefine the parameter k, consider looking at visualization techniques such as Alpha Shapes (closely related to convex hulls, but they also allow non-convex shapes). If you compute alpha shapes for each of your pressure domains, you should get a nice visualization of these regions.

If you insist on using clustering, have a look at DBSCAN, mostly because it allows non-convex clusters and can work with latitude and longitude (k-means doesn't). But even HAC (hierarchical agglomerative clustering) may give you good results, since you can define your cut threshold based on your data resolution (e.g. merge any points in the same pressure bucket if they are less than 1 km apart).
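The distance-threshold idea can be sketched without any clustering library. The code below is not DBSCAN; it is a simple single-linkage merge with a fixed cut, implemented with union-find, under the assumption that the points of one pressure bucket are given as (lat, lon) tuples in degrees. The names `haversine_km` and `merge_within` and the sample points are hypothetical, and the pairwise loop is quadratic, so it only suits small buckets.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def merge_within(points, max_km):
    """Label points so that any chain of neighbours closer than max_km shares a label."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if haversine_km(points[i], points[j]) < max_km:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(points))]


# Made-up points from one pressure bucket: three close together, one far away.
points = [(50.000, 10.000), (50.005, 10.000), (50.010, 10.000), (52.000, 13.000)]
labels = merge_within(points, max_km=1.0)
print(labels)
```

The first and third points are a little over 1 km apart, yet they end up in the same group because each is within 1 km of the second one. That chaining is what lets the groups take non-convex shapes.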

Has QUIT--Anony-Mousse
  • Thank you, I got the idea. You are right: it is more about visualization, that's my fault. It looks like a Concave Hull is something that will solve my problem, so I am going to continue in that direction. – bolkhovsky Sep 09 '14 at 10:51