3

I'm collecting location information from different sources and storing everything in a MongoDB collection. Apart from point locations with a single lat/lng coordinate, I'm also storing areas.

Now, one data source gives me the location information as a GeometryCollection, but with all elements being Polygons. Another data source gives me the location as a MultiPolygon. While I'm actually considering having a collection for each data source, I'm wondering which approach is better on the whole.

GeometryCollection is certainly more flexible, but maybe MultiPolygon shows better query performance (given that I always create a 2dsphere index over the location field). Is it worth converting one representation into the other?

Christian

2 Answers

3

Good news: query performance and indexability are the same in MongoDB for all supported GeoJSON types.

The main driver in your decision should be whether your information architecture for the geo field, and the software that consumes it, needs to contain more types than just polygons. You say you're storing point locations? If you want to hold all geo data in a single field, e.g. location (likely with a 2dsphere index on it), then you will need a GeometryCollection, into which you can put both the Point and the MultiPolygon. The GeoJSON spec (https://www.rfc-editor.org/rfc/rfc7946#page-9) recommends not nesting GeometryCollection objects, so for those data sources giving you a GeometryCollection, you would iterate over the contents and populate your own GeometryCollection, which also holds your Points etc.
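As a rough illustration of that "iterate and repopulate" step, here is a minimal Python sketch that works on plain GeoJSON dictionaries. The helper names (`flatten_geometries`, `merge_into_collection`) and the sample coordinates are made up for this example; they are not part of MongoDB or of any GeoJSON library.

```python
def flatten_geometries(geometry):
    """Yield every non-collection geometry, descending into nested GeometryCollections."""
    if geometry["type"] == "GeometryCollection":
        for member in geometry["geometries"]:
            yield from flatten_geometries(member)
    else:
        yield geometry


def merge_into_collection(*geometries):
    """Build one flat GeometryCollection from any mix of geometries (hypothetical helper)."""
    flat = [g for geometry in geometries for g in flatten_geometries(geometry)]
    return {"type": "GeometryCollection", "geometries": flat}


# Hypothetical sample data: a source that delivers a GeometryCollection of Polygons
source_areas = {
    "type": "GeometryCollection",
    "geometries": [
        {"type": "Polygon", "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]},
        {"type": "Polygon", "coordinates": [[[5, 5], [6, 5], [6, 6], [5, 5]]]},
    ],
}
center = {"type": "Point", "coordinates": [0.5, 0.5]}

# The merged value is what would be stored in the single `location` field
location = merge_into_collection(center, source_areas)
```

The result holds the Point and both Polygons side by side in one `geometries` array, with no GeometryCollection nested inside another.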

If you are storing points separately, e.g. eventCenter as separate from eventAreasEffected, then the eventCenter can be just a Point and the eventAreasEffected can be a single 'MultiPolygon'; no need for GeometryCollection. It is perfectly fine to have geo in more than one field, and to have or not have multiple 2dsphere indexes on these fields. Starting in MongoDB 4.0, you can use $geoNear on a collection that has more than one 2dsphere index by including the key option.
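A small sketch of what the two-field layout and the `key` option could look like, written as plain Python dictionaries so it runs without a database. The helper names, the `distanceMeters` output field, and the sample coordinates are assumptions for illustration; only the `$geoNear` options themselves (`near`, `distanceField`, `maxDistance`, `key`, `spherical`) come from MongoDB.

```python
def build_event(center_lng_lat, area_polygons):
    """Build a document with the point and the areas in separate fields (hypothetical helper)."""
    return {
        "eventCenter": {"type": "Point", "coordinates": list(center_lng_lat)},
        "eventAreasEffected": {"type": "MultiPolygon", "coordinates": area_polygons},
    }


def geo_near_stage(field, lng, lat, max_distance_m):
    """Build a $geoNear stage that names the indexed field to use via `key`."""
    return {
        "$geoNear": {
            "near": {"type": "Point", "coordinates": [lng, lat]},
            "distanceField": "distanceMeters",  # assumed name for the output field
            "maxDistance": max_distance_m,
            "key": field,  # needed when more than one 2dsphere index exists
            "spherical": True,
        }
    }


# Hypothetical sample values
event = build_event(
    (-73.99, 40.73),
    [[[[-74.0, 40.7], [-73.9, 40.7], [-73.9, 40.8], [-74.0, 40.7]]]],
)
pipeline = [geo_near_stage("eventCenter", -73.99, 40.73, 5000)]
```

With a driver such as pymongo, a pipeline like this would typically be passed to the collection's `aggregate` method; that call is left out here so the sketch stays self-contained.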

Here's an unofficial but reasonable definitional approach: a MultiPolygon is not an arbitrary collection of Polygon but rather a single "shape concept" that happens to have disjoint polygons. The United States can be described in a single MultiPolygon that has Alaska, Hawaii, the continental US, maybe Puerto Rico, etc. In fact, to this end, you'll note that it is a little trickier to store data relevant to each member of the MultiPolygon, because coordinates can only be an array of arrays of points. Information about the third polygon, for example, has to be carried in a peer field to the single top-level coordinates field. But a discrete array of Polygon or a GeometryCollection of Polygon can store extra information in each shape. Note that neither GeoJSON nor MongoDB restricts you from adding fields in addition to type and coordinates for each shape.
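To make the structural difference concrete, here is a minimal Python sketch that converts between the two representations on plain GeoJSON dictionaries. The function names and the per-shape `name` field are invented for this example; the extra field simply illustrates the point above about carrying information on each shape.

```python
def polygons_to_multipolygon(collection):
    """Convert a GeometryCollection that contains only Polygons into one MultiPolygon."""
    members = collection["geometries"]
    if any(g["type"] != "Polygon" for g in members):
        raise ValueError("only Polygon members can be merged into a MultiPolygon")
    # Each Polygon's coordinates become one element of the MultiPolygon's coordinates
    return {"type": "MultiPolygon", "coordinates": [g["coordinates"] for g in members]}


def multipolygon_to_collection(multi, extras=None):
    """Split a MultiPolygon into Polygons, optionally attaching extra fields per shape."""
    extras = extras or [{} for _ in multi["coordinates"]]
    geometries = [
        {**extra, "type": "Polygon", "coordinates": coords}
        for coords, extra in zip(multi["coordinates"], extras)
    ]
    return {"type": "GeometryCollection", "geometries": geometries}


# Hypothetical sample data: two disjoint triangles
multi = {
    "type": "MultiPolygon",
    "coordinates": [
        [[[0, 0], [1, 0], [1, 1], [0, 0]]],
        [[[5, 5], [6, 5], [6, 6], [5, 5]]],
    ],
}
collection = multipolygon_to_collection(multi, [{"name": "first"}, {"name": "second"}])
round_trip = polygons_to_multipolygon(collection)
```

Going from the GeometryCollection back to a MultiPolygon drops the per-shape fields, which is the trade-off described above: the MultiPolygon has nowhere to keep them except in a peer field next to `coordinates`.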

A more subtle issue is the design and semantics of a GeometryCollection of Polygon vs. MultiPolygon. To further complicate it, there is the issue of explicit holes defined in the Polygon vs. a collection of implicitly "layered" Polygon that are post-processed outside of the DB by geo software.

Buzz Moschetti
  • That's a very comprehensive and useful answer! Thanks for that! Regarding the design of the database, I'm currently using 2 collections: (a) one for points where every location field is `Point` and (b) one for areas where the location field is `MultiPolygon` (if there's only one polygon the length of the array is just 1). Right now I don't assume I will have individual information for each shape/polygon in a `MultiPolygon` but your point is valid. I'm not married to the current design -- particularly using 2 instead of 1 collection -- but it's good to know that performance is less of a factor. – Christian Oct 08 '18 at 04:16
0

The problem with this subject is that there isn't a single good answer; it's all about what you prefer or need. Here is a great answer written on Stack Exchange.

Polygon vs MultiPolygon https://gis.stackexchange.com/questions/225368/understanding-difference-between-polygon-and-multipolygon-for-shapefiles-in-qgis

I don't know about GeometryCollection, so I can't tell you anything about that, but this link will reveal a lot of information.

Lars Hendriks
  • The differences between the types on a conceptual level seem rather intuitive. My question is more from an implementation/performance point of view. `GeometryCollection` simply allows putting polygons, lines and points into the same location object. This means, of course, that a `GeometryCollection` may contain only polygons, which, as far as I can tell, makes it conceptually the same as a `MultiPolygon` object. I only wonder if there are performance differences when it comes to querying large collections using those location types. – Christian Sep 26 '18 at 02:36
  • I would convert everything to a MultiPolygon object, as it seems that a GeometryCollection can contain a MultiPolygon object. By that logic, MultiPolygon would be the option to choose. – Lars Hendriks Sep 26 '18 at 06:34
  • > For type "MultiPolygon", each element in the coordinates array is a coordinates array as described for type "Polygon". A GeoJSON geometry object with type "GeometryCollection" is a geometry object which represents a collection of geometry objects. A geometry collection must have a member with the name "geometries". The value corresponding to "geometries" is an array. Each element in this array is a GeoJSON geometry object. – Lars Hendriks Sep 26 '18 at 06:34