Sorting geospatial data

Question

It seems that there is no default way to handle a sort on a GEOSPATIAL data type. By this I mean the Type itself, and not a function, for example, getting the Area or Latitude of a type. For example, in something like BigQuery, it is not even supported at all. What are some possible (and useful ways) that GEO data may be sorted?

Off the top of my head, the only items that I think make sense across all geo types would be:

String-ify the field (perhaps for debugging, doesn't seem to have any useful case other than this).
Sorting by the Long/Latitude, polygons converted to centroid. For example, maybe the sort is the distance from Equator/Meridian.

What are the most common ways GEO data is usually sorted, or is sorting very uncommon with geographic data?

This question might be a better fit on https://gis.stackexchange.com/ — MrUpsidown, Jan 22 '21 at 14:55
Currently how are you storing the data? And what will be the order of sorting? — Kazi Mohammad Ali Nur Romel, Feb 04 '21 at 09:56

Craig Gers · Answer 1 · 2021-02-08T09:15:29.443

Spatial data is very much a part of most modern RDBMS, and by that, I refer to relational database management systems (as well as a few non-relational ones as well).

I presume that you are familiar with GIS and the concepts of spatial data. Esri is one of the world's leaders in spatial data applications and they very much rely on the SQL and Oracle spatial data types and underlying spatial engines.

As most GIS people understand, geographic feature stored in a database is usually represented by single data type in a database. These spatial data types include lines, points, polygons, circles, open paths etc. The point I am trying to make here, and possibly not very clearly is that each feature type, e.g. roads are stored/managed in a single table in the database - and have a specific geometry type.

So by way of example, you could have a roads table representing roads that are of a data type line, dams, or property boundaries represented by polygon data type etc.

If you work with GIS data, you will appreciate that at the most simple level, spatial data can be geographic, ie has coordinates represented by lat/long coordinates typically in the range x(-180,+180), y(-90,+90). Alternatively the data can be projected using some coordinate system, such as UTM, Albers etc. In such cases the map units may be represented by some other unit such as feet or metres.

In SQL server specifically, and I believe this is the same for Oracle, there were two supported geometry data types.

The geography type represents data in a round-earth coordinate system e.g. degrees minutes and seconds
The geometry type represents data in a Euclidean (flat) coordinate system and is used to represent projected data i.e. data, not in decimal degrees.

Both SQL Server, Oracle and PostgreSQL offer a very rich set of functionality for spatial data types that allow you to write SQL Queries that can manipulate, sort, select, clean and many more very clever things, like buffers, intersections etc.

It is just a case of learning the syntax for a new data type in the database. If you have used products like ArcGIS, assuming that your data is correctly stored in an RDBMS, you can perform most of your spatial data queries (particularly for vector data types) using SQL like scripts.

One more thing, while the database uses a geometry column to store the spatial geometries for your feature class in question, e.g. a line data type for roads. It is up to you to add any additional data columns or attributes that you need. Most RDBMS users understand this well. The advantage of this is that you are able to combine spatial queries that include your (non-spatial) attribute data. Furthermore, your spatial data can participate in relationships with other tables, e.g. one too many. So by way of an actual example, if you have historical yield data linked to a field number/identifier, it is possible to write queries that show those fields that say have the highest average yield for the past three years. The choice of attributes is determined by the requirement of the data. The true power of enterprise GIS systems is the ability to combine some spatial data with related information and crunch the information to make better decisions and understand relationships better. (However, as grand as this sounds, you can expect about 80-90% of this work involves data quality and assurance related activities. The fun part is the shortest and easiest part.) Garbage in ...

Here are the links for spatial data manipulation in different RDBMS's:

Oracle - https://docs.oracle.com/database/121/SPATL/spatial-concepts.htm#SPATL010
Sql Server - https://learn.microsoft.com/en-us/sql/t-sql/spatial-geometry/ogc-methods-on-geometry-instances?view=sql-server-ver15
PosgGresql - https://postgis.net/

True GIS functionality has been available for many years now, however not many have typically used the database alone, relying on third-party software such as Esri's ArcMap etc. These tools do have a role to play, but if you have good data, there is a lot of true GIS stuff that you can do via a typical SQL script.

Best of luck.

A practical example in MS SQL Server:

Create a spatial table and add test data. In this step, third party tools greatly assist this loading process, as in reality, the geometries are not something we type, country boundaries, for example, have thousands of vertices. (I have also intentionally omitted details relating to spatial reference ID (SRID) to keep it simple.)

IF OBJECT_ID ( 'dbo.SpatialTable', 'U' ) IS NOT NULL
DROP TABLE dbo.SpatialTable;
GO

CREATE TABLE SpatialTable
( id int IDENTITY (1,1),
[Geometry] geometry,
[StaText] AS [Geometry].STAsText());
GO

INSERT INTO SpatialTable ([Geometry])
VALUES (geometry::STGeomFromText('POLYGON ((0 0, 0 40, 40 80, 60 20, 0 0))', 0));

INSERT INTO SpatialTable ([Geometry])
VALUES (geometry::STGeomFromText('POLYGON ((0 0, 0 -40, 50 -10, 20 -10, 0 0))', 0));
GO
Select the inserted data

2B) A pictoral representation of the data

Sorting by a geometry property, area.

score 1 · Answer 2 · answered Jan 22 '21 at 07:59

I think there is no universal sorting of 2D data that would make sense in various contexts.

E.g. consider sorting by (longitude, latitude) pair. Unless you have points with exactly identical longitude (unlikely, given these are floating point numbers) this is the same as sorting by longitude alone! It is simple to explain, but does it make sense? Maybe for some applications, but it would place points at the equator and at the poles very close if they have close values of longitude, which is not good for other applications.

Similarly sorting by distance from equator could be useful in some cases, but not often.

For various applications, you need different sorting. E.g. BigQuery supports clustering by geography values. Clustering is not sorting, but somewhat similar. For that it uses Hilbert Curve, which takes both (longitude, latitude) into account and provides better way to cluster data than by longitude alone or by (longitude, latitude) pair which, as we saw, is almost the same. Hilbert Curve is useful, but it is not very intuitive, and would not make sense for other applications that ask for sorting.

thanks for that. In GIS have you ever come across sorting by a case when you sort by the string-value? For example, for debugging purposes (for example, if you wanted to view 10000 random points and put it in Excel...) ? — samuelbrody1249, Jan 22 '21 at 19:32

score 1 · Answer 3 · answered Feb 03 '21 at 19:03

1

Sorting is 100% common for spatial data in databases. In fact, the primary reason spatial data types were introduced into databases in the first place was to be able to sort by location and create spatial indexes. (If it was not for that, we would still be storing locations in two columns of Lat, Lon.)

There are many approaches to sorting spatial data. All of them are quite complex. Most commercial databases use variations of grid/tessellation schemes. Here is a review of spatial index mechanics in SQL Server.

answered Feb 03 '21 at 19:03

Michael Diomin

530
3
13

I see, thanks for the response. Do people ever sort by the string of it (even for debugging?) or is that not common at atll. – samuelbrody1249 Feb 03 '21 at 19:29
String (or binary) comparison is good for match/not-match. It does not help with greater/lesser. – Michael Diomin Feb 03 '21 at 19:58

Andrew T · Answer 4 · 2021-02-10T19:29:47.597

If you think of most geographic data as existing on a 2-dimensional plane then a uniform grid overlaid on top can provide a indexable and hierarchical linear ordering system. This is in essence a spatial index, which is described well in the following article

SQL Server starts with a B+ tree structure, which organizes data into a linear fashion. Because of this, the indexes must have a way to represent the two-dimensional spatial information as linear data. For this, SQL Server uses a process referred to as the hierarchical uniform decomposition of space. When the index is created, the database engine decomposes, or refactors, the space into a collection of axes aligned along a four-level grid hierarchy. Figure 1 provides an overview of what this process looks like.

Spatial indexes are basically a grid, and indexing is possible by decomposing from higher-level space to lower, and linear order. SQL Server uses the hierarchical uniform decomposition of space method before reading the data is read in the spatial index. This not only leads to faster performance, but also the possibility of implementing sorting schemes as well.

If you want a simple implementation of this for your own work, then performing a spatial join of your data to any uniform grid with an ID value that is sequentially ordered will work. There are many algorithms for creating a uniform index grid, and Esri provides a tool for generating tessellation schemes. The tessellation can be of triangles, squares, diamonds, hexagons, or transverse hexagons.

I added some info to my answer at the bottom regarding how to overlay a grid for simple sorting purposes. To answer your other comments, if you wanted to do this yourself tabularly then you would want to first spatially join your points to a uniform grid with a linear index and then sort by that column. That would be the best way to sort spatial data. You can generate a uniform grid using Esri tools: https://pro.arcgis.com/en/pro-app/latest/tool-reference/data-management/generatetesellation.htm — Andrew T, Feb 10 '21 at 19:23

Sorting geospatial data

4 Answers4