0

In this geopandas example, the Antarctica land mass is dropped from the GeoDataFrame before reprojecting to the Mercator projection, in order to prevent problem with shapes containing a pole (which would become infinitely large).

I was wondering, if it is possible to find a more robust reprojection method, so that the dataframe does not need to be manually adjusted. Especially since I'm working with a dataset that does not have a separate row for Antarctica:

enter image description here

I have two ideas:

1. Use information about the destination crs

On the epsg website for example, the 'Area of use' is shown:

enter image description here

We could use it to prepare the data before the reprojection: drop any shapes that extend further South than -80 deg, or alternatively intersect it with a shapely Polygon that describes the area of use of the destination crs, in terms of the source crs - in this case the standard epsg:4326 so Polygon([(-180,-80), (-180,84), ...]).

Issue with this approach: I'm not sure, if this area of use information is programmatically accessible from somewhere for any crs, e.g. from the GeoDataFrame object.

2. Fix in post

Just do it, and pick out the erroneously reprojcted parts later. In my current case, for example, the reprojected geodataframe gdf_merc = gdf.to_crs(epsg=3395) does have errors...

enter image description here

...but by searching for the inf word in the string representation of the geometry, I can find the offending Polygon within the MultiPolygon...

In [360]: for i, polygon in enumerate(gdf_merc.geometry[0]):
   ...:     if 'inf' in str(polygon): 
   ...:         print(i)
  
0

...and just remove it:

enter image description here

Issue with this approach: seems elaborate and I'd prefer to prevent any problems from appearing in the first place.

Any thoughts on how to fix either of these methods, or is there a third way?

One remark: I'm interested in the general case, where any crs could be reprojected to, so I don't want to preemptively remove Antarctica ("just in case"), as other projections might be perfectly fine with it, and, more importantly, they might have other problem areas.

Many thanks!

Community
  • 1
  • 1
ElRudi
  • 2,122
  • 2
  • 18
  • 33

1 Answers1

2

The option 1 is likely the best shot in here. Latest GeoPandas uses pyproj.CRS to store CRS data, from which you can easily extract bounds of the projection.

To extract it from df:

import geopandas as gpd
df = gpd.read_file(gpd.datasets.get_path('nybb'))
df.crs.area_of_use.bounds

To get it from target CRS using pyproj directly:

import pyproj
crs = pyproj.CRS.from_epsg(3395)
crs.area_of_use.bounds

Then you can use built-in geopandas.clip to clip your data.

from shapely.geometry import box

df = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
crs = pyproj.CRS.from_epsg(3395)
bounds = crs.area_of_use.bounds
clipped = gpd.clip(df, box(*bounds))
clipped.to_crs(crs).plot()

result

martinfleis
  • 7,124
  • 2
  • 22
  • 30
  • Ah that's great! I had noticed that the `GeoDataFrame.crs` property only stores the information as a string - but that's because I specified is as such when instantiating the dataframe. If I pass the `crs` as a `pyproj.CRS` object, all this information is available from the dataframe property as well. I'll look into it, many thanks! – ElRudi Apr 08 '20 at 14:11
  • 1
    `GeoDataFrame.crs` does not store it as string anymore, that was deprecated. I'd recommend updating to 0.7.0 if you haven't done so. – martinfleis Apr 08 '20 at 15:15
  • It seems the `gpd.clip` method has some problems (I opened an issue [here](https://github.com/geopandas/geopandas/issues/1363)) and as a workaround I'd like to use something else. I was thinking about `gpd.overlay`, but that only works with `Polygon`s, or `maskdf.contains`, but that only works with `Point`s. If you happen to have a suggestion, please do share it :) – ElRudi Apr 10 '20 at 03:39
  • `gpd.overlay` works with all geometry types since 0.7.0. – martinfleis Apr 10 '20 at 08:45
  • Ah in that case, the docs are still off... I suppose I could create a branch and pull request for it, or is there a faster way for such issues? – ElRudi Apr 10 '20 at 17:55
  • Branch and PR is the fastest way. I see that docstring is not updated. – martinfleis Apr 10 '20 at 19:42