1

I'm trying to create a mask following this question's answer, so that I can color areas inside my shape file only. I will post the code example here (which is pretty much the same with the code in my previous question, but with the solution code included).

A quick summary of my code: It's a data of (lon, lat, tem). I transform the lon and lat into ccrs.PlateCaree() coordinate system (which is in meter and not degree anymore), and put the transformed coordinates in xp, yp. The 3 variables xp, yp, and tem are then interpolated and then I use pmeshcolor() to draw a color map (the same color map in my previous question that I left the link above). Now I want to limit the color map to be inside the shape file only (which is the grey lines). Code is as below:

import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
import matplotlib.colors as clr
import pandas as pd
import numpy as np

from metpy.interpolate import interpolate_to_grid, remove_nan_observations
from cartopy.io.shapereader import Reader
from cartopy.feature import ShapelyFeature

canada_east = -95
canada_west = -101.8
canada_north = 52.8
canada_south = 48.85

central_lon = (canada_east + canada_west)/2
central_lat = (canada_north + canada_south)/2

crs = ccrs.LambertConformal(central_longitude = central_lon, central_latitude = central_lat)
lat = np.array([49.8134 50.904  50.698  49.095  49.436  49.9607 49.9601 49.356  50.116

49.402  52.3472 50.411  49.24   49.876  49.591  49.905  49.498  49.088
49.118  50.5947 49.3776 49.148  49.1631 51.358  49.826  50.4324 49.96
49.68   49.875  50.829  51.572])
lon = np.array([-100.3721  -97.273   -99.068   -97.528  -100.308   -98.9054  -98.6367
-99.248   -96.434  -100.93   -101.1099 -100.893  -100.055   -99.909
-97.518   -99.354   -98.03    -99.325   -99.054   -98.0035 -100.5387
-100.491   -97.1454 -100.361   -96.776   -99.4392  -97.7463  -97.984
-95.92    -98.111  -100.488])
tem = np.array([-8.45   -4.026  -5.993  -3.68   -7.35   -7.421  -6.477  -8.03   -3.834
-13.04   -4.057  -8.79   -6.619 -10.89   -4.465  -8.41   -4.861  -9.93
-7.125  -4.424 -11.95   -9.56   -3.86   -7.17   -4.193  -7.653  -4.883
-5.631  -3.004  -4.738  -8.81])

xp, yp, _ = crs.transform_points(ccrs.PlateCarree(), lon, lat ).T
xp, yp, tem = remove_nan_observations(xp, yp, tem)

alt_x, alt_y, data = interpolate_to_grid( xp, yp, tem, minimum_neighbors=2, search_radius=240000, interp_type = 'barnes', hres = 1000)

# Create the figure and grid for subplots
fig = plt.figure(figsize=(17, 12))

# Main ax
ax = plt.subplot(111, projection=crs)
ax.set_extent([canada_west, canada_east, canada_south, canada_north], ccrs.PlateCarree())


# Read shape file again using geopandas and create a mask

cat_gdf = geopandas.read_file('MB_AGregion_Perim_South.shp')
cat_gdf = cat_gdf.to_crs(epsg = 4326)
mask = shapely.vectorized.contains(cat_gdf.dissolve().geometry.item(), alt_x, alt_y)

 print(mask)

which gives

[[False False False ... False False False]
 [False False False ... False False False]
 [False False False ... False False False]
 ...
 [False False False ... False False False]
 [False False False ... False False False]
 [False False False ... False False False]]

Process finished with exit code 0

My alt_x and alt_y is in meter and my shape file geometries are in degree. I believe this is the reason why the mask contains all false value (which means masking everything). However, I'm unable to find a solution to this problem. I would appreciate a lot if someone can help me with this problem.

Michael Delgado
  • 13,789
  • 3
  • 29
  • 54
QHoang
  • 45
  • 5
  • hi again! same comments as last time. please remove all of the irrelevant code from your question. do you need to plot lines on top of the image in order to solve this problem? I would think not. please read this guide carefully: [mre] and also see this for additional helpful context: [crafting minimal bug reports](https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports). It really helps us a lot if you can limit the question to only containing just what is specific to the issue you're asking about. Thanks! – Michael Delgado Jun 09 '22 at 18:10
  • it looks like you're converting all of your points to LambertConformal, but converting the shapefile you're masking with to EPSG:4326, which is WGS84 (simple lat/lon). So that's probably the mismatch. Does the problem go away if you do `cat_gdf = cat_gdf.to_crs(epsg = crs)` instead? – Michael Delgado Jun 09 '22 at 18:17
  • thanks for the updates! try not to include "Edit:" comments in your question - just go ahead and make the changes which are necessary to clarify the question. Anyone can view the [revisions](https://stackoverflow.com/posts/72563553/revisions) so there's no mystery about what has changed :) – Michael Delgado Jun 09 '22 at 19:33
  • have you had a chance to see whether my suggestion works? note that [`PlateCarree`](https://scitools.org.uk/cartopy/docs/latest/_modules/cartopy/crs.html#PlateCarree) is *NOT* in meters - it's still in degrees. I think the mismatch between cat_gdf and your points are the source of your issue. – Michael Delgado Jun 09 '22 at 19:37
  • @MichaelDelgado sigh I feel like a man coming out of cave. Thank you for your patience you have shown me a lot. To your suggestion, yes I have tried to do cat_gdf = cat-gdf.to_crs( crs = proj ) where proj = ccrs.PlateCarree() (I don't even know that I can do such thing !!). Yes you are right Plate Carree is still in degrees (I checked by printing out cat_gdf.geometry). This confuse me greatly because when I do: xp, yp, _ = crs.transform_points(ccrs.PlateCarree(), lon, lat ).T, my xp and yp are in meters (they need to be in meters so that I can interpolate and plot them with scatter() ) – QHoang Jun 09 '22 at 19:55
  • no sorry - I literally meant `cat_gdf = cat_gdf.to_crs(epsg = crs)` using your LambertConformal `crs` object. You keep saying you want to convert everything to meters - but you need to do that with the shapefile too! Converting the shapefile to PlateCarree doesn't help you. – Michael Delgado Jun 09 '22 at 20:13

1 Answers1

2

Here's another MRE. Keeping things very simple - I'm just using the bbox and vector of (x, y) values rather than the mesh you have. But it should be enough to illustrate the issue you're dealing with.

import cartopy.crs as ccrs
import shapely.geometry
import shapely.vectorized
import numpy as np, pandas as pd, geopandas as gpd, matplotlib.pyplot as plt

canada_east = -95
canada_west = -101.8
canada_north = 52.8
canada_south = 48.85

central_lon = (canada_east + canada_west)/2
central_lat = (canada_north + canada_south)/2

crs = ccrs.LambertConformal(
    central_longitude=central_lon,
    central_latitude=central_lat,
)

x = np.arange(-120, -90)
y = np.ones_like(x) * 50

bbox = shapely.geometry.Polygon([
    (canada_west, canada_south),
    (canada_east, canada_south),
    (canada_east, canada_north),
    (canada_west, canada_north),
 ])

gdf = gpd.GeoDataFrame(geometry=[bbox], crs='epsg:4326')

Here's what x, y, and the geodataframe look like. Note that they're all in degrees, techincally WGS84, aka 'epsg:4326':

In [2]: print(f"x values: {x}\n\ny values: {y}\n\ngeodataframe:\n{gdf}")
x values: [-120 -119 -118 -117 -116 -115 -114 -113 -112 -111 -110 -109 -108 -107
 -106 -105 -104 -103 -102 -101 -100  -99  -98  -97  -96  -95  -94  -93
  -92  -91]

y values: [50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50
 50 50 50 50 50 50]

geodataframe:
                                            geometry
0  POLYGON ((-101.80000 48.85000, -95.00000 48.85...

If we use shapely.vectorized.contains when both the points and the geodataframe are in degrees, we get the expected answer:

In [3]: shapely.vectorized.contains(gdf.dissolve().geometry.item(), x, y)
Out[3]:
array([False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False,  True,  True,  True,  True,  True,  True, False, False,
       False, False, False])

The CRS PlateCarree() is a plotting CRS which preserves lat/lon values, so you can think of it as equivalent to WGS84 (only difference is PlateCarree has a defined mapping to pixel values). So the following translates our lat/lon values into the CRS of our object CRS, which is LambertConformal:

In [4]: xp, yp, _ = crs.transform_points(ccrs.PlateCarree(), x, y ).T

Now, our xp and yp values are in meters and look very different from the values in the geodataframe:

In [5]: print(f"xp values: {xp}\n\nyp values: {yp}\n\ngeodataframe:\n{gdf}")
xp values: [-1555398.30438114 -1484657.61843652 -1413737.15236547 -1342645.49406745
 -1271391.25217193 -1199983.05499594 -1128429.54949926 -1056739.40023735
  -984921.28831211  -912983.91032072  -840935.97730252  -768786.21368415
  -696543.35622316  -624216.15294999  -551813.36210868  -479343.75109632
  -406816.09540141  -334239.17754116  -261621.78599806  -188972.71415562
  -116300.7592336    -43614.72122269    29076.59818104   101764.396642
   174439.87225098   247094.22459092   319718.65580267   392304.37165025
   464842.58258582   537324.50481399]

yp values: [ 92537.27851361  75810.36405389  59862.8937992   44696.79886026
  30313.91572949  16715.98605863   3904.65644788  -8118.52175353
 -19352.09263517 -29794.69590175 -39445.06703778 -48302.03746074
 -56364.53466258 -63631.58233956 -70102.30051051 -75775.90562338
 -80651.71065011 -84729.12516982 -88007.65544033 -90486.90445793
 -92166.57200545 -93046.45468863 -93126.44596073 -92406.53613545
 -90886.8123881  -88567.45874503 -85448.75606135 -81531.08198695
 -76814.91092072 -71300.81395314]

geodataframe:
                                            geometry
0  POLYGON ((-101.80000 48.85000, -95.00000 48.85...

At this point, if we use contains to find points within the geodataframe, it's clear why we get only False values:

In [6]: shapely.vectorized.contains(gdf.dissolve().geometry.item(), xp, yp)
Out[6]:
array([False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False])

To convert the geodataframe to our target crs, we can use to_crs:

In [7]: gdf_lambert = gdf.to_crs(crs)

Now, our values line up, and contains works as expected once again:

In [8]: print(f"xp values: {xp}\n\nyp values: {yp}\n\ngdf_lambert:\n{gdf_lambert}")
xp values: [-1555398.30438114 -1484657.61843652 -1413737.15236547 -1342645.49406745
 -1271391.25217193 -1199983.05499594 -1128429.54949926 -1056739.40023735
  -984921.28831211  -912983.91032072  -840935.97730252  -768786.21368415
  -696543.35622316  -624216.15294999  -551813.36210868  -479343.75109632
  -406816.09540141  -334239.17754116  -261621.78599806  -188972.71415562
  -116300.7592336    -43614.72122269    29076.59818104   101764.396642
   174439.87225098   247094.22459092   319718.65580267   392304.37165025
   464842.58258582   537324.50481399]

yp values: [ 92537.27851361  75810.36405389  59862.8937992   44696.79886026
  30313.91572949  16715.98605863   3904.65644788  -8118.52175353
 -19352.09263517 -29794.69590175 -39445.06703778 -48302.03746074
 -56364.53466258 -63631.58233956 -70102.30051051 -75775.90562338
 -80651.71065011 -84729.12516982 -88007.65544033 -90486.90445793
 -92166.57200545 -93046.45468863 -93126.44596073 -92406.53613545
 -90886.8123881  -88567.45874503 -85448.75606135 -81531.08198695
 -76814.91092072 -71300.81395314]

gdf_lambert:
                                            geometry
0  POLYGON ((-251935.076 -217891.780, 251935.076 ...

In [9]: shapely.vectorized.contains(gdf_lambert.dissolve().geometry.item(), xp, yp)
Out[9]:
array([False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False,  True,  True,  True,  True,  True,  True,  True, False,
       False, False, False])

Graphical representation

We can also see this very clearly when plotting. Here's the points and shapefile in lat/lon space, using the contains mask to color the points blue if outside and green if inside:


In [12]: fig, ax = plt.subplots()
    ...: gdf.plot(ax=ax, alpha=0.5)
    ...: mask = shapely.vectorized.contains(gdf.dissolve().geometry.item(), x, y)
    ...: ax.scatter(x[mask], y[mask], color='green')
    ...: ax.scatter(x[~mask], y[~mask], color='blue')
Out[12]: <matplotlib.collections.PathCollection at 0x18f5a3ac0>

straight line of dots, with some correctly indicated as intersecting the shapefile

If we mix and match projections, none of the points are in the shape (which is a tiny invisible spec near (0, 0) when the axes are scaled by meters):

In [14]: fig, ax = plt.subplots()
    ...: gdf.plot(ax=ax, alpha=0.5)
    ...: mask = shapely.vectorized.contains(gdf.dissolve().geometry.item(), xp, yp)
    ...: ax.scatter(xp[mask], yp[mask], color='green')
    ...: ax.scatter(xp[~mask], yp[~mask], color='blue')
Out[14]: <matplotlib.collections.PathCollection at 0x1919e58d0>

curved arc of blue dots, not intersecting a shapefile

When the shapefile and the points are both in LambertConformal, we're back in business:

In [16]: fig, ax = plt.subplots()
    ...: gdf_lambert.plot(ax=ax, alpha=0.5)
    ...: mask = shapely.vectorized.contains(gdf_lambert.dissolve().geometry.item(), xp, yp)
    ...: ax.scatter(xp[mask], yp[mask], color='green')
    ...: ax.scatter(xp[~mask], yp[~mask], color='blue')
Out[16]: <matplotlib.collections.PathCollection at 0x191a88ee0>

curved dots with some correctly shown as intersecting the shapefile

Michael Delgado
  • 13,789
  • 3
  • 29
  • 54
  • 1
    I’d like to express my appreciation for your help. Your example here is clear and the solution is so much more simple than other posts. You have saved me from this endless frustration !! – QHoang Jun 10 '22 at 00:32
  • glad it helped! thanks to you too for working through all the edits :) these were fun questions. welcome to the site! – Michael Delgado Jun 10 '22 at 01:02