4

I have a large dataset of global .nc files and I am trying to clip them to a smaller area. I have this area stored as a .shp file.

I have tried using gdal from Qgis but needs to do this by converting each variable and I must select each variable and same shape for all files one by one and with 400 files going trough each variable seems not the best idea. Also this returns .tiff files separated and not the .nc file that i am aiming for.

I had this little script but its not doing what i need

    import glob
    import subprocess
    import os
    
    ImageList = sorted(glob.glob('*.nc'))
    print('number of images to process: ', len(ImageList))
    
    Shapefile = 'NHAF-250m.shp'
    
    # Create output directory
    OutDir = './Clipped_Rasters/'
    if not os.path.exists(OutDir):
        os.makedirs(OutDir)
    
    for Image in ImageList:
        print('Processing ' + Image)
    
        OutImage = OutDir + Image.replace('.nc', '_BurnedArea_Clipped.tif') # Defines Output Image
    
        # Clip image
        subprocess.call('gdalwarp -q -cutline /Users/path/to/file/NHAF-250-vector/ -tr 0.25 0.25 -of GTiff NETCDF:'+Image+":burned_area "+OutImage, shell=True)
    
    
        print('Done.' + '\n')
    
    print('All images processed.')

Thank you in advance

2 Answers2

2

I recommend to use xarray to handle netcdf data and geopandas + rasterio to handle your Shapefile.

import geopandas 
import xarray
import rasterio
import glob

shapefile = 'NHAF-250m.shp'

sf = geopandas.read_file(shapefile)
shape_mask = rasterio.features.geometry_mask(sf.iloc[0],
                                      out_shape=(len(ndvi.y), len(ndvi.x)),
                                      transform=ndvi.geobox.transform,
                                      invert=True)
shape_mask = xarray.DataArray(shape_masj , dims=("y", "x"))

file_list = sorted(glob.glob('*.nc'))

for file in file_list:
    nc_file = xarray.open_dataset(file)
    # Then apply the mask
    masked_netcdf_file = nc_file.where(shape_mask == True, drop=True)
    # store again as netcdf or do what every you want with the masked array
    
dl.meteo
  • 1,658
  • 15
  • 25
  • 2
    I get your point but I guess ndvi attributes come from ather file. When using my dataset I get an error on geobox.transform : 'Dataset' object has no attribute 'geobox' – Sabina Hurtado Nov 02 '20 at 15:59
0

You could also try to combine @dl.meteo's answer with shapely and rioxarray packages (the latter just needs to be installed in your env).

import xarray
import geopandas
from shapely.geometry import mapping

shapefile = 'NHAF-250m.shp'
sf = geopandas.read_file(shapefile)

file_list = sorted(glob.glob('*.nc'))

for file in file_list:
    nc_file = xarray.open_dataset(file)
    clipped_nc = nc_file.rio.clip(sf.geometry.apply(mapping), sf.crs, all_touched = True)

For it to work, your netcdf files and shapefile need to have a Coordinates Reference System included. If not, you can add it manually:

# For netcdf files:
nc.rio.write_crs('epsg:xxxx', inplace = True)
# For shapefile:
sf.set_crs('epsg:xxxx', inplace = True, allow_override = True)
Alex
  • 26
  • 5