
I have a regional netCDF file with chlorophyll data and want to add a new variable "region" that maps information about the maritime region. So for each coordinate point there would be information about which region that point belongs to.

So for all points in the netCDF file lying in the region of the Mediterranean Sea, the variable "region" would for example contain the value 0, for all points lying in the region of the North Sea it would for example contain the value 1, and so on...

I found a shapefile containing maritime regions based on biogeochemical processes (Longhurst Provinces). Now I want to use this shapefile to define my "region" variable in the netcdf file. Does anybody know how I could do that in R?

I think my question is close to this post by DKRZ, but rather than extracting/masking only one region, I want to map all regions that are defined in the shapefile.

The files can be found here:

https://drive.google.com/file/d/1kgPpHFapmuclDyUvw2TH_10i018GE9YH/view?usp=sharing

https://drive.google.com/file/d/1WLYEUs6XllZv6i0xjRif-N0syhR2lX01/view?usp=sharing

Thanks a lot already!

EDIT
I found this post that helped me to solve my problem: https://www.r-bloggers.com/2014/05/converting-shapefiles-to-rasters-in-r/

Mim_Tauch
  • Hi, welcome. Just a comment: I have experience using both R and Python to deal with NetCDF. What you want to do is easy to achieve with the `xarray` package in Python, but I do not know the R solution; I personally think Python is more suitable for NetCDF tasks. – Jeremy May 13 '21 at 14:15

1 Answer


This is what I could put together based on Map Lat/Lon Points to a Shape File in R. The main work is done by over(), which takes a few minutes.

library(ncdf4)
# open the file in write mode so a new variable can be added
nc_data = nc_open('/mnt/d/Downloads/CCI_ALL-v5.0-MONTHLY.nc', write = TRUE)
lon = ncvar_get(nc_data, "lon")
lat = ncvar_get(nc_data, "lat")
library(rgdal)
shapefile = readOGR('/mnt/d/Downloads/Longhurst_world_v4_2010.shp')
# build a grid of all lon/lat combinations and give it the shapefile's CRS
grid = expand.grid(lon = lon, lat = lat)
coordinates(grid) = ~lon+lat
grid@proj4string = shapefile@proj4string
# for each grid point, find the province it belongs to (NA if none):
prov = over(grid, shapefile['ProvCode'])
# define dimensions for the new variable, reusing the existing units
londim = ncdim_def("lon", ncatt_get(nc_data, "lon")$units, lon)
latdim = ncdim_def("lat", ncatt_get(nc_data, "lat")$units, lat)
# define the new integer variable
region = ncvar_def("region", "", list(londim, latdim), prec = "integer")
# add the new variable to the netCDF file
nc_data = ncvar_add(nc_data, region)
# write the province codes to the file (the factor levels become integer codes)
ncvar_put(nc_data, region, as.integer(prov$ProvCode))
nc_close(nc_data)
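
As an aside, rgdal was retired from CRAN in 2023, so on newer setups the same point-in-polygon lookup can be done with the sf package and st_join(). Below is a minimal, self-contained sketch using two toy square "provinces" in place of the Longhurst polygons; the column name ProvCode mirrors the shapefile field above, but the geometries and points are made up for illustration.

```r
library(sf)

# Two square "provinces" standing in for the Longhurst polygons (toy data):
provinces = st_sf(
  ProvCode = c("A", "B"),
  geometry = st_sfc(
    st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0)))),
    st_polygon(list(rbind(c(1, 0), c(2, 0), c(2, 1), c(1, 1), c(1, 0))))
  )
)

# Three sample points standing in for the netCDF lon/lat grid:
pts = st_as_sf(
  data.frame(lon = c(0.5, 1.5, 3.0), lat = c(0.5, 0.5, 0.5)),
  coords = c("lon", "lat")
)

# Spatial join: each point receives the attributes of the polygon it falls
# in; points outside every polygon get NA.
joined = st_join(pts, provinces)
print(joined$ProvCode)   # "A" "B" NA
```

With the real data, pts would be built from `expand.grid(lon = lon, lat = lat)` and the provinces read with `st_read()` from the Longhurst shapefile.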

rgdal version etc.:

Loading required package: sp
rgdal: version: 1.4-8, (SVN revision 845)
 Geospatial Data Abstraction Library extensions to R successfully loaded
 Loaded GDAL runtime: GDAL 3.0.4, released 2020/01/28
 Path to GDAL shared files:
 GDAL binary built with GEOS: TRUE
 Loaded PROJ.4 runtime: Rel. 6.3.1, February 10th, 2020, [PJ_VERSION: 630]
 Path to PROJ.4 shared files: (autodetected)
 Linking to sp version: 1.3-2
OGR data source with driver: ESRI Shapefile
Source: "/mnt/d/Downloads/Longhurst_world_v4_2010.shp", layer: "Longhurst_world_v4_2010"
with 54 features
It has 2 fields
Armali
  • Thanks a lot for your answer. Unfortunately when I tried it out I got an error message with the over() function: `Error in .local(x, y, returnList, fn, ...) : identicalCRS(x, y) is not TRUE` Do you have a solution for that? – Mim_Tauch May 17 '21 at 07:24
  • Strange... the line `grid@proj4string = shapefile@proj4string` should make identical CRS, and it works here with your given files. What is your `grid@proj4string` and `shapefile@proj4string`? – Armali May 17 '21 at 07:52
  • I added script output for version comparison. – Armali May 17 '21 at 07:59
  • Ok, I reran it after restarting R and the error did not occur anymore. Sorry for not doing that before asking again. But now I got a new error message after the code had been running for a while: `Error: vector memory exhausted (limit reached?)`. I use a Mac with 16 GB RAM and a 2.8 GHz i7. Maybe I can try to run the function on multiple cores..? – Mim_Tauch May 17 '21 at 14:20
  • I think using multiple cores will rather increase memory consumption. I wonder why the memory is exhausted, since here with Ubuntu on Windows 10 the virtual memory size of the R process after `over()` is just about 3.3 GB. When you restarted R, did you have an `.RData` file from which additional old data is reloaded which occupies memory? You might also try to execute the script with `Rscript`. – Armali May 17 '21 at 16:42
  • There was no .RData file, and trying it with Rscript resulted in the same error message. I tried it several times, also after restarting the computer and closing all other programs. People with the same problem can find help here: https://stackoverflow.com/questions/51295402/r-on-macos-error-vector-memory-exhausted-limit-reached – Mim_Tauch May 19 '21 at 17:13
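
For reference, the fix in that linked question is to raise R's vector memory cap on macOS via the `R_MAX_VSIZE` environment variable, set for example in `~/.Renviron` (the 32Gb figure below is just an illustrative value, not a recommendation from this thread):

```
R_MAX_VSIZE=32Gb
```

Restart R after editing the file so the new limit takes effect.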