0

I have a raster grid at 0.5 degree resolution (r) and a dataframe (my_df) with 3 columns: long, lat and id. The dataframe represents species occurrence records.

What I want to do is determine which species are present in each 0.5 degree cell of my raster grid, and for each cell only keep 1 record of each species (my_df has more than 90,000,000 rows), so if a 0.5 degree cell only has one species, there will be a row with the lat, long of the raster grid cell and then the species ID from the dataframe. Other raster grid cells may contain hundreds of species, so may have hundreds of rows.

Ultimately I would like to create a dataframe that has long and lat of the 0.5 degree raster grid that each species location falls into and the species ID that are present there, one row for each species.

I have created a raster grid, as per...

ext <- extent(-180.0, 180, -90.0, 90.0)
gridsize <- 0.5
r <- raster(ext, res=gridsize)
crs(r) <- "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"

and a dataframe, which was originally a SpatialPolygonsDataframe...

A tibble: 6 x 3
   long   lat id   
  <dbl> <dbl> <chr>
1  16.5 -28.6 0    
2  16.5 -28.6 0    
3  16.5 -28.6 0    
4  16.5 -28.6 0    
5  16.5 -28.6 0    
6  16.5 -28.6 0 
etc
etc

...but am unsure of how to proceed with the rest of the method. I have tried rasterizing my data, extracting points etc but I am continually hitting errors and am unsure of the correct method to use to achieve my aim.

Alternatively, if anyone knows how to extract species names directly form the SpatialPolygonsDataFrame which contains a range polygon for each species, at 0.5 degree raster grid cell locations, that would be excellent.

Any help would be greatly appreciated.

TJeff
  • 29
  • 8
  • You can visit this [link](https://stackoverflow.com/a/60317798/6123824), [this](https://gis.stackexchange.com/questions/60527/how-to-extract-values-from-rasters-at-location-of-points-in-r) and [this](https://stackoverflow.com/a/58859724/6123824). – UseR10085 Apr 07 '20 at 05:14

2 Answers2

1

If I guessed correctly, you want to match points that fall within cells. I think you are looking for a spatial join based on interesection between points and polygons.

I highly recommend you to use sf package rather than sp objects. That's what I'm going to propose you.

First, create the grid with st_make_grid function

library(sf)
library(dplyr)

ext <- raster::extent(-180.0, 180, -90.0, 90.0)

grid <- st_bbox(ext) %>% 
  st_make_grid(cellsize = 0.5, what = "polygons") %>%
  st_set_crs(4326)
grid <- grid %>% st_sf() %>% mutate(id_cell = seq_len(nrow(.)))

Then let's take a simple dataframe:

df <- data.frame(long = 16.51, lat = -28.6, id = 0)
df <- df %>% sf::st_as_sf(coords = c("long","lat"), crs = 4326)

df

Simple feature collection with 1 feature and 1 field
geometry type:  POINT
dimension:      XY
bbox:           xmin: 16.51 ymin: -28.6 xmax: 16.51 ymax: -28.6
epsg (SRID):    4326
proj4string:    +proj=longlat +datum=WGS84 +no_defs
  id            geometry
1  0 POINT (16.51 -28.6)

Then, you need to use st_join function. By default the spatial join is based on intersection:

df %>% sf::st_join(grid, left = TRUE)

although coordinates are longitude/latitude, st_intersects assumes that they are planar
Simple feature collection with 1 feature and 2 fields
geometry type:  POINT
dimension:      XY
bbox:           xmin: 16.51 ymin: -28.6 xmax: 16.51 ymax: -28.6
epsg (SRID):    4326
proj4string:    +proj=longlat +datum=WGS84 +no_defs
  id id_cell            geometry
1  0   88234 POINT (16.51 -28.6)

I assumed you wanted a left join (report all your points). You can change that option. I think using sf will be faster than a hand-coded technique.

linog
  • 5,786
  • 3
  • 14
  • 28
  • Hi, thank you for your quick response, I tried the above, but it threw an error: Error in CPL_geos_binop(st_geometry(x), st_geometry(y), op, par, pattern, : std::bad_alloc I'm not sure what this means - also, how do I pull out the centre point of the grid (lat and long) from the list for each grid cell? What I want to do is get the species number for each species that is present in each 0.5 grid cell. I have done it with other data sources, but this source is a spatialpolygon hence all questions as I haven't dealt with one before... – TJeff Apr 21 '20 at 01:15
  • I mean species id, not species number – TJeff Apr 21 '20 at 04:17
  • I looks like it's a memory problem [see here](https://github.com/r-spatial/sf/issues/394). In your example dataframe, you have many duplicated rows. Is it the case in your real dataframe? Because, in that case, you could first aggregate `df` by counting distinct (lon, lat, id) and then do the merge. Reducing `df` size would help you reduce the memory needs. – linog Apr 21 '20 at 05:06
  • Maybe [spatial indexes](https://www.r-spatial.org/r/2017/06/22/spatial-index.html) can help – linog Apr 21 '20 at 05:35
  • Thank you, I'll look into reducing the size as you say. Have a good day! – TJeff Apr 21 '20 at 22:03
0

With point data you can do that like this

Example data

#species
set.seed(0)
n <- 20
spp <- data.frame(lon=runif(n, -180, 180), lat=runif(n,-90,90), sp=sample(5, n, replace=TRUE)) 

# raster
library(raster)
# for the example I use a resolution of 90, rather than 0.5 
r <- raster(res=90)

Now compute the cell number for each location and tabulate. The way I do it it returns the counts, rather than only presence/absence

spp$cell <- cellFromXY(r, spp[, c("lon", "lat")])
tb <- table(spp$cell, spp$sp)

To get the lon/lat for each cell

xy <- xyFromCell(r, as.integer(rownames(tb)))
result <- cbind(xy, tb)
colnames(result)[1:2] <- c("lon", "lat")
result
#   lon lat 1 2 3 4 5
#1 -135  45 0 0 1 0 0
#2  -45  45 0 2 1 0 0
#3   45  45 1 0 0 2 0
#4  135  45 0 1 0 0 1
#5 -135 -45 1 2 0 0 0
#6  -45 -45 0 1 0 1 0
#7   45 -45 1 1 0 0 0
#8  135 -45 1 0 1 2 0

With polygon data (and with point data as well) you could use raster::rasterize

Example polygon data

library(raster)
p1 <- rbind(c(-180,-20), c(-140,55), c(10, 0), c(-140,-60), c(-180,-20))
hole <- rbind(c(-150,-20), c(-100,-10), c(-110,20), c(-150,-20))
p1 <- list(p1, hole)
p2 <- rbind(c(-10,0), c(140,60), c(160,0), c(140,-55), c(-10,0))
p3 <- rbind(c(-125,0), c(0,60), c(40,5), c(15,-45), c(-125,0))
spp <- data.frame(species=letters[1:3], stringsAsFactors=FALSE)
pols <- spPolygons(p1, p2, p3, attr=spp)

Rasterize each species and combine in a RasterStack. If you have many species you want to assign a filename to the rasterize argument, such as filename = paste0("sp_", i, ".tif")

usp <- unique(spp$species)
r <- raster(res=0.5)
s <- list()
for (i in 1:length(usp)) {
    p <- pols[pols$species == usp[i], ]
    s[[i]] <- rasterize(p, r, field=1, fun="count")
}       
ss <- stack(s)

(for species richness do sr <- sum(ss>0, na.rm=TRUE))

Create the output you desire

m <- as.matrix(ss)
m[is.na(m)] <- 0
# to remove rows with no species 
i <- which(rowSums(m) > 0)
xy <- xyFromCell(r, i)  
output <- cbind(xy, m[i,])
colnames(output) <- c("lon", "lat", usp)
head(output)
#        lon   lat a b c
#[1,]  -0.25 59.75 0 0 1
#[2,] 139.75 59.75 0 1 0
#[3,]  -1.25 59.25 0 0 1
#[4,]  -0.75 59.25 0 0 1
#[5,]  -0.25 59.25 0 0 1
#[6,]   0.25 59.25 0 0 1
Robert Hijmans
  • 40,301
  • 4
  • 55
  • 63
  • Thank you for your response, I will try this too, this feels quite complex, but such is R to a relative novice like me! – TJeff Apr 21 '20 at 01:21
  • Hi I managed to make a raster stack that contains the range of each species (963 species in total, 1 raster for each species) what I need to do is combine this with a raster grid at 0.5 degree scale that I created, I"ve been trying to use the extract function but it doesn't seem to work. I struggled to follow your code above but it did help. Would you know how to combine them? – TJeff May 13 '20 at 02:41
  • please ask a new question to address that – Robert Hijmans May 13 '20 at 04:51