0

I have some long-standing package code that uses raster::rasterize that I'm trying to update to terra::rasterize. The code takes point data, where each point has one of two possible integer ID values. The output is a raster with two layers, one for each possible point ID, where cell values are counts. The relevant bits are:

  # r0 is template raster to define extent and resolution
  r <- raster::rasterize(dat[, c("X", "Y")],
                         r0,
                         field = dat$flightlineID,
                         fun = f,
                         background = 0)

Here, f is a function that takes a vector of point IDs and returns a two-element vector of counts, which results in the desired two layer output raster.

My first attempt to port this to terra::rasterize (package version 1.6-17) was...

r <- terra::rasterize(cbind(dat$X, dat$Y), # seem to need a matrix rather than a data frame
                      r0,  # template SpatRaster
                      values = dat$flightlineID, 
                      fun = f, 
                      background = 0)

This fails with the error:

Error in w[vv[, 1], ] <- vv[, -1] : number of items to replace is not a multiple of replacement length

Delving into the code for terra:::rasterize_points it seems that the number of layers for the output raster is determined by treating the 'values' argument as a data frame and checking the number of columns. This is a bit confusing because the package docs state that the values argument is expected to be a numeric vector, of either length 1 or nrow(x) where x is the input point data. Moreover, the length of the vector returned by the user-supplied summary function doesn't seem to play any part in determining the number of output raster layers.

For the moment I've simply retained the old raster::rasterize code and convert the output raster to a SpatRaster, but I think I must be missing something obvious. Is there a way of using just terra::rasterize to accomplish this task?

EDIT: As requested in comments, here is a small sample of the input point data to show the format. Typical input data sizes range from 2 to 40 million points.

structure(list(X = c(420094, 420067, 420017, 420050, 420058, 
420090, 420038, 420040, 420081, 420097, 420075, 420041, 420039, 
420062, 420050, 420083, 420019, 420019, 420044, 420087, 420099, 
420077, 420030, 420014, 420015, 420051, 420033, 420056, 420041, 
420030, 420027, 420024, 420058, 420042, 420063, 420028, 420073, 
420053, 420010, 420100, 420048, 420062, 420056, 420080, 420053, 
420068, 420074, 420004, 420010, 420078), Y = c(6676049, 6676029, 
6676034, 6676019, 6676096, 6676010, 6676003, 6676048, 6676073, 
6676023, 6676089, 6676082, 6676010, 6676051, 6676039, 6676099, 
6676024, 6676073, 6676040, 6676056, 6676072, 6676086, 6676030, 
6676042, 6676002, 6676033, 6676078, 6676073, 6676013, 6676056, 
6676055, 6676069, 6676072, 6676089, 6676069, 6676058, 6676023, 
6676039, 6676043, 6676017, 6676011, 6676054, 6676095, 6676068, 
6676098, 6676077, 6676049, 6676073, 6676097, 6676057), flightlineID = c(2L, 
1L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 
2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 
2L)), row.names = c(NA, -50L), class = "data.frame")

EDIT: In the raster package code, the private .pointsToRaster function has a line (see here) where the length of the output from the user-supplied summary function is checked with some arbitrary test values to determine the number of layers in the output raster. This seems to be absent from the terra package code.

michael
  • 762
  • 7
  • 13
  • Can you please provide `dat` in `dput` format, at least `head(dat, 50)`? – UseR10085 Oct 27 '22 at 03:55
  • @UseR10085 - just updated with some example data – michael Oct 27 '22 at 04:33
  • 2
    First, try to update to latest devt version of `terra` as dev't is very active. I'm often wrong about these things, but in this case, perhaps more data than less as a google/drop box. Is this summarizing LAS data? – Chris Oct 27 '22 at 13:10
  • @Chris, Yes - it's LAS data and the code is part of a package we use in my workplace (https://github.com/mbedward/CERMBlidar). This bit of code is part of a step that tries to correct for artefacts caused by upstream processing of the data, where the overlap between flightlines was removed for some but not all point classes (sigh). There's nothing special about the data values - you could simulate test data with runif for the X and Y coordinates within a 2 x 2km rectangle and sample 1:2 for the integer IDs. I'll try updating terra today. – michael Oct 27 '22 at 22:39
  • @Chris - thanks for suggesting the dev version of terra. I just tried v1.6-33 from the rspatial.r-universe.dev repo but it produces the same error as the CRAN version. The code for the private rasterize_points function appears unchanged. – michael Oct 28 '22 at 01:50
  • So, the problem will remain, overlap reconciliation (in upstream processing) irrespective of how terra differs from raster in returning differentiated layers, and were terra to do the same as raster, the result would be a 'visualization' of the embedded (sign) problem. And I imagine you've read every lidr bugs report, but this one is interesting [issues 227](https://github.com/r-lidar/lidR/issues/227), not because it is 'your' issue per se, but within it is a request for users to bring forward edge cases that are hard to simulate, are being tested for, and yours may be such a case, #sensors?? – Chris Oct 28 '22 at 13:23
  • crs would be useful here. – Chris Oct 28 '22 at 16:06

1 Answers1

1

It may be that you don't want this as two layers in one raster, though this is hard to tell with the supplied data as it appears to be all 'within' the overlap. I notice in you package, there is an attempt to throttle/reduce tile edge points that maybe just needs to be set lower than 1K.

That terra doesn't work the same as raster when rasterize(ing may be a decision that under terra one should intend two layers via making each then add<-ing or <- c(ing, whereas with raster it was assumed via a hard to follow logic of 'field' and 'values'. Using your above data (and keeping two rasters):

library(terra) 
#las_df <- structure(...)
las_df1 <- las_df[which(las_df$flightlineID == 1L), ]
las_df2 <- las_df[which(las_df$flightlineID == 2L), ]
las_vect1 <- vect(las_df1, geom = c('X', 'Y'), crs = 'EPSG:32755')
las_vect2 <- vect(las_df2, geom = c('X', 'Y'), crs = 'EPSG:32755')
las_rast <- rast(xmin=0, nrow = length(unique(las_df$X)), ncol = length(unique(las_df$Y)), crs='EPSG:32755')
set.ext(las_rast, c(min(las_df$X), max(las_df$X), min(las_df$Y), max(las_df$Y)))
pts1_rast <- rasterize(las_vect1, las_rast, fun = length)
pts2_rast <- rasterize(las_vect2, las_rast, fun = length)
pts1_pts2_rast <- c(pts1_rast, pts2_rast)
names(pts1_pts2_rast) <- c('lyr.1', 'lyr.2') # have to attend to this as both lyr.1 after `c(`
plot(pts1_pts2_rast$lyr.1, col = 'red')
plot(pts1_pts2_rast$lyr.2, col = 'blue', alpha=.75, add = TRUE)
# there is 1 cell that contains points from both pts1_rast and pts2_rast
 cells(pts1_rast) %in% cells(pts2_rast)
 [1] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 cells(pts2_rast) %in% cells(pts1_rast)
 [1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[25] FALSE FALSE FALSE

One might suggest a consistent merge policy where pts1 or pts2 are always favored. In the end, if this is about optimizing allocation of scarce resources, clear bush where you have the best data, inspect, and clear again. But it still seems best to resolve this at the las level upstream.

Chris
  • 1,647
  • 1
  • 18
  • 25
  • Thanks @Chris - I really appreciate your detailed feedback. I'll mark this as the accepted answer. I might also raise an issue at the terra repo, if only to update the docs for the rasterize function. On resolving the LAS problems upstream - trust me, we tried :) Newer data in our part of the world do not have these artefacts but it seems that we're stuck with them in the legacy data. – michael Oct 31 '22 at 03:20