0

I have been trying to polygonize a raster in R and ran into memory issues. Searching for solutions, I found this question with a routine using the stars package in R that I felt could solve my issues. I provide a reproducible example below that should be fast to run (I need to polygonize a raster that is actually much larger than this one).

Basically, my issue is that I have a raster in which all gridpoints have the same values. I want to polygonize it to a single multipolygon (like what I would get with rasterToPolygons(data, dissolve = TRUE)). To do so, I tried converting my raster to a star object (stars::st_as_stars(data)) and then convert to a sf object with the option for merging cells with the same value (st_as_sf(data, merge = TRUE)). The option merge = TRUE is available through the stars package, not the sf one, more details are available here. This is merging some cells, but not all of them and I don't know why. I also don't know what to do to fix this issue.

library(CropScapeR)
library(httr)
library(stars)
library(raster)
library(sf)

   #THIS IS JUST TO GET THE DATA I AM WORKING WITH, THE CROPLAND DATA LAYER BY USDA
   #To download CDL data on Linux using CropscapeR i need to do the below process as per
   #the instructions in the package page (in Windows you can basically go directly to)
   #GetCDLData(.) part
   # Skip the SSL check
httr::set_config(httr::config(ssl_verifypeer = 0L))
   # Automatically generate a temporary path to save the data
tif_file <- tempfile(fileext = '.tif')
ST_CDL1 <- GetCDLData(aoi = '34001', year = 2021, type = 'f', save_path = tif_file)
writeRaster(ST_CDL1, "ST_CDL1.tif", overwrite=TRUE)
ST_CDL1 = raster("ST_CDL1.tif")

   #NOW IS WHERE MY QUESTION ACTUALLY BEGINS
   #I create a new raster only with the cells in which I have blueberries.
   #In this raster all cells should have the same value: 242
ST_CDL_blueberries1 <- 
  raster::reclassify(ST_CDL1,
                     c(-0.1,241.9,NA,
                       242.1,255.1,NA),
                     progress="text")

   #My idea here is to convert the raster to a star object and then use
   #st_as_sf( , merge = TRUE) to get an sf object with a multipolygon for all
   #all blueberry fields. As mentioned above, people claimed elsewhere that this
   #is much faster than doing what I was trying before (more details on that below)
ST_CDL_blueberries1 <- stars::st_as_stars(ST_CDL_blueberries1)
   #This code should not merge the blueberry fields, it yields 52260 observations
   #Notice I run sf::st_as_sf, not stars::st_as_sf, even though as I said above I am
   #the merge option which comes from the stars package. However, for some reason when
   #I try to run stars::st_as_sf the code does not work.
ST_CDL_blueberries1false <- sf::st_as_sf(ST_CDL_blueberries1, 
             as_points = FALSE, merge = FALSE)
   #This code should merge the blueberry fields, it yields 5908 observations, which
   #is less than before, but I was hoping to really get a single observation
ST_CDL_blueberries1true <- sf::st_as_sf(ST_CDL_blueberries1, 
             as_points = FALSE, merge = TRUE)

If you want an example of something that actually works, but that was crashing due to excessive ram requirements when I tried to run it in a larger raster, here it is:

#Starting from the reclassify step in the above code

ST_CDL_blueberries1 <- 
  raster::reclassify(ST_CDL,
                     c(-0.1,241.9,NA,
                       242.1,255.1,NA),
                     progress="text")


#Poligonyzing the rasters and then making them sf objects
ST_CDL_blueberries1 <- rasterToPolygons(ST_CDL_blueberries1, dissolve = TRUE)
ST_CDL_blueberries1 <- st_as_sf(ST_CDL_blueberries1)

My specs:

  • Ubuntu 22.04.1 LTS
  • R 4.2.1 (2022-06-23) -- "Funny-Looking Kid"
  • RStudio 2022.07.2+576 "Spotted Wakerobin" Release (e7373ef832b49b2a9b88162cfe7eac5f22c40b34, 2022-09-06) for Ubuntu Jammy Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) QtWebEngine/5.12.8 Chrome/69.0.3497.128 Safari/537.36
  • CropScapeR 1.1.4
  • httr 1.4.4
  • raster 3.6-3
  • sf 1.0-8
  • stars 0.5-6

Thanks for your help! :)

As said in comments in the code, I have compared the results under st_as_sf(data, merge = TRUE) and st_as_sf(data, merge = FALSE). When merge = TRUE I indeed get less observations, but since I know my dataset has the same values for all cells, I should be getting only one observation, that's where my issue lies.

RobertoAS
  • 45
  • 6

1 Answers1

0

This is not a direct answer to your question. But instead of raster::rasterAsPolygons, you could the more efficient terra::as.polygons ("terra" has replaced "raster"). Here is a simple example

library(terra)
f <- system.file("ex/elev.tif", package="terra")
r <- rast(f) |> classify(5)
p <- as.polygons(r)

plot(p, "elevation")

As you can see, the default setting is to dissolve all cells that have the same value into a single multi-polygon.

But you probably should not do this at all. Representing raster data as polygons generally is a terrible idea. Why are you doing this in the first place?

Robert Hijmans
  • 40,301
  • 4
  • 55
  • 63
  • Robert, Why is representing raster data as polygons a bad idea? My main goal is to intersect buffers around blueberry fields with forests and calculate the area of these buffers that contain forests. If I do so for all buffers I can plot a histogram of the areas intersection. I decided to polygonize the rasters because I want to intersect the buffers to a single polygon with all forests at once (and I can do that with st_intersection). My understanding is that if I were to use raster::intersect, I would get an intersection for each frid point with forests. – RobertoAS Nov 10 '22 at 19:22
  • Because the memory needs for polygons are much larger (polygons: 5 coordinate pairs for *each grid cell* (but fewer if cells get merged); raster: 4 coordinate pairs for the *entire dataset*. So you are asking for memory trouble. Processing can also be much more efficient for raster data because the geometry is known to be regular. You can use raster-based methods to find out how much forest there is around blueberry fields (but not with `intersect`). You could ask a question about that with some simple example data. – Robert Hijmans Nov 10 '22 at 19:43