I have been trying to polygonize a raster in R and ran into memory issues. Searching for solutions, I found this question with a routine using the stars
package in R that I felt could solve my issues. I provide a reproducible example below that should be fast to run (I need to polygonize a raster that is actually much larger than this one).
Basically, my issue is that I have a raster in which all gridpoints have the same values. I want to polygonize it to a single multipolygon (like what I would get with rasterToPolygons(data, dissolve = TRUE)
). To do so, I tried converting my raster to a star object (stars::st_as_stars(data)
) and then convert to a sf object with the option for merging cells with the same value (st_as_sf(data, merge = TRUE)
). The option merge = TRUE
is available through the stars package, not the sf one, more details are available here. This is merging some cells, but not all of them and I don't know why. I also don't know what to do to fix this issue.
library(CropScapeR)
library(httr)
library(stars)
library(raster)
library(sf)
#THIS IS JUST TO GET THE DATA I AM WORKING WITH, THE CROPLAND DATA LAYER BY USDA
#To download CDL data on Linux using CropscapeR i need to do the below process as per
#the instructions in the package page (in Windows you can basically go directly to)
#GetCDLData(.) part
# Skip the SSL check
httr::set_config(httr::config(ssl_verifypeer = 0L))
# Automatically generate a temporary path to save the data
tif_file <- tempfile(fileext = '.tif')
ST_CDL1 <- GetCDLData(aoi = '34001', year = 2021, type = 'f', save_path = tif_file)
writeRaster(ST_CDL1, "ST_CDL1.tif", overwrite=TRUE)
ST_CDL1 = raster("ST_CDL1.tif")
#NOW IS WHERE MY QUESTION ACTUALLY BEGINS
#I create a new raster only with the cells in which I have blueberries.
#In this raster all cells should have the same value: 242
ST_CDL_blueberries1 <-
raster::reclassify(ST_CDL1,
c(-0.1,241.9,NA,
242.1,255.1,NA),
progress="text")
#My idea here is to convert the raster to a star object and then use
#st_as_sf( , merge = TRUE) to get an sf object with a multipolygon for all
#all blueberry fields. As mentioned above, people claimed elsewhere that this
#is much faster than doing what I was trying before (more details on that below)
ST_CDL_blueberries1 <- stars::st_as_stars(ST_CDL_blueberries1)
#This code should not merge the blueberry fields, it yields 52260 observations
#Notice I run sf::st_as_sf, not stars::st_as_sf, even though as I said above I am
#the merge option which comes from the stars package. However, for some reason when
#I try to run stars::st_as_sf the code does not work.
ST_CDL_blueberries1false <- sf::st_as_sf(ST_CDL_blueberries1,
as_points = FALSE, merge = FALSE)
#This code should merge the blueberry fields, it yields 5908 observations, which
#is less than before, but I was hoping to really get a single observation
ST_CDL_blueberries1true <- sf::st_as_sf(ST_CDL_blueberries1,
as_points = FALSE, merge = TRUE)
If you want an example of something that actually works, but that was crashing due to excessive ram requirements when I tried to run it in a larger raster, here it is:
#Starting from the reclassify step in the above code
ST_CDL_blueberries1 <-
raster::reclassify(ST_CDL,
c(-0.1,241.9,NA,
242.1,255.1,NA),
progress="text")
#Poligonyzing the rasters and then making them sf objects
ST_CDL_blueberries1 <- rasterToPolygons(ST_CDL_blueberries1, dissolve = TRUE)
ST_CDL_blueberries1 <- st_as_sf(ST_CDL_blueberries1)
My specs:
- Ubuntu 22.04.1 LTS
- R 4.2.1 (2022-06-23) -- "Funny-Looking Kid"
- RStudio 2022.07.2+576 "Spotted Wakerobin" Release (e7373ef832b49b2a9b88162cfe7eac5f22c40b34, 2022-09-06) for Ubuntu Jammy Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) QtWebEngine/5.12.8 Chrome/69.0.3497.128 Safari/537.36
- CropScapeR 1.1.4
- httr 1.4.4
- raster 3.6-3
- sf 1.0-8
- stars 0.5-6
Thanks for your help! :)
As said in comments in the code, I have compared the results under st_as_sf(data, merge = TRUE)
and st_as_sf(data, merge = FALSE)
. When merge = TRUE
I indeed get less observations, but since I know my dataset has the same values for all cells, I should be getting only one observation, that's where my issue lies.