Exporting QGIS Shapefiles into R: How to ensure polygon types are saved as factors and not levels?

Question

Deleted my previous question as I realized this is the crux of the issue I am having. I am using the siland package in R to create the optimal buffer size where effect size is greatest from my observation points for each of 7 landcover variables I have. These landcover variables were created manually as polygons in QGIS and then merged into a single layer (in script: Trial-2) prior to exporting to be used in R. My problem is this, when I read the data into R, my polygon types (e.g. agricultural, anthropogenic) are being considered as levels of the factor "layer" rather than factors in and of themselves. I need agricultural land cover to be a factor so that I can calculate how much area is in a buffer and not for it to be used to assign whether a polygon is of a certain type of not. Any help on what to do about this would be super appreciated!

shapedata=st_read(dsn = "R/GIS transfer/", layer = "Trial-2", stringsAsFactors = T) 
#Simple feature collection with 7 features and 1 field
#Geometry type: MULTIPOLYGON
#Dimension:     XY
#Bounding box:  xmin: 442227.6 ymin: 5424196 xmax: 446567.3 ymax: 5428756
#Projected CRS: ETRS89 / UTM zone 32N

str(shapedata)
#Classes ‘sf’ and 'data.frame': 7 obs. of  2 variables:
#$ layer   : Factor w/ 7 levels "Agri T","Anthro T",..: 1 2 3 4 5 6 7
#$ geometry:sfc_MULTIPOLYGON of length 7; first list element: List of 195

EDIT: I am following along with the siland vignette - the end product of which is to create a buffer where the variable is most related to the observation (e.g. 259m for agricultural landcover, 23m for anthropological etc.) (https://cran.r-project.org/web/packages/siland/vignettes/siland.html).

My code is this:

shapedata=st_read(dsn = "R/GIS transfer/", layer = "Trial-2",) 
#Simple feature collection with 7 features and 1 field
#Geometry type: MULTIPOLYGON
trapdata<-read.table("Trap-Data-PA.csv",header=T,sep=",")
> str(shapedata)
#Classes ‘sf’ and 'data.frame': 7 obs. of  2 variables:
#$ layer   : Factor w/ 7 levels "Agri T","Anthro T",..: 1 2 3 4 5 6 7
#$ geometry:sfc_MULTIPOLYGON of length 7; first list element: List of 195

The next step was to plot, which I succeeded in doing by creating an object for polygons of each level

Agri=st_geometry(shapedata[shapedata$layer == "Agri T",]) #extract an sf object with only polygons of type Agri T 
Anthro=st_geometry(shapedata[shapedata$layer == "Anthro T",]) #extract an sf object with only polygons of type Anthro T 
p<-ggplot(shapedata)+
  geom_sf(data=Agri,fill="red")+
  geom_sf(data=Anthro,fill="blue")+
  geom_point(data=trapdata, aes(x,y),col="green")
 p + coord_sf(xlim = c(8.228361,8.249213),   ylim = c(48.99159,48.99941))

What's tripping me up, however, is inputting my data into the siland function itself:

resB1=Bsiland(obs~x1+L1+L2,land=shapedata,data=trapdata)#bisiland

I do not have the equivalent of L1 or L2 which in the vignette are the variables used for landcover. You can see in their str(shapedata) that they have:

str(landSiland)
## Classes 'sf' and 'data.frame':   4884 obs. of  3 variables:
##  $ L1      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ L2      : num  0 1 0 0 0 0 0 0 1 0 ...
##  $ geometry:sfc_MULTIPOLYGON of length 4884; first list element: List of 1

My variables don't appear to be considered as classes the same way which is likely why when I try to input my variables into the bsiland function it returns the following error message: "colnames X and Y for observations are not available in data argument"

Can you explain more clearly what the problem is? When you import a shapefile using `st_read` character attributes are, by default, imported as factors. If you prefer to keep the imported attribute column as characters, then use the `stringsAsFactors = FALSE` parameter to st_read. But I don't see what is the connection to calculating area of buffers. — Micha, Mar 22 '22 at 19:48
Hi Micha, thanks for your response. I have edited the original question with more information. I likely don't yet know the correct language to describe my problem hence the confusion — Lara13, Mar 22 '22 at 23:10

score 0 · Answer 1 · answered Mar 23 '22 at 21:13

Now I see the issue. The siland package apparently expects the landcover polygons to be "one hot encoded". This means that instead of one landcover class column (as is usual), the data needs to be preprocessed such that there are multiple columns, one for each landcover class, and each column contains values of 1 or 0, where 1 indicates that the feature is of that landcover, and 0 means not that landcover.

One of the ways to do this is R is with the dummyVars function in the caret package. Here's how it would work:

library(sf)
library(caret)
library(siland)

landcover <- st_read("landcover.shp")
# Remove geometry
landcover_df <- st_drop_geometry(landcover)

# Prepare one hot encoded data.frame
landcover_onehot <- dummyVars("~.", data=landcover_df)
landcover_encoded <- data.frame(predict(landcover_onehot, newdata = landcover_df))
str(landcover_encoded)

# Join back to original 
landcover <- do.call(cbind, list(landcover, landcover_encoded)) 
str(landcover)

Hope that helps you to get back on track.

Exporting QGIS Shapefiles into R: How to ensure polygon types are saved as factors and not levels?

1 Answers1