I'm having trouble with one of the maps I'm making for a report. I know the problem, but don't know how to fix it.
Four polygons in my shape file are not valid--they are Ring Self-Intersecting. The shape file is for the New York metro area. When I open it in QGIS, the four problematic polygons are valid. It's only when I open them in R using ggplot
or tmap
the I have a problem.
Here's my code:
# remove all objects in R and clear R
remove(list=ls())
# Load packages
library(tidyverse) # tidy verse package for data maninpulation
library(sp)
library(sf) # sf package to deal with shapefiles and perform spaital operations
library(rnaturalearth)
library(tmap) # tmap package for mapping
library(tmaptools)
library(rmapshaper)
library(haven) # read Stata data
library(leaflet) # leaflet
library(htmlwidgets) # htmlwidgets
library(ggplot2)
# load shapefile
# ==============
shapefile <- read_sf("/Users/Dora/Library/Mobile Documents/com~apple~CloudDocs/Projects/Latino Data Project/Data/Shapefiles/ipums_cpuma0010/ipums_cpuma0010.shp")
# check class, should be sf
class(shapefile)
glimpse(shapefile)
# filter out only States we want : New York, New Jersey, Conn.
shapefile <- shapefile %>%
filter(State %in% c("New York", "New Jersey", "Connecticut"))
# check that only 3 states have been filtered
unique(shapefile$State)
# test to see if this works with ggplot, it does not, geometry is invalid
# ggplot() + geom_sf(data = shapefile)
puma.preview <- tm_shape(shapefile) +
tm_borders()
puma.preview
A quick summary of what I'm doing: I'm loading the shape file (which is for the entire country) and cut it down to a few states in the metro area us the sf
package. I then merge it with my data to further cut it down to the PUMAs (geo-area) that I want for my final map. But when I try to plot the boundaries as in the last step, it get a warning message:
Warning message: The shape shapefile is invalid. See sf::st_is_valid
I found this discussion on r-spatial that helps to identify invalid geometries. I ran the the following code.
#################################
# OPTION 1: Make shapefile valid
#################################
# which geometries are invalid and why?
st_is_valid(shapefile, reason = TRUE)
# items 109, 110, 128, 132 are not valid
# true to make valid
shapefile.fix <- st_make_valid(shapefile)
# test to see if shape is valid
st_is_valid(shapefile.fix, reason = TRUE)
and determined that four geometries were invalid because they had:
"Ring Self-intersection
[109] "Ring Self-intersection[1844806.7743 586468.577099999]"
[110] "Ring Self-intersection[1856751.9054 600658.621400001]"
[128] "Ring Self-intersection[1885193.1836 576520.4439]"
[132] ""Ring Self-intersection[1837919.417 592619.7127]"
So, I tried to fix them with st_make_valid()
, and while that made the error message go away, the geometry is still missing on the final maps. The area circled in red should be filled in.
Ideally, I'd be able to repair the geometry so that I can map these polygons. Does anyone have any ideas about how I can do this or what else might be the problem? I re-downloaded the shape file and that didn't matter. I can't really get the shape file from an alternate source--so that isn't an option. Using QGIS isn't an option either because I've already made most of the 20 maps I need. I recently upgraded my R (4.0.0) and RStudio (Version 1.2.5042) and didn't notice this problem before then. Could that be the source of the problem--that there's some bug in how tmap
and the newest version of RStudio work together? I can't think of anything else.
Any help would be greatly appreciated as I gotta submit this project in the next week or so. Thanks.