0

I'm having trouble with one of the maps I'm making for a report. I know the problem, but don't know how to fix it.

Four polygons in my shape file are not valid--they are Ring Self-Intersecting. The shape file is for the New York metro area. When I open it in QGIS, the four problematic polygons are valid. It's only when I open them in R using ggplot or tmap the I have a problem.

Here's my code:

# remove all objects in R and clear R
remove(list=ls())

# Load packages
library(tidyverse) # tidy verse package for data maninpulation 
library(sp)
library(sf) # sf package to deal with shapefiles and perform spaital operations
library(rnaturalearth)
library(tmap) # tmap package for mapping
library(tmaptools)
library(rmapshaper)
library(haven) # read Stata data
library(leaflet) # leaflet
library(htmlwidgets) # htmlwidgets
library(ggplot2)

# load shapefile
# ==============
shapefile <- read_sf("/Users/Dora/Library/Mobile Documents/com~apple~CloudDocs/Projects/Latino Data Project/Data/Shapefiles/ipums_cpuma0010/ipums_cpuma0010.shp")

# check class, should be sf
class(shapefile)
glimpse(shapefile)

# filter out only States we want : New York, New Jersey, Conn. 
shapefile <- shapefile %>% 
  filter(State %in% c("New York", "New Jersey", "Connecticut"))

# check that only 3 states have been filtered
unique(shapefile$State)


# test to see if this works with ggplot, it does not, geometry is invalid
 # ggplot() + geom_sf(data = shapefile)

  puma.preview <- tm_shape(shapefile) + 
        tm_borders() 

  puma.preview

A quick summary of what I'm doing: I'm loading the shape file (which is for the entire country) and cut it down to a few states in the metro area us the sf package. I then merge it with my data to further cut it down to the PUMAs (geo-area) that I want for my final map. But when I try to plot the boundaries as in the last step, it get a warning message:

Warning message: The shape shapefile is invalid. See sf::st_is_valid

I found this discussion on r-spatial that helps to identify invalid geometries. I ran the the following code.

#################################
# OPTION 1: Make shapefile valid
################################# 
# which geometries are invalid and why?
st_is_valid(shapefile, reason = TRUE)

  # items 109, 110, 128, 132 are not valid

  # true to make valid
  shapefile.fix <- st_make_valid(shapefile)

      # test to see if shape is valid
      st_is_valid(shapefile.fix, reason = TRUE)

and determined that four geometries were invalid because they had:

"Ring Self-intersection

[109] "Ring Self-intersection[1844806.7743 586468.577099999]"

[110] "Ring Self-intersection[1856751.9054 600658.621400001]"

[128] "Ring Self-intersection[1885193.1836 576520.4439]"

[132] ""Ring Self-intersection[1837919.417 592619.7127]"

So, I tried to fix them with st_make_valid(), and while that made the error message go away, the geometry is still missing on the final maps. The area circled in red should be filled in.

enter image description here

Ideally, I'd be able to repair the geometry so that I can map these polygons. Does anyone have any ideas about how I can do this or what else might be the problem? I re-downloaded the shape file and that didn't matter. I can't really get the shape file from an alternate source--so that isn't an option. Using QGIS isn't an option either because I've already made most of the 20 maps I need. I recently upgraded my R (4.0.0) and RStudio (Version 1.2.5042) and didn't notice this problem before then. Could that be the source of the problem--that there's some bug in how tmap and the newest version of RStudio work together? I can't think of anything else.

Any help would be greatly appreciated as I gotta submit this project in the next week or so. Thanks.

kaseyzapatka
  • 149
  • 2
  • 9

1 Answers1

0

If your datset contains ID of the PUMAs you can consider dropping your invalid geometry (while keeping the data part, such as the rent shown in your map) and joining it to a valid geometry; this can be got easily via the {tigris} package.

library(tigris)
library(sf)
library(dplyr)
library(tmap)

# download PUMA 
big_cats <- tigris::pumas("New York", class = "sf") %>% 
  rbind(tigris::pumas("New Jersey", class = "sf")) %>% 
  rbind(tigris::pumas("Connecticut", class = "sf"))

# some sort of dplyr::*_join to your data here...

tm_shape(big_cats) + tm_polygons()

enter image description here

Jindra Lacko
  • 7,814
  • 3
  • 22
  • 44
  • Thanks Jindra. I had thought of that but wasn't sure how I could do that since the same geometeries are invalid using either tmap and ggplot. I was hoping the error was something in my code or with the newest version of tmap. Also, I didn't note this in the original post, but I'm using consistent PUMAs, which is a special shapefile, since PUMAs change over time. Thanks for your suggestion. – kaseyzapatka May 15 '20 at 21:39
  • Oki, if your PUMAs are not the US Census PUMAs then my suggestion is no longer valid. Getting the official Public Use Microdata Areas shapes right & join the valid geometry back to your data seemed as an option, but I can not comment on this special shapefile. – Jindra Lacko May 16 '20 at 06:34
  • Yes, they are US Census PUMAs, but they are special shape files called consistent PUMAs [(cpuma0010)](https://usa.ipums.org/usa-action/variables/CPUMA0010#description_section) which you can access [here](https://usa.ipums.org/usa/volii/cpuma0010.shtml). I'm just not sure why tmap is reading these as not valid geometries. Thanks again for your suggestions. – kaseyzapatka May 16 '20 at 17:00