R sf st-intersection keep unmatched rows

Question

I'm trying to create a map that shows the coverage of proprietary data (shapefile). Basically, the outcome will show how much of the contiguous USA is covered by the data. I used st_intersection, but it only keeps "matched" rows. Is there a way I can keep the "unmatched" rows as well so that I can indicate which area is "NA"?

Here's a reproducible example. Suppose I have a subset of zip code maps in DC area ("zip") and the rest of the DC area zip code map is not available (of course this is not true in reality, but it's the problem I face with actual data). When I use dc_zip, only the matched rows are plotted. What I want instead is (partly) achieved by plotting the entire polygon of DC first and plotting the zip code on top of it. Here, I can split up DC into the area with zip code map created and zip code map not drawn. Is this the best approach?

library(USAboundaries)
library(sf)
library(dplyr)
library(tigris)
library(tmap)

dc <- USAboundaries::us_states() %>% filter(statefp == "11") %>% st_transform(crs = 4269)

zip <- tigris::zctas(starts_with = "2000", class = "sf")

dc_zip <- st_intersection(zip, dc)

tm_shape(dc_zip) + tm_polygons() # drops unmatched rows

tm_shape(dc) + tm_polygons("grey80") + tm_shape(zip) + tm_polygons("green") # keeps unmatched rows

Any comments are appreciated!

Your code is not reproducible. What is `us_states()`? And which package can you get it? — www, Feb 02 '20 at 04:19
Are you saying that you want to draw all polygons in the DC area and fill in some of the polygons with colors? — jazzurro, Feb 02 '20 at 11:06
I edited my question. Hopefully, it's more clear now! Thanks — qnp1521, Feb 02 '20 at 15:57

score 2 · Answer 1 · answered Feb 02 '20 at 12:21

I am not completely sure what you are trying to do. But seeing what you were doing, it seems to me that you want to draw all polygons in the DC area and assign colors for some of the polygons. If so, the following is one way for you. I tried to stick to the packages that you used except the tmap package.

library(tigris)
library(USAboundaries)
library(sf)
library(dplyr)
library(ggplot2)

# Get DC polygon
dc <- USAboundaries::us_states() %>%
      filter(statefp == "11") %>%
      st_transform(crs = 4326)

Seeing this web page, you probably need to specify starts_with = "20" to include all polygons in the DC area. But this is not enough. So you need to subset the data.

zip <- tigris::zctas(starts_with = "20", class = "sf")

zip2 <- mutate(zip, ZCTA5CE10 = as.numeric(ZCTA5CE10)) %>% 
        filter(ZCTA5CE10 <= 20600) %>% 
        st_transform(zip, crs = 4326)

# Get polygons in the DC area
dc_zip <- st_intersection(zip2, dc)

Let's draw DC area once

ggplot() + geom_sf(data = dc_zip)

Let's say you have a continuous variable. Values are present for 20000-20010 (ZCTA5CE10), but all other ZCTA5CE10 have NA. I create this dummy variable here. In your real data, I think this is the variable you need to specify by yourself.

set.seed(111)
dc_zip <- mutate(dc_zip, whatever = if_else(ZCTA5CE10 %in% 20000:20010,
                                            sample.int(1000, size = n(), replace = FALSE),
                                            NA_integer_))

Draw a map again. I use whatever as a variable to fill in polygons.

ggplot() + geom_sf(data = dc_zip, aes(fill = whatever))

If you want to change the color for NA areas, you can add scale_fill_continuous(na.value = "white"), for instance.

Thank you for your answer in detail and my apologies if my question was unclear. The key difficulty in my (real) case is that I have only a subset of data, unlike the DC zip code example. Put it differently, it's as if only part of the DC area (20000 to 20010) has zip codes map drawn, and I want to create a map showing how much of DC area zip codes are still NA. So I cannot do the zip2 and mutate trick you showed. Do you have any further suggestions in this case? — qnp1521, Feb 02 '20 at 15:52
@qnp1521 if you have the subset polygons like these 11 polygons (20000-20010) only, how would you know how many more polygons you need in that region? As you tried, if you have the dc polygon and the subset polygons, I think your approach will work. You cannot tell how many zip code areas are missing, but they all appear in one specific color. — jazzurro, Feb 02 '20 at 23:01
@akrun if i was any of help for you to learn something, that’s my pleasure. You’ve been helping me and others to learn a lot about data manipulation here. Thanks for sharing your knowledge. — jazzurro, Feb 02 '20 at 23:05
I learn a lot of things from other contributors. SO is a great place — akrun, Feb 02 '20 at 23:09
@jazzurro Exactly! That's why I wanted to keep the part of DC area not matched with the subset of zip codes, but it seems like what I've tried (overlaying one on top of the other) is the best I can get. Thank you, I learned a lot as well! — qnp1521, Feb 03 '20 at 15:41

R sf st-intersection keep unmatched rows

1 Answers1