0

For the U.S., I can get the centroids of all the counties in a state. However, upon closer inspection, the centroid of some counties is not correct. How can I manually correct the point geometry (latitude and longitude) for a given county?

Example of Virginia

library(dplyr) # data wrangling
library(sf) # for getting centroids
library(tigris) # for getting county shapefile
library(cdlTools) # converting state fips to state name
library(ggplot2) # for mapping

### Extract centroids for each county in the USA

centroids <- counties(resolution = "20m") %>% 
  st_centroid()
centroids$STATE_NAME <- fips(centroids$STATEFP, to = 'Name')
va_cnty <- centroids[centroids$STATE_NAME %in% 'Virginia',]

### Make map of VA showing location of county centroids ----

state_shp <- states(cb = T)
state_shp$STATEFP <- as.numeric(state_shp$STATEFP)
state_shp <- state_shp[state_shp$STATEFP %in% c(1,3:14,16:56),]

cnty_shp <- counties(cb = TRUE)
cnty_shp$STATEFP <- as.numeric(cnty_shp$STATEFP)
lower48_shp <- cnty_shp[cnty_shp$STATEFP %in% c(1,3:14,16:56),]
va_shp <- lower48_shp[lower48_shp$STUSPS == 'VA',]

ggplot() +
  geom_sf(data = state_shp,
          mapping = aes(geometry = geometry),
          color = 'black',
          fill = 'gray70') +
  geom_sf(data = va_shp,
          mapping = aes(geometry = geometry),
          color = 'black',
          fill = 'lightyellow') +
  geom_sf(data = va_cnty,
          mapping = aes(geometry = geometry),
          color = "black",
          fill = 'red',
          shape = 21,
          size = 3) +
  coord_sf(xlim = c(-83.5, -75), ylim = c(36.4, 39.5), expand = TRUE) +
  theme_bw() +
  theme(text = element_text(size = 16),
        axis.text.x = element_text(size = 14, color = "black"),
        axis.text.y = element_text(size = 14, color = "black"),
        panel.grid.major = element_blank(),
        panel.background = element_rect(fill = "lightblue"))

Map of the center lat and long of each county in Virginia enter image description here

How can I change the latitude and longitude in the geometry column of va_cnty? The correct latitude and longitude are: 37.772236, -75.660827

Two unsuccessful attempts

va_cnty[va_cnty$NAME %in% 'Accomack',7] <- st_point(c(-75.660827,37.772236))

# Error in `[<-.data.frame`(`*tmp*`, va_cnty$NAME %in% "Accomack", 7, value = c(-75.660827,: replacement has 2 rows, data has 1
sfc = st_sfc(st_point(c(-75.660827,37.772236)), crs = 'NAD83')
va_cnty[va_cnty$NAME %in% 'Accomack',7] <- sfc

# Warning message:
# In `[<-.data.frame`(`*tmp*`, va_cnty$NAME %in% "Accomack", 7, value = # list( : replacement element 1 has 2 rows to replace 1 rows
tassones
  • 891
  • 5
  • 18
  • 2
    Instead of fixing the few like that, you might want to try `st_centroid(of_largest_polygon = TRUE)` to get a point that 'looks' more like the centroid. The Accomack county centroid is likely pushed a bit west by the Tangier and other small islands in the Chesapeake. – mrhellmann May 09 '23 at 01:13
  • How do you **know** "the centroid of some counties is not correct"? What are you defining as "centroid"? Do you mean the centre of the largest land-mass in the polygon? Remember, a county can consist of many polygons (and therefore is likely a MULTIPOLYGON). – SymbolixAU May 09 '23 at 01:19
  • @SymbolixAU Regardless of what the ***correct*** centroid is, I still need to know how to change the lat & long of a sfc_POINT. – tassones May 09 '23 at 01:40
  • @mrhellmann I agree that Accomack is likely being pulled west by the small islands in Chesapeake. When I tried your suggestion it did not change the location of the centroid for Accomack, likely because the geometry is a POINT not a MULTIPOLYGON. – tassones May 09 '23 at 01:43
  • The st_centroids call I was referring to was here `centroids <- counties(resolution = "20m") %>% st_centroid(of_largest_polygon= TRUE)` where I hope the data would be POLYGON/MULTIPOLYGON. This should still return a centroid for all counties, based on the largest polygon. – mrhellmann May 09 '23 at 02:35

2 Answers2

2

It turns out that you're using two different shapefiles for Virginia counties. One (counties(resolution = "20m")) returns a polygon covering the entire county including the sea. The other (cnty_shp <- counties(cb = TRUE)) returns only the landmass of the county, at least in the examples with islands in the sea. This results in very different centroids.

In the first two plots notice the difference in polygons in the east. The first covers all land & sea. In the second, only the land is described by the (multi-) polygons

First with counties(resolution = '20m')

virginia_counties <- counties(state = 'VA', resolution = '20m') 
virginia_centroids <- st_centroid(virginia_counties)

ggplot() + 
  geom_sf(data = virginia_counties) + 
  geom_sf(data = virginia_centroids, color = 'red')

All centroids look right, as they're in the polygons, but some of those polygons cover non-land areas. enter image description here

Second with counties(cb = TRUE)

virginia_counties_2 <- counties(cb = TRUE, state = 'VA')
virginia_centroids_2 <- st_centroid(virginia_counties_2)

ggplot() + 
  geom_sf(data = virginia_counties_2) + 
  geom_sf(data = virginia_centroids_2, color = 'red')

enter image description here

And both centroids together: enter image description here

The red centroids are overplotted in counties that have landmass polygons equal (or nearly equal to) to a convex hull around landmass.

mrhellmann
  • 5,069
  • 11
  • 38
1

Geometry column in sf is a list-column, so you'd have to handle it like a list when working with individual elements:

library(sf)
# sf example dataset
nc <- st_read(system.file("shape/nc.shp", package="sf")) |>
  st_centroid() 
nc[1:5,"NAME"]
#> Simple feature collection with 5 features and 1 field
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -81.49823 ymin: 36.40714 xmax: -76.02719 ymax: 36.49111
#> Geodetic CRS:  NAD27
#>          NAME                   geometry
#> 1        Ashe  POINT (-81.49823 36.4314)
#> 2   Alleghany POINT (-81.12513 36.49111)
#> 3       Surry POINT (-80.68573 36.41252)
#> 4   Currituck POINT (-76.02719 36.40714)
#> 5 Northampton POINT (-77.41046 36.42236)

# logical index 
nc$geometry[nc$NAME == "Ashe"] <- st_point(c(111,111))
# numeric index
nc$geometry[[2]] <- st_point(c(222,222))

# st_geometry method instead of $
st_geometry(nc)[[3]] <- st_point(c(333,333))

# replace just lon
nc$geometry[[4]][1] <- 123
# replace just lat
nc$geometry[[4]][2] <- 321

Result:

nc[1:5,"NAME"]
#> Simple feature collection with 5 features and 1 field
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -77.41046 ymin: 36.42236 xmax: 333 ymax: 333
#> Geodetic CRS:  NAD27
#>          NAME                   geometry
#> 1        Ashe            POINT (111 111)
#> 2   Alleghany            POINT (222 222)
#> 3       Surry            POINT (333 333)
#> 4   Currituck            POINT (123 321)
#> 5 Northampton POINT (-77.41046 36.42236)

Created on 2023-05-09 with reprex v2.0.2

margusl
  • 7,804
  • 2
  • 16
  • 20