
I have a big data frame (832k rows) of gridded latitude and longitude points plus one variable. I would like to plot the average of this variable per county. The problem is that I do not have the county or state for each point, only the coordinates.

Sorry, I am not sure how to include a reproducible example.

Felipe Dalla Lana
  • `dput(head(your_data))` is a great way to share a reproducible example. Or, if you have factors, `dput(droplevels(head(your_data)))`. – Gregor Thomas Apr 18 '19 at 17:03
  • You could use the `rworldmap` package to extract the country name based on the lat/long and then use `dplyr::group_by` to `summarise` for the average of that variable – Sonny Apr 18 '19 at 17:07
  • Also, a Google search for [get county from lat long](https://www.google.com/search?client=firefox-b-1-d&q=get+county+from+lat+long) has a lot of useful-looking links, including quite a few Q/As from SO [like this one](https://stackoverflow.com/q/5864601/903061) and GIS Stack Exchange [like this one](https://gis.stackexchange.com/q/77048/4108). Restricting to the R tag, [this looks helpful](https://stackoverflow.com/q/31544270/903061). Have you tried any of these? – Gregor Thomas Apr 18 '19 at 17:09

2 Answers


Two approaches:

1) Average the lat/lon of all grid points in the county. This skews the county centre towards areas where the grid points are denser.

2) Compute the bounds (min and max lat/lon) of the grid points and average those bounds. This places the county centre exactly at the centre of the grid span.
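A minimal sketch of the two calculations in base R, assuming the grid points have already been assigned to a single county. The data frame `pts` and its values are hypothetical and only illustrate how the two centres can differ.

```r
# Hypothetical grid points already known to belong to one county
pts <- data.frame(
  lon = c(0, 1, 2, 10),
  lat = c(0, 0, 1, 4)
)

# Approach 1: mean of all grid points (pulled towards dense clusters)
centre_mean <- c(lon = mean(pts$lon), lat = mean(pts$lat))

# Approach 2: midpoint of the bounding box (centre of the grid span)
centre_bbox <- c(lon = mean(range(pts$lon)), lat = mean(range(pts$lat)))
```

Here three of the four points sit close together, so `centre_mean` ends up nearer that cluster than `centre_bbox` does.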

Surabhi Mundra

You will need to obtain the county (or state) data and then spatially join it with your dataframe. One possible source for such data is the TIGER shapefile published by the U.S. Census (see e.g. https://catalog.data.gov/dataset/tiger-line-shapefile-2016-nation-u-s-current-county-and-equivalent-national-shapefile).

You can then use the sf package to read the shapefile into R, join it with your data, and then use regular summary functions to summarize your data by county.

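    library(sf)

    # Download the 2016 TIGER/Line county shapefile from the U.S. Census
    filename <- 'https://www2.census.gov/geo/tiger/TIGER2016/COUNTY/tl_2016_us_county.zip'
    tmpfile <- tempfile(fileext = '.zip')
    tmpdir <- tempfile()  # dedicated folder, so cleanup never touches the session temp dir
    download.file(filename, tmpfile, mode = 'wb')  # binary mode keeps the zip intact on Windows
    unzip(zipfile = tmpfile, exdir = tmpdir)
    county_data <- st_read(file.path(tmpdir, 'tl_2016_us_county.shp'))

    # Clean up; a directory is only removed with recursive = TRUE
    unlink(tmpfile)
    unlink(tmpdir, recursive = TRUE)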
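The join and summary steps could look roughly like the sketch below. To keep it self-contained it uses two toy square polygons named `counties` in place of `county_data`, and a hypothetical data frame `grid_df` with columns `lon`, `lat` and `value`. It also assumes the coordinates are WGS84 (EPSG:4326), which you should check for your own data. The TIGER files use NAD83, hence the `st_transform()` call.

```r
library(sf)
library(dplyr)

# Hypothetical stand-in for the gridded data: lon/lat columns plus one variable
grid_df <- data.frame(
  lon   = c(0.25, 0.75, 1.25, 1.75, 5.00),
  lat   = c(0.50, 0.50, 0.50, 0.50, 5.00),
  value = c(1, 3, 10, 20, 99)
)

# Two toy square "counties" standing in for the TIGER county_data object
square <- function(xmin, ymin, xmax, ymax) {
  st_polygon(list(rbind(c(xmin, ymin), c(xmax, ymin), c(xmax, ymax),
                        c(xmin, ymax), c(xmin, ymin))))
}
counties <- st_sf(
  NAME = c("A", "B"),
  geometry = st_sfc(square(0, 0, 1, 1), square(1, 0, 2, 1), crs = 4326)
)

# Turn the data frame into points (assumed WGS84) and match the polygon CRS
points <- st_as_sf(grid_df, coords = c("lon", "lat"), crs = 4326)
points <- st_transform(points, st_crs(counties))

# Tag each point with the polygon that contains it; unmatched points get NA
joined <- st_join(points, counties, join = st_within)

# Drop the geometry and average the variable per county
county_means <- joined %>%
  st_drop_geometry() %>%
  filter(!is.na(NAME)) %>%
  group_by(NAME) %>%
  summarise(mean_value = mean(value), .groups = "drop")
```

With the real shapefile you would replace `counties` with `county_data` and probably group by `GEOID` rather than `NAME`, since county names repeat across states. The summary can then be merged back onto the polygons for plotting.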
yeedle