
I am using the code below to identify the US county for each business. The data comes from Yelp, which provides lat/lon coordinates.

id latitude longitude
1 40.017544 -105.283348
2 45.588906 -122.593331

import pandas
df = pandas.read_json("/Users/yelp/yelp_academic_dataset_business.json", lines=True, encoding='utf-8')

# Identify county
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="http")
df['county'] = geolocator.reverse(df['latitude'],df['longitude'])

The error was "TypeError: reverse() takes 2 positional arguments but 3 were given".

2 Answers

1

Nominatim.reverse takes a single coordinate pair, such as a (latitude, longitude) tuple, which is why passing latitude and longitude as two separate arguments raises the TypeError. The second issue is that you are passing it pandas DataFrame columns. df['latitude'] refers to the entire column in your data, not just one value. Since geopy is independent of pandas, it cannot process an entire column and only sees that the input isn't a valid coordinate.

Instead, try looping through the rows:

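county = []

# reverse() takes one (latitude, longitude) tuple per call, so geocode row by row.
# zip() pairs the two columns by position, whatever the DataFrame index looks like.
for lat, lon in zip(df['latitude'], df['longitude']):
    county.append(geolocator.reverse((lat, lon)))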

(Note the double parentheses: the inner pair builds the single (latitude, longitude) tuple that reverse expects.)
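The role of the double parentheses can be seen without touching the network. The sketch below uses a hypothetical stand-in class, FakeGeocoder, whose reverse method has the same positional shape as the error message in the question implies (self plus one query). It is not geopy's code, only an illustration of the calling convention.

```python
class FakeGeocoder:
    # Hypothetical stand-in, not part of geopy: like the method in the
    # question's error message, it accepts exactly one argument besides self.
    def reverse(self, query):
        lat, lon = query  # the single argument is unpacked into a pair
        return (float(lat), float(lon))


g = FakeGeocoder()

# Two separate arguments: Python counts self + lat + lon = 3 and refuses.
try:
    g.reverse(40.017544, -105.283348)
    error = None
except TypeError as exc:
    error = str(exc)

# One tuple argument: self + query = 2, which is what the method expects.
pair = g.reverse((40.017544, -105.283348))

print(error)
print(pair)
```

The first call reproduces the "takes 2 positional arguments but 3 were given" message, and the second succeeds because the tuple counts as one argument.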

Then, insert the column into the dataframe:

df.insert(index, 'county', county, True)

(index is the column position you want, and the boolean at the end is allow_duplicates, which lets you insert the column even if a column named 'county' already exists.)
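Note that the loop above stores whatever reverse returns, which in geopy is a Location object rather than a county name. A minimal sketch of pulling the county out follows. It assumes the Nominatim response keeps address parts under raw["address"] with a "county" key, which may be missing for some places. The helper names county_of and counties_for are made up for illustration, and fake_reverse is an offline stand-in so the sketch runs without calling the service.

```python
from types import SimpleNamespace


def county_of(location):
    """Return the county from a geopy-style Location, or None if absent."""
    if location is None:  # reverse() can return None when nothing is found
        return None
    # Assumption: address parts live under raw["address"], county under "county".
    return location.raw.get("address", {}).get("county")


def counties_for(latitudes, longitudes, reverse):
    """Call reverse((lat, lon)) once per coordinate pair and collect counties."""
    return [county_of(reverse((lat, lon)))
            for lat, lon in zip(latitudes, longitudes)]


def fake_reverse(query):
    # Offline stand-in for geolocator.reverse; always reports the same county.
    return SimpleNamespace(raw={"address": {"county": "Example County"}})


print(counties_for([40.017544, 45.588906], [-105.283348, -122.593331], fake_reverse))

# Assumed wiring with the real service, throttled to one request per second:
#   from geopy.extra.rate_limiter import RateLimiter
#   reverse = RateLimiter(geolocator.reverse, min_delay_seconds=1)
#   df['county'] = counties_for(df['latitude'], df['longitude'], reverse)
```

Going through a rate limiter matters for a dataset of this size, since the public Nominatim service asks for at most one request per second.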

hyper-neutrino
  • Thank you, I got the error "ValueError: Must be a coordinate pair or Point" when trying this. Do you have any suggestions? – Trinh Trong Anh Jul 07 '21 at 04:25
  • @TrinhTrongAnh Does `geolocator.reverse((df['latitude'], df['longitude']))` (note the double parentheses) work? – hyper-neutrino Jul 07 '21 at 04:30
  • Unfortunately, I got the same error: ValueError: Must be a coordinate pair or Point. – Trinh Trong Anh Jul 07 '21 at 04:37
  • @TrinhTrongAnh Weird. What about `geolocator.reverse(Point(latitude = df['latitude'], longitude = df['longitude']))`? (you may need to import `Point` from `geopy.point`). Apologies for the confusion; I've never used this module and I was following the documentation, but that appears to not work. – hyper-neutrino Jul 07 '21 at 04:50
  • The documentation also says you can pass a list or tuple of lat/long tuples. So you need triple parentheses. – Mark Ransom Jul 07 '21 at 04:52
  • I got a new error when trying the Point approach: "ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()." – Trinh Trong Anh Jul 07 '21 at 05:03
  • Hi Mark, could you please explain further how the triple parentheses fit in the code? Thank you. – Trinh Trong Anh Jul 07 '21 at 05:04
  • `geolocator.reverse([(df['latitude'], df['longitude'])])` should work based on what @MarkRansom describes – hyper-neutrino Jul 07 '21 at 05:43
  • Unfortunately, the issue remains when I use triple parentheses : "ValueError: Must be a coordinate pair or Point" – Trinh Trong Anh Jul 07 '21 at 05:57
  • @TrinhTrongAnh Can you send me the JSON file as-is? It just occurred to me that I've never used pandas, so the issue might not be caused by misunderstanding the geopy documentation; it is probably something else. – hyper-neutrino Jul 07 '21 at 06:05
  • The data is provided by Yelp available at https://www.yelp.com/dataset or it can be downloaded here: https://www.dropbox.com/s/i2dns31ezwgmcbt/yelp_academic_dataset_business.json?dl=0. Thanks for your time. – Trinh Trong Anh Jul 07 '21 at 06:17
  • @TrinhTrongAnh Aha, I see my problem now. I found one issue but totally missed the other one. I've updated my answer; hope this helps! – hyper-neutrino Jul 07 '21 at 13:42
  • @hyper-neutrino Thanks for your time. I've created a new column called 'county' using `df.insert(2, "county", '')`. However, I got "KeyError: 2". Do you have any idea why? – Trinh Trong Anh Jul 07 '21 at 22:43
  • @TrinhTrongAnh Apparently you are supposed to insert the whole column at once, so I've added an updated version to my answer that should hopefully work. – hyper-neutrino Jul 07 '21 at 23:56
0

You could use US Census data and geopandas.

imports

import urllib.parse
import requests
from pathlib import Path
from zipfile import ZipFile
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

get geometry data as a geopandas dataframe

src = [
    {
        "name": "counties",
        "suffix": ".shp",
        "url": "https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_county_5m.zip",
    },
]
data = {}
print('gathering county data from census')
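for s in src:
    # Local path for the zip, named after the last component of the URL
    f = Path.cwd().joinpath(urllib.parse.urlparse(s["url"]).path.split("/")[-1])
    if not f.exists():
        # Download only once, streaming the response to disk in chunks
        r = requests.get(s["url"], stream=True)
        r.raise_for_status()
        with open(f, "wb") as fd:
            for chunk in r.iter_content(chunk_size=128):
                fd.write(chunk)

    # Unpack into a folder next to the zip, named after it
    extract_dir = f.parent.joinpath(f.stem)
    with ZipFile(f) as fz:
        fz.extractall(extract_dir)
        # The archive holds several files; pick the one with the wanted suffix
        shp_name = [m.filename for m in fz.infolist()
                    if Path(m.filename).suffix == s["suffix"]][0]

    data[s["name"]] = gpd.read_file(extract_dir.joinpath(shp_name)).assign(source_name=s["name"])
gdf = pd.concat(data.values()).to_crs("EPSG:4326")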

Lockport, Illinois coordinates (shapely points are (x, y), so longitude comes first)

query_point = Point(-88.057510, 41.589401)

use geopandas contains() to filter the data

contains = gdf.contains(query_point)  # boolean mask: True for polygons containing the point
county_rows = gdf[contains]
print(county_rows['NAME'])

This prints the NAME of the matching row, 'Will' (Lockport is in Will County).

link to documentation: https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoSeries.contains.html
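The single-point check above can be extended to every row of a table with a spatial join. The sketch below is one possible way to do that, not part of the original answer. To stay self-contained it uses two made-up rectangles in place of the census polygons and a three-row table in place of the Yelp data. It assumes a geopandas version that accepts the predicate keyword of sjoin (0.10 or later).

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import box

# Hypothetical stand-ins for the census polygons (rectangles, not real counties).
counties = gpd.GeoDataFrame(
    {"NAME": ["box_a", "box_b"]},
    geometry=[box(-110, 35, -100, 45), box(-125, 45, -120, 50)],
    crs="EPSG:4326",
)

# The two sample rows from the question plus one point outside both rectangles.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "latitude": [40.017544, 45.588906, 0.0],
    "longitude": [-105.283348, -122.593331, 0.0],
})

# Points are (x, y), so longitude goes first.
points = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df["longitude"], df["latitude"]),
    crs="EPSG:4326",
)

# A left join keeps points that fall in no polygon; their NAME is NaN.
joined = gpd.sjoin(points, counties[["NAME", "geometry"]],
                   how="left", predicate="within")
result = joined[["id", "NAME"]].rename(columns={"NAME": "county"})
print(result)
```

With the real data, the gdf built from the census shapefile would take the place of counties. The join does all the lookups locally, so no web service is called per row.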