1

I have an excel file containing two columns. The first column is "City" and the second column is "Country". I want my python code to loop through each row and find the latitude and longitude for each row. The python code creates two new columns "Latitude" and "Longitude" as desired but is returning None for all values. Any help is appreciated.

import pandas as pd
from geopy.geocoders import Nominatim

geolocator = Nominatim(user_agent="Your_Name")

df = pd.read_excel("location.xlsx")

longitude = []
latitude = []

city = df["City"]
country = df["Country"]

for i in df.index:

    loc = geolocator.geocode(city + ',' + country)

    if loc is None:
        latitude.append(None)
        longitude.append(None)
    else:
        latitude.append(loc.latitude)
        longitude.append(loc.longitude)

df["Longitude"] = longitude
df["Latitude"] = latitude

print(df)

2 Answers2

1

With the given dataframe df below :

import pandas as pd

data = {"City": ["London", "Berlin", "Madrid", "Rome", 
                   "Paris", "Vienna", "Bucharest", "Hamburg", 
                   "Budapest", "Warsaw", "Barcelona", 
                   "Munich", "Milan"],
        "Country": ["England", "Germany", "Spain", "Italy",
                      "France", "Austria", "Romania", 
                      "Germany", "Hungary", "Poland", "Spain",
                      "Germany", "Italy"]
       }

df = pd.DataFrame(data)

You can get the lat/long of the couple City/Country by using geolocator.geocode :

from geopy.geocoders import Nominatim

out = (
        df.assign(Geocodes= (df['City'] + ", " + df['Country']).apply(geolocator.geocode))
          .assign(Latitude = lambda x: [g.latitude for g in x['Geocodes']],
                  Longitude= lambda x: [g.longitude for g in x['Geocodes']])
          .drop(columns='Geocodes')
      )

# Output :

print(out)

         City  Country   Latitude  Longitude
0      London  England  51.507322  -0.127647
1      Berlin  Germany  52.517037  13.388860
2      Madrid    Spain  40.416705  -3.703582
3        Rome    Italy  41.893320  12.482932
4       Paris   France  48.858890   2.320041
5      Vienna  Austria  48.208354  16.372504
6   Bucharest  Romania  44.436141  26.102720
7     Hamburg  Germany  53.550341  10.000654
8    Budapest  Hungary  47.497994  19.040359
9      Warsaw   Poland  52.231958  21.006725
10  Barcelona    Spain  41.382894   2.177432
11     Munich  Germany  48.137108  11.575382
12      Milan    Italy  45.464194   9.189635
Timeless
  • 22,580
  • 4
  • 12
  • 30
  • Thanks so much! I tested this out and it works for the most part but let's say if a city or country name is invalid how can I have this code fill a value of NA or None. – user11014777 Sep 29 '22 at 00:12
  • Can you show and/or share a sample of your dataset `"location.xlsx"` where the names are invalid ? – Timeless Sep 29 '22 at 00:15
  • I manually typed in "asdf" for both country and city because in the actual data what if there is a typo in a city or country name so I want to make sure the code doesn't break. So let's say this: data = {"City": ["asdf"], "Country": ["asdf"] } – user11014777 Sep 29 '22 at 00:21
0

You missed indexing the series city and country.

Just try loc = geolocator.geocode(city[i] + ',' + country[i]) in your code. It's working for me.

Dr Nisha Arora
  • 632
  • 1
  • 10
  • 23