I have an array of city names and want to group them by country. Is there a library I can install that will do that?

e.g. array(['Los Angeles', 'Detroit', 'Seattle', 'Atlanta', 'Santiago', 'Pittsburgh', 'Seoul', 'Santa Clara', 'Austin', 'Chicago'])

I want to know the country they belong to and add a new country column in my dataframe.

Dhiraj D
  • Probably not. When you see 'Rome', does it mean Rome, New York, or Rome, Italy? When you see 'Venice', does it mean Venice, California or Venice, Italy? Paris, Texas or Paris, France? – ifly6 Jul 19 '21 at 16:21
  • I don't think there is a reliable way to do this just with a city name. Cities in different countries can have the same name. Even within one country there can be many cities with the same name... – mozway Jul 19 '21 at 16:23
  • https://stackoverflow.com/questions/7066825/is-there-an-iso-standard-for-city-identification no ISO scheme, a few other candidate schemes you can consider – Rob Raymond Jul 19 '21 at 17:10

1 Answer

I agree with what has been said in the comments - there is no clear way to join a city to a country when city names are not unique.

For example if we run...

import pandas as pd

# Load a public table of world cities and rename the "name" column to "city"
df = pd.read_csv('https://datahub.io/core/world-cities/r/world-cities.csv')
df = df.rename(columns={"name": "city"})
print(df)

Outputs:

[image: output of print(df)]

# create a list of city names for testing...
myCityList = ['Los Angeles', 'Detroit', 'Seattle', 'Atlanta', 'Santiago', 'Pittsburgh', 'Seoul', 'Santa Clara', 'Austin', 'Chicago']

# pull out all the rows matching a city in the test list..
df.query(f'city=={myCityList}')

Outputs:

[image: rows of df matching the test city list]

However, something is off: the result has more rows than there are items in the test city list (Santiago, for one, is clearly listed multiple times)...

print(len(myCityList))
print(df.query(f'city=={myCityList}').shape[0])

Outputs:

10
15
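To see which names cause the extra rows, one option is to count the distinct countries per city name. The following is only a minimal sketch: the `cities` table is a small hypothetical sample with `city` and `country` columns (illustrative rows, not the real dataset), standing in for the renamed world-cities DataFrame.

```python
import pandas as pd

# Hypothetical sample shaped like the renamed world-cities table
# (columns: city, country); the rows are illustrative only.
cities = pd.DataFrame(
    {
        "city": ["Santiago", "Santiago", "Santiago", "Seoul",
                 "Austin", "Santa Clara", "Santa Clara"],
        "country": ["Chile", "Philippines", "Dominican Republic", "South Korea",
                    "United States", "Cuba", "United States"],
    }
)

# Count how many distinct countries each city name appears in.
countries_per_city = cities.groupby("city")["country"].nunique()

# Names found in more than one country cannot be resolved by name alone.
ambiguous = countries_per_city[countries_per_city > 1].index.tolist()
print(ambiguous)  # ['Santa Clara', 'Santiago']
```

Note that a name can also repeat within a single country (different states or provinces), which this count does not flag because the country is still unambiguous.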

The above may be useful, but use it with caution: it is not guaranteed to output the correct country for a given city.
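If a country column is still wanted, a cautious approach is to fill it in only for names that map to exactly one country and leave the rest empty for manual review. This is a sketch under assumptions: `lookup` and `my_df` are hypothetical stand-ins for the cities table and the asker's DataFrame, and the rows are illustrative.

```python
import pandas as pd

# Hypothetical lookup table (city, country); illustrative rows only.
lookup = pd.DataFrame(
    {
        "city": ["Seoul", "Austin", "Santiago", "Santiago"],
        "country": ["South Korea", "United States", "Chile", "Philippines"],
    }
)

# Hypothetical DataFrame that needs a country column.
my_df = pd.DataFrame({"city": ["Seoul", "Santiago", "Austin", "Atlantis"]})

# Collapse repeated (city, country) pairs, then keep only the city
# names that are left with exactly one country.
pairs = lookup.drop_duplicates(["city", "country"])
unique_lookup = pairs[~pairs["city"].duplicated(keep=False)]

# A left merge keeps every input row; ambiguous or unknown cities get NaN.
result = my_df.merge(unique_lookup, on="city", how="left")
print(result)
```

Rows left as NaN (here the ambiguous Santiago and the unknown Atlantis) would need extra context, such as a state, region or coordinates, to be resolved.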

MDR