I have a dataframe with about 100k rows and I would like to find the coordinates based on the country short code with geopy.
The original dataframe look like this:
index | country | city |
---|---|---|
0 | CL | SANTIAGO |
1 | BR | SAO PAULO |
2 | BR | ITUPEVA |
To limit the number of requests I create another dataframe with unique "country" value, then look up location details from Geopy with following code:
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter
country_list = raw_df.country.unique().tolist()
short_code_df["location"] = short_code_df["short_code"].apply(geocode,language="en")
Which return me the below df:
| index | short_code | location |
|-------|------------|----------------------------------------------|
| 0 | CL | (Chile, (-31.7613365, -71.3187697)) |
| 1 | BR | (Brazil, (-10.3333333, -53.2)) |
| 2 | US | (United States, (39.7837304, -100.445882)) |
| 3 | GB | (United Kingdom, (54.7023545, -3.2765753)) |
| 4 | JP | (Japan, (36.5748441, 139.2394179)) |
| 5 | CH | (Switzerland, (46.7985624, 8.2319736)) |
| 6 | CN | (China, (35.000074, 104.999927)) |
| 7 | HK | (Hong Kong, China, (22.350627, 114.1849161)) |
What I would like is to return the Country, Lat, Lon in their respective column like below:
| index | short_code | location | lat | lon |
|-------|------------|----------------------|----------------|----------------|
| 0 | CL | Chile | -31.7613365 | - 71.3187697 |
| 1 | BR | Brazil | -10.3333333 | -53.2 |
| 2 | US | United States | 39.7837304 | 100.445882 |
I tried to slice it with a square bracket at the end but it throw me an error.
I also tried to use the pandas explode function but that was not working too.
Thanks first and appreciate your helps.