
I use Python 3 (I also have Python 2 installed) and I want to extract countries or cities from a short text. For example, text = "I live in Spain" or text = "United States (New York), United Kingdom (London)".

The expected output for countries:

  1. Spain
  2. [United States, United Kingdom]

I tried to install geography but I am unable to run pip install geography. I get this error:

Collecting geography
  Could not find a version that satisfies the requirement geography (from versions: )
No matching distribution found for geography

It looks like geography only works with Python 2.

I also have geopandas, but I don't know how to extract the required info from text using geopandas.

joris
Markus
  • @smci The package is called `geograpy`, not `geography`. – Maximouse Apr 20 '20 at 17:24
  • @MaxiMouse: ok, then should this be closed as typo? Also, you could add that as answer. – smci Apr 20 '20 at 23:38
  • @smci Yes, it should probably be closed as a typo. I don't think this could be an answer. – Maximouse Apr 21 '20 at 07:50
  • @MaxiMouse: on reflection, the question asks the broader *"How to extract countries from a text?"*, isn't strictly tied to any package, and has good answers, so we should let it stand. – smci Apr 21 '20 at 08:20

2 Answers


You could use pycountry for this task (it also works with Python 3):

pip install pycountry

import pycountry
text = "United States (New York), United Kingdom (London)"
for country in pycountry.countries:
    if country.name in text:
        print(country.name)
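
One caveat with the plain substring test above: `"Niger" in "Nigeria"` is true, so short names can match inside longer ones. Below is a minimal sketch of a stricter variant that matches whole names only and returns them as a list. The `find_countries` helper and the `SAMPLE_COUNTRIES` records are hypothetical stand-ins for illustration, not part of pycountry.

```python
import re
from collections import namedtuple

# Stand-in records for illustration only (an assumption, not pycountry data).
# pycountry's country objects also expose a `name` attribute, so
# `pycountry.countries` should be usable in place of this list.
Country = namedtuple("Country", ["name", "alpha_2", "alpha_3"])
SAMPLE_COUNTRIES = [
    Country("Spain", "ES", "ESP"),
    Country("United States", "US", "USA"),
    Country("United Kingdom", "GB", "GBR"),
    Country("Niger", "NE", "NER"),
    Country("Nigeria", "NG", "NGA"),
]


def find_countries(text, countries):
    """Return the country names found in `text`, in order of first appearance."""
    # Longest names first, so the longer of two overlapping names is tried first.
    names = sorted((c.name for c in countries), key=len, reverse=True)
    if not names:
        return []
    # The lookarounds reject matches that sit inside a longer word.
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(n) for n in names) + r")(?!\w)"
    )
    found = []
    for match in pattern.finditer(text):
        if match.group(0) not in found:
            found.append(match.group(0))
    return found


print(find_countries("I live in Spain", SAMPLE_COUNTRIES))
print(find_countries("United States (New York), United Kingdom (London)", SAMPLE_COUNTRIES))
```

With pycountry installed, calling `find_countries(text, pycountry.countries)` should behave the same way, although it can only find names spelled exactly as pycountry stores them.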
TerryA
matyas
  • Cool. But it will not work with abbreviations, right? Do you know of anything that recognizes abbreviations and maps them to country names? – Markus Feb 04 '18 at 11:25
  • For example, `BVI` -> `British Virgin Islands` – Markus Feb 04 '18 at 11:27
  • Every country object has the attributes alpha_2 and alpha_3, which are abbreviations of the country (e.g. Germany.alpha_2 = DE, Germany.alpha_3 = DEU). – matyas Feb 04 '18 at 11:31
  • I hope that covers your use case. See also: https://pypi.python.org/pypi/pycountry – matyas Feb 04 '18 at 11:32
  • British Virgin Islands is in pycountry, but its codes are `alpha_3='VGB', alpha_2='VG'` @Markus – Todd Apr 20 '20 at 17:39
  • 'Korea' is also not recognized. – Steven Aug 17 '20 at 05:18
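
The abbreviation idea from the comments above could be sketched roughly as follows: build a table from the `alpha_2` and `alpha_3` codes, then add hand-maintained aliases for abbreviations such as `BVI` that are not ISO codes. The helper names, the sample records and the `EXTRA_ALIASES` table are all hypothetical, and the names pycountry actually stores may be formatted differently from the ones used here.

```python
import re
from collections import namedtuple

# Stand-in records for illustration only (an assumption, not pycountry data).
Country = namedtuple("Country", ["name", "alpha_2", "alpha_3"])
SAMPLE_COUNTRIES = [
    Country("Germany", "DE", "DEU"),
    Country("British Virgin Islands", "VG", "VGB"),
]

# Hypothetical, hand-maintained aliases for abbreviations that are not ISO codes.
EXTRA_ALIASES = {"BVI": "British Virgin Islands"}


def build_code_table(countries, extra_aliases=None):
    """Map every alpha_2/alpha_3 code, plus any extra alias, to a country name."""
    table = {}
    for country in countries:
        table[country.alpha_2] = country.name
        table[country.alpha_3] = country.name
    table.update(extra_aliases or {})
    return table


def expand_abbreviations(text, table):
    """Return country names for the all-caps tokens in `text` found in `table`."""
    names = []
    for token in re.findall(r"\b[A-Z]{2,}\b", text):
        name = table.get(token)
        if name is not None and name not in names:
            names.append(name)
    return names


table = build_code_table(SAMPLE_COUNTRIES, EXTRA_ALIASES)
print(expand_abbreviations("Offices in DEU and BVI", table))
```

Two-letter codes are risky in free text, because ordinary upper-case words such as "IT" or "IN" are also valid codes, so restricting the table to three-letter codes and explicit aliases may be the safer choice. For a single known code, pycountry also supports direct lookups such as `pycountry.countries.get(alpha_2="DE")`.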

There is a newer version of this library, named geograpy3, that supports Python 3:

pip install geograpy3

It allows you to extract place names from a URL or text, and add context to those names -- for example distinguishing between a country, region or city.

Example:

import geograpy
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('maxent_ne_chunker')
nltk.download('words')
url = 'http://www.bbc.com/news/world-europe-26919928'
places = geograpy.get_place_context(url=url)

You can find more details under this link:

Jendoubi Zaid
  • I've seen this exact text many times "Geograpy allows you to extract place names from a URL or text", but all websites / forums / github project examples show only how to use Geograpy with url and I haven't come across an example with a regular string (neither does it work if we just replace the url in the example code with a regular text) – Mihaela Grigore May 13 '21 at 20:18
  • @MihaelaGrigore `places = geograpy.get_place_context(text="my text from Germany")` – nex Sep 04 '21 at 15:09