I'm trying to pass a concatenated list of strings as the regular expression to re.findall
:
re.findall(regex, string)
But I'm getting just a bunch of empty strings in a pair of lists as a result.
re.findall("|".join(locations), 'Zika Outbreak Hits Miami'.lower())
# [('', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', ''), ('', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '')]
Where locations is a list like this:
['andorra', 'united arab emirates', 'afghanistan', 'antigua and barbuda', 'anguilla', 'albania', 'armenia', 'angola', 'antarctica', 'argentina', 'american samoa', 'austria', 'australia', 'aruba', 'aland islands', 'azerbaijan', 'bosnia and herzegovina', 'barbados', 'bangladesh', 'belgium', 'burkina faso', 'bulgaria', 'bahrain', 'burundi', 'benin', 'saint barthelemy', 'bermuda', 'brunei', 'bolivia', 'bonaire, saint eustatius and saba ', 'brazil', 'bahamas', 'bhutan', 'bouvet island', 'botswana', 'belarus', 'belize', 'canada', 'cocos islands', 'democratic republic of the congo', 'central african republic', 'republic of the congo', 'switzerland', 'ivory coast', 'cook islands', 'chile', 'cameroon', 'china', 'colombia', 'costa rica', 'cuba', ...]
A manual test work like this:
print(re.findall('miami|zika', 'Zika Outbreak Hits Miami'.lower()))
# ['zika', 'miami']
But I don't know what's wrong with concatenating locations to create a big regex. Maybe is that? locations
holds 24588 elements.
I'm currently creating the locations list from what geonamescache offers as cities and countries:
import geonamescache
gc = geonamescache.GeonamesCache()
countries = [country["name"].lower() for country in list(gc.get_countries().values())]
cities = [city["name"].lower() for city in list(gc.get_cities().values())]
locations = countries + cities
The text which I'm working with looks like this:
Zika Outbreak Hits Miami
Could Zika Reach New York City?
First Case of Zika in Miami Beach
Mystery Virus Spreads in Recife, Brazil
Dallas man comes down with case of Zika