I am gathering some articles from Wikipedia (dozens to hundreds, while respecting the courtesy limit of the Wikipedia API).
All of the articles are about brands, but on many occasions the keyword is very generic and does not refer only to a brand. In those cases I get back a list of alternatives like:
Arla may refer to:
- Arla (file system)
- Arla (moth), a genus of moth
- Arkansas Library Association
- Arla, Greece, a village
- Ärla, a village in south-eastern Sweden
- Arla Foods, a large Scandinavian producer ...
I want to spot the one that falls into the "brand" category, but I could also supply other relevant keywords such as "food" or "beverage".
Can I fetch only the alternatives that contain certain keywords using the Wikipedia API?
The problem is that when a title is ambiguous, the response JSON has the same shape as when a single article is found.
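One possible way to tell the two cases apart would be to request page properties as well (`prop=extracts|pageprops` with `ppprop=disambiguation`); disambiguation pages should then carry a `disambiguation` key under `pageprops`. The sketch below only inspects an already-parsed response. The function name `is_disambiguation` and the trimmed sample responses are hypothetical and for illustration only.

```python
def is_disambiguation(json_data):
    """Return True if the first page of a parsed API response is flagged
    as a disambiguation page (assumes pageprops was requested)."""
    pages = json_data.get('query', {}).get('pages', {})
    if not pages:
        return False
    page = next(iter(pages.values()))
    return 'disambiguation' in page.get('pageprops', {})


# Hypothetical, trimmed responses for illustration only
ambiguous = {'query': {'pages': {'360264': {
    'title': 'Arla', 'pageprops': {'disambiguation': ''}}}}}
single = {'query': {'pages': {'1': {'title': 'Example'}}}}

print(is_disambiguation(ambiguous))  # True
print(is_disambiguation(single))     # False
```

With such a check, ambiguous keywords could be set aside for a second pass instead of being stored as if they were a brand article.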
Check my script:
import requests
import time

result = {}
for q in spotted_keywords:
    # Fetch the plain-text intro extract, following redirects
    url = 'https://en.wikipedia.org/w/api.php?action=query&prop=extracts&exintro&explaintext&format=json&titles=' + q + '&redirects=true'
    r = requests.get(url)
    json_data = r.json()
    # The response holds a single page, keyed by its page ID
    extract = list(json_data['query']['pages'].values())[0]
    if 'extract' in extract:
        result[q] = extract['extract']
    # Pause between requests to respect the courtesy limit
    time.sleep(1)
spotted_keywords is a list like ["mcdonalds", "cocacola" ...]
One response looks like this:
{
  "batchcomplete":"",
  "query":{
    "normalized":[
      {
        "from":"arla",
        "to":"Arla"
      }
    ],
    "pages":{
      "360264":{
        "pageid":360264,
        "ns":0,
        "title":"Arla",
        "extract":"Arla may refer to:\n\nArla (file system)\nArla (moth), a genus of moth\nArkansas Library Association\nArla, Greece, a village\n\u00c4rla, a village in south-eastern Sweden\nArla Foods, a large Scandinavian producer of dairy products\nArla (Finland), a subsidiary of Arla Foods\nArla Foods UK, a subsidiary of Arla Foods\nARLA, Arm\u00e9e r\u00e9volutionnaire de lib\u00e9ration de l'Azawad (French), Revolutionary Liberation Army of Azawad"
      }
    }
  }
}
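Since the extract of such a page lists one candidate per line, a simple fallback might be to filter those lines by keyword locally. This is a minimal sketch, assuming plain-text extracts shaped like the one above; `filter_candidates` and the keyword list are hypothetical, and the sample extract is shortened.

```python
def filter_candidates(extract, keywords):
    """Return the lines of a disambiguation extract that mention any keyword
    (case-insensitive substring match)."""
    lowered = [k.lower() for k in keywords]
    matches = []
    for line in extract.split('\n'):
        line = line.strip()
        # Skip blank lines and the "X may refer to:" header
        if not line or line.endswith('may refer to:'):
            continue
        if any(k in line.lower() for k in lowered):
            matches.append(line)
    return matches


extract = ("Arla may refer to:\n\nArla (file system)\n"
           "Arla (moth), a genus of moth\n"
           "Arla Foods, a large Scandinavian producer of dairy products")
print(filter_candidates(extract, ['food', 'beverage', 'dairy']))
# ['Arla Foods, a large Scandinavian producer of dairy products']
```

Substring matching is crude (for example, "food" also matches "Foods"), so the keyword list would likely need tuning per category.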
Any hints?