I am using the pywikibot API to fetch Wikipedia infobox attributes. I want to extract a few things such as population density, population, and elevation. For some cities, e.g. Beijing (https://en.wikipedia.org/wiki/Beijing), the API returns "auto" as the value for keys like population_density_km2. For other cities, I get the actual density instead of "auto". Does anyone know why this happens, and how I can get the actual value?


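import re

import pywikibot

# English Wikipedia site object (left undefined in the original snippet)
en_wiki = pywikibot.Site('en', 'wikipedia')


def get_page(city: dict) -> pywikibot.Page:
    """
    Returns the Wikipedia page for a city record
    """
    # city['article']['value'] holds the article URL; the title follows 'wiki/'
    title = re.search(r'wiki/(.*)', city['article']['value']).group(1)
    page = pywikibot.Page(en_wiki, title)
    if page.pageid == 0:
        raise Exception('page does not exist')

    return page


def get_info_box_details(templates: list) -> dict:
    """
    Get the population-related parameters from the page's infobox templates
    """
    # templates is a list of (template name, parameters) pairs
    infobox_template = []
    for tmpl, params in templates:
        if 'Infobox' in tmpl:
            infobox_template.append(params)
    population = {k: v for my_dict in infobox_template for k, v in my_dict.items() if 'population' in k}
    print(population)
    return population


# city is defined elsewhere; its ['article']['value'] holds the article URL
wiki_page = get_page(city)

templates = wiki_page.raw_extracted_templates
info_box = get_info_box_details(templates)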
Ankit Agarwal

2 Answers

From Template:Infobox_settlement's documentation:

To calculate density with respect to the total area automatically, type auto in place of any density value.

So, auto means the infobox template (MediaWiki) will try to calculate the value from the area and population values. As for the reasoning, I guess it reduces redundancy, which should make maintaining the template less burdensome for human editors.

You can do the same in your Python program (calculate the density by dividing the population by the area, similar to how the template does it), or you may want to scrape the data directly from the HTML output of the page (which will have its own challenges).
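A minimal sketch of the first option, assuming the infobox stores its inputs under the Infobox settlement parameter names population_total and area_total_km2, and that the raw values are plain numbers, possibly with thousands separators. parse_number and resolve_density are hypothetical helper names, not part of pywikibot, and values wrapped in references or nested templates would need extra cleaning first:

```python
import re
from typing import Mapping, Optional


def parse_number(raw: str) -> Optional[float]:
    """Return the first number found in a raw infobox value, or None."""
    # Accept digits with optional thousands separators and a decimal part.
    match = re.search(r'\d[\d,]*(?:\.\d+)?', raw)
    if match is None:
        return None
    return float(match.group(0).replace(',', ''))


def resolve_density(params: Mapping[str, str]) -> Optional[float]:
    """Return people per km2, computing it when the infobox says 'auto'."""
    raw = params.get('population_density_km2', '').strip()
    if raw.lower() != 'auto':
        # The editor typed an explicit value; just parse it.
        return parse_number(raw)

    # Assumed parameter names for the inputs (see the lead-in).
    population = parse_number(params.get('population_total', ''))
    area = parse_number(params.get('area_total_km2', ''))
    if population is None or area is None or area == 0:
        return None
    return population / area
```

With the question's code, this could be called as resolve_density(params) on the parameter dictionary of the matching infobox template. The result may differ slightly from what the rendered page shows, since the template applies its own rounding.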

AXO

Consider using Wikidata to get this information. It is a database that basically holds the information from infoboxes, but already parsed.

Example query that gets the population of a country with the sum of the population of its cities
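As an alternative to a SPARQL query, the same data can be read through pywikibot itself. This is a rough sketch, not the query referred to above: it assumes the article has a linked Wikidata item and that the Wikidata properties P1082 (population), P2046 (area) and P2044 (elevation above sea level) are filled in. first_amount and fetch_city_facts are hypothetical helper names, and taking the first claim ignores ranks and dates, so it may not be the most recent figure:

```python
from typing import Any, Mapping, Optional, Sequence

# Wikidata property IDs for the values the question asks about.
PROPERTIES = {'population': 'P1082', 'area': 'P2046', 'elevation': 'P2044'}


def first_amount(claims: Mapping[str, Sequence[Any]], prop: str) -> Optional[float]:
    """Return the amount of the first claim for prop that has a value."""
    for claim in claims.get(prop, []):
        target = claim.getTarget()  # a quantity object for these properties
        if target is not None:
            return float(target.amount)
    return None


def fetch_city_facts(title: str) -> dict:
    """Look up the Wikidata item behind an English Wikipedia article."""
    import pywikibot  # imported here so first_amount works without it

    site = pywikibot.Site('en', 'wikipedia')
    page = pywikibot.Page(site, title)
    item = pywikibot.ItemPage.fromPage(page)
    item.get()  # load the item's claims
    return {name: first_amount(item.claims, pid)
            for name, pid in PROPERTIES.items()}
```

Calling fetch_city_facts('Beijing') needs network access and a working pywikibot configuration. Density is not read directly here; it can be derived from the population and area values.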

framawiki