
I use the ParseHub API to scrape the data below in JSON format. When I want to print information on a certain country, I'm only able to get the first set of data: 'name', 'pop', 'area', 'growth', 'worldPer', and 'rank'. I'm unable to get 'image'.

When I print the entire JSON file, the data is all there, but when I try to print a country's data including the image value, I get a KeyError.

Is there a way I can merge the two objects by matching country name?

main.py

import json

import requests


class Data:
    def __init__(self, api_key, project_token):
        self.api_key = api_key
        self.project_token = project_token
        self.params = {"api_key": api_key}
        self.data = self.get_data()

    def get_data(self):
        r = requests.get(f'https://www.parsehub.com/api/v2/projects/xxxx/last_ready_run/data', params=self.params)
        data = json.loads(r.text)
        print(r.text)
        return data

    def data_by_name(self,country):
        data = self.data['country']
        for content in data:
            if content['name'].lower() == country.lower():
                print(content)
                name = content['name']
                pop = content['pop']
                popRank = content['rank']
                growth = content['growth']
                per = content['worldPer']
                area = content['area']
                image = content['image'] #<----- KeyError: 'image'
        return(name,pop,popRank,growth,per,area)

data = Data(DATA_API_KEY,DATA_PROJECT_TOKEN)
data.data_by_name('china')

country.json

{
 "country": [
  {
   "name": "China",
   "pop": "1,438,862,614",
   "area": "9,706,961 km²",
   "growth": "0.39%",
   "worldPer": "18.47%",
   "rank": "1"
  },
  {
   "name": "China",
   "image": "https://s3.amazonaws.com/images.wpr.com/flag-pages/png250/cn.png"
  }
 ]
}
primitiveProgrammer

3 Answers


It would be better to store the data of each country in a dictionary, so you don't iterate over all the data each time. You could do it like this:

def __init__(self, api_key, project_token):
    ...
    self.countries_data = self.get_countries_data()

...

def get_countries_data(self):
    countries_data = {}
    for content in self.data["country"]:
        name = content["name"].lower()  # lower-case so 'china' matches 'China'
        # merge this entry into whatever is already stored for that country
        countries_data[name] = {**countries_data.get(name, {}), **content}
    return countries_data

def data_by_name(self, country):
    country_data = self.countries_data[country.lower()]
    return country_data["name"], country_data["pop"]...
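Here is a self-contained sketch of the same idea that runs against the sample country.json data from the question instead of a live ParseHub request. It keys the merged records by the lower-cased name (an assumption, so that 'china' and 'China' find the same entry). With {**old, **new}, keys from the later entry overwrite clashing keys from the earlier one.

```python
# Sample payload copied from the question's country.json
SAMPLE = {
    "country": [
        {
            "name": "China",
            "pop": "1,438,862,614",
            "area": "9,706,961 km²",
            "growth": "0.39%",
            "worldPer": "18.47%",
            "rank": "1",
        },
        {
            "name": "China",
            "image": "https://s3.amazonaws.com/images.wpr.com/flag-pages/png250/cn.png",
        },
    ]
}


def build_countries_data(payload):
    """Merge every entry that shares a name into one dict per country."""
    countries_data = {}
    for content in payload["country"]:
        key = content["name"].lower()  # case-insensitive key (assumption)
        # Later entries add new keys and overwrite clashing ones
        countries_data[key] = {**countries_data.get(key, {}), **content}
    return countries_data


countries = build_countries_data(SAMPLE)
china = countries["china"]
print(china["pop"], china["image"])
```

The merge is done once, up front, so each later lookup is a single dictionary access.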

Pandas can handle this for you:

import pandas as pd

d = {
 "country": [
  {
   "name": "China",
   "pop": "1,438,862,614",
   "area": "9,706,961 km²",
   "growth": "0.39%",
   "worldPer": "18.47%",
   "rank": "1"
  },
  {
   "name": "China",
   "image": "https://s3.amazonaws.com/images.wpr.com/flag-pages/png250/cn.png"
  }
 ]
}

df = pd.DataFrame.from_dict(d['country']).groupby('name').first()

Output

                 pop           area growth worldPer rank                                              image
name
China  1,438,862,614  9,706,961 km²  0.39%   18.47%    1  https://s3.amazonaws.com/images.wpr.com/flag-p...
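groupby('name').first() takes, for each name, the first non-null value in every column, so the image from the second row fills the gap left in the first. If you'd rather avoid the pandas dependency, a rough pure-Python analogue is sketched here. It illustrates the semantics only and is not how pandas implements it; missing keys and None stand in for NaN. Unlike the dictionary-merge approaches, it keeps the first value when two rows disagree.

```python
# Trimmed rows from the question's country.json
ROWS = [
    {"name": "China", "pop": "1,438,862,614", "rank": "1"},
    {"name": "China",
     "image": "https://s3.amazonaws.com/images.wpr.com/flag-pages/png250/cn.png"},
]


def first_non_null_per_name(rows):
    """For each name, keep the first non-None value seen in each column."""
    grouped = {}
    for row in rows:
        merged = grouped.setdefault(row["name"], {})
        for column, value in row.items():
            if column == "name" or value is None:
                continue  # 'name' is the group key; None stands in for NaN
            merged.setdefault(column, value)  # keep the first value, ignore later ones
    return grouped


result = first_non_null_per_name(ROWS)
print(result["China"]["image"])
```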
Chris

There are (at least) two ways to go about this: you could merge all the entries with the same name (e.g. China) in the data, or you could search through all the countries each time and collect the necessary data from every entry that matches your country. Here's an example of the second approach, which modifies your data_by_name method. Its advantage is that it works even if you don't know how many times the country appears:

def data_by_name(self,country):
    data = self.data['country']
    my_dict = {}
    for content in data:
        if content['name'].lower() == country.lower():
            print(content)
            my_dict.update(content) # This updates your dict with the key/value pairs
    return my_dict     # my_dict will have all the different values, including image

If you want only specific fields, you could return those:

    return (
        my_dict['name'],
        my_dict['pop'],
        my_dict['rank'],
        my_dict['growth'],
        my_dict['worldPer'],
        my_dict['area'],
        my_dict['image']
    )
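A standalone sketch of this method, written as a plain function over the question's sample data so it runs without the ParseHub request. The fields parameter and the use of operator.itemgetter are additions for illustration: when fields are named, it returns just those values as a tuple, in the order given. Because dict.update overwrites existing keys, a later matching entry wins if two entries disagree on a field, and a country with no match gives an empty dict instead of an error.

```python
from operator import itemgetter

# Sample entries copied from the question's country.json
SAMPLE = [
    {"name": "China", "pop": "1,438,862,614", "area": "9,706,961 km²",
     "growth": "0.39%", "worldPer": "18.47%", "rank": "1"},
    {"name": "China",
     "image": "https://s3.amazonaws.com/images.wpr.com/flag-pages/png250/cn.png"},
]


def data_by_name(entries, country, fields=None):
    """Merge every entry whose name matches `country` (case-insensitive).

    Returns the merged dict, or a tuple of the requested `fields` if given
    (hypothetical parameter added for this sketch).
    """
    my_dict = {}
    for content in entries:
        if content["name"].lower() == country.lower():
            my_dict.update(content)  # later entries overwrite clashing keys
    if fields is None:
        return my_dict
    # itemgetter with several keys returns a tuple; raises KeyError if one is missing
    return itemgetter(*fields)(my_dict)


merged = data_by_name(SAMPLE, "china")
name, pop, image = data_by_name(SAMPLE, "china", fields=("name", "pop", "image"))
print(name, pop, image)
```

Note that itemgetter called with a single key returns that value on its own rather than a one-element tuple.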

Hope that helps, happy coding!

Sam