
I was using the DBpedia Lookup API, which returned its response in XML format, like the following:

<ArrayOfResults>
    <Result>
        <Label>China</Label>
        <URI>http://dbpedia.org/resource/China</URI>
        <Description>China .... administrative regions of Hong Kong and Macau.</Description>
        <Classes>
            <Class>
                <Label>Place</Label>
                <URI>http://dbpedia.org/ontology/Place</URI>
            </Class>
            <Class>
                <Label>Country</Label>
                <URI>http://dbpedia.org/ontology/Country</URI>
            </Class>
        </Classes>
        <Categories>
            <Category>
                <URI>http://dbpedia.org/resource/Category:Member_states_of_the_United_Nations</URI>
            </Category>
            <Category>
                <URI>http://dbpedia.org/resource/Category:Republics</URI>
            </Category>
        </Categories>
        <Refcount>12789</Refcount>
    </Result>
    <Result>
        <Label>Theatre of China</Label>
        <URI>http://dbpedia.org/resource/Theatre_of_China</URI>
        <Description>Theatre of China ... the 20th century.</Description>
        <Classes/>
        <Categories>
            <Category>
                <URI>http://dbpedia.org/resource/Category:Asian_drama</URI>
            </Category>
            <Category>
                <URI>http://dbpedia.org/resource/Category:Chinese_performing_arts</URI>
            </Category>
        </Categories>
        <Refcount>23</Refcount>
    </Result>
</ArrayOfResults>

I have shortened it; the full response can be found at this link.

Now I need to retrieve all the values under the <Label> and <URI> tags.

Here's what I've done so far:

import requests
import xml.etree.ElementTree as ET

response = requests.get('https://lookup.dbpedia.org/api/search?query=China')
response_body = response.content

response_xml = ET.fromstring(response_body)

root = ET.fromstring(response_body)
for child in root:
    print(child.tag)
    for grandchild in child:
        print(f"\t {grandchild.tag}")
        label = grandchild.find('Label')
        uri = grandchild.find('URI')
        print(f"\t required label = {label}")
        print(f"\t required uri = {uri}")

But label and uri are None in every case. How can I fix this so that I get every value under the <Label> tag of each <Result> (like China, Theatre of China, etc.) together with the URI in its <URI> tag?

ganjaam

2 Answers


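You're nesting one level too deep. Each grandchild is already one of <Result>'s children (<Label>, <URI>, <Classes>, ...), and none of those has a direct <Label> or <URI> child, so grandchild.find returns None. Call find on child instead (which is a <Result> element):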

for child in root:  # each child is a <Result> element
    label = child.find('Label').text
    uri = child.find('URI').text
    print(label, uri)
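To check the fix without hitting the network, here is a minimal sketch that runs the same idea over a trimmed copy of the sample response from the question (SAMPLE is an abbreviated, illustrative copy of that XML, not live API output). It uses findtext, which returns None instead of raising AttributeError when a tag is missing. Note that <Classes> also contains <Label> and <URI> elements, but find/findtext with a plain tag name only looks at direct children, so each <Result>'s own values are the ones picked up.

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative copy of the sample response shown in the question.
SAMPLE = """<ArrayOfResults>
    <Result>
        <Label>China</Label>
        <URI>http://dbpedia.org/resource/China</URI>
        <Classes>
            <Class>
                <Label>Place</Label>
                <URI>http://dbpedia.org/ontology/Place</URI>
            </Class>
        </Classes>
    </Result>
    <Result>
        <Label>Theatre of China</Label>
        <URI>http://dbpedia.org/resource/Theatre_of_China</URI>
        <Classes/>
    </Result>
</ArrayOfResults>"""


def labels_and_uris(xml_text):
    """Return (label, uri) pairs taken from each <Result>'s direct children."""
    root = ET.fromstring(xml_text)
    pairs = []
    for result in root:  # each child of <ArrayOfResults> is a <Result>
        # findtext() with a plain tag name only searches direct children,
        # so the <Label>/<URI> nested inside <Classes> are not picked up
        pairs.append((result.findtext('Label'), result.findtext('URI')))
    return pairs


for label, uri in labels_and_uris(SAMPLE):
    print(f"{label}: {uri}")
```

With the live API, passing response.content from the question's code to labels_and_uris should work the same way.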
Lev Levitsky

Hi, I don't know whether you need to know which URLs belong to which labels, but this is a very simple way to get all the URLs out:

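import requests
from bs4 import BeautifulSoup  # the 'xml' parser used below also requires lxml

url = 'https://lookup.dbpedia.org/api/search?query=China'

soup = BeautifulSoup(requests.get(url).text, 'xml')

# recursive=False keeps only the <Label>/<URI> directly under each <Result>,
# skipping the ones nested inside <Classes> and <Categories>
results = soup.find_all('Result')
labels = [r.find('Label', recursive=False).text for r in results]

URI = [r.find('URI', recursive=False).text for r in results]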
Lukas Muijs
  • **Label** and **URI** are tags in the **XML** response data (the first snippet in my post). – ganjaam Mar 11 '21 at 16:53
  • This seems to be an interesting approach, but when I tried this code I got an error suggesting that I should use a parser library. I need to try again after making the necessary adjustments to see whether it is a better approach in this context. – ganjaam Mar 11 '21 at 16:57
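Regarding the parser error: BeautifulSoup's 'xml' feature relies on the third-party lxml package. If installing it is not an option, a minimal standard-library sketch of the same idea uses ElementTree's path syntax. SAMPLE is an abbreviated, illustrative copy of the response from the question so the example runs offline; with the live API you would parse response.content instead.

```python
import xml.etree.ElementTree as ET

# Abbreviated, illustrative copy of the sample response from the question.
SAMPLE = """<ArrayOfResults>
    <Result>
        <Label>China</Label>
        <URI>http://dbpedia.org/resource/China</URI>
        <Classes>
            <Class>
                <Label>Country</Label>
                <URI>http://dbpedia.org/ontology/Country</URI>
            </Class>
        </Classes>
        <Categories>
            <Category>
                <URI>http://dbpedia.org/resource/Category:Republics</URI>
            </Category>
        </Categories>
    </Result>
    <Result>
        <Label>Theatre of China</Label>
        <URI>http://dbpedia.org/resource/Theatre_of_China</URI>
    </Result>
</ArrayOfResults>"""

root = ET.fromstring(SAMPLE)

# 'Result/Label' is an ElementPath expression: <Label> elements that are
# direct children of a <Result>, which skips the labels inside <Classes>.
labels = [el.text for el in root.findall('Result/Label')]
URI = [el.text for el in root.findall('Result/URI')]

# iter('URI') walks the whole tree instead, so it also returns the class and
# category URIs; this matches "get all URLs out" literally.
all_uris = [el.text for el in root.iter('URI')]

for label, uri in zip(labels, URI):
    print(label, uri)
```

The two lists stay aligned because every <Result> in the sample has exactly one direct <Label> and one direct <URI>; if that might not hold, iterating over the <Result> elements and calling findtext on each is safer.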