Parsing XML object in python 3.9

Question

I'm trying to get some data using the NCBI API. I am using requests to make the connection to the API.

What I'm stuck on is how to convert the XML object that requests returns into something I can parse.

Here's my code for the function so far:

def getNCBIid(speciesName):
    import requests
    
    base_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"
    
    url = base_url + "esearch.fcgi?db=assembly&term=(%s[All Fields])&usehistory=y&api_key=f1e800ad255b055a691c7cf57a576fe4da08" % speciesName
    
    #xml object
    api_request = requests.get(url)

The code you've shown is irrelevant to your question, which is how to parse the XML. What have you tried to actually parse the XML? Please [edit] your question to provide a [mre] that people can run to replicate the _specific_ problem you're asking about. If you haven't tried anything, please take the [tour], read [ask], the [question checklist](//meta.stackoverflow.com/q/260648/843953), and [How much research effort is expected of Stack Overflow users?](//meta.stackoverflow.com/a/261593/843953) Welcome to Stack Overflow! — Pranav Hosangadi, Jun 08 '21 at 20:00
Sounds like you need an xml parser that you can use to read the raw response from requests. There are many out there, including those listed in the docs at https://docs.python.org/3/library/xml.html. My preference, though, is the add-on `lxml` package. — tdelaney, Jun 08 '21 at 20:05
But since there are so many options, this question is off topic for stackoverflow. Ask your favorite search engine for an xml parser. — tdelaney, Jun 08 '21 at 20:06

score 0 · Accepted Answer · answered Jun 08 '21 at 20:04

You can use a library such as BeautifulSoup to convert and parse the XML.

What you are calling your XML object is still the requests Response object; you need to extract the content from that object first.

import requests
from bs4 import BeautifulSoup

def getNCBIid(speciesName):
    base_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

    url = base_url + "esearch.fcgi?db=assembly&term=(%s[All Fields])&usehistory=y&api_key=f1e800ad255b055a691c7cf57a576fe4da08" % speciesName

    # this is the requests Response object, not the XML itself
    api_request = requests.get(url)

    # grab the response content (the raw bytes of the XML document)
    xml_content = api_request.content

    # parse with Beautiful Soup; the 'xml' feature needs the lxml package installed
    soup = BeautifulSoup(xml_content, 'xml')

    # from here you would access the desired elements
    # here are docs: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
    return soup
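
If you would rather not install anything extra, the standard library's xml.etree.ElementTree can read the same bytes. The sketch below is only an illustration: the sample document is made up to stand in for api_request.content, the element names (Count, QueryKey, WebEnv, IdList/Id) are assumed to match what esearch returns, and parse_esearch is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Made-up sample standing in for api_request.content; the values are illustrative only.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<eSearchResult>
  <Count>2</Count>
  <RetMax>2</RetMax>
  <RetStart>0</RetStart>
  <QueryKey>1</QueryKey>
  <WebEnv>EXAMPLE_WEBENV</WebEnv>
  <IdList>
    <Id>111111</Id>
    <Id>222222</Id>
  </IdList>
</eSearchResult>"""


def parse_esearch(xml_bytes):
    """Pull a few fields out of an esearch-style XML document (hypothetical helper)."""
    root = ET.fromstring(xml_bytes)  # root is the <eSearchResult> element
    return {
        "count": int(root.findtext("Count", default="0")),
        "query_key": root.findtext("QueryKey"),
        "web_env": root.findtext("WebEnv"),
        # every <Id> directly under <IdList>
        "ids": [el.text for el in root.findall("./IdList/Id")],
    }


result = parse_esearch(SAMPLE)
print(result["ids"])
```

In the function above you would pass api_request.content instead of SAMPLE.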

answered Jun 08 '21 at 20:04

Iris D

`soup = BeautifulSoup(xml_content, 'xml')` isn't working. I installed beautifulsoup4, and it isn't throwing an error about importing it. – user16168208 Jun 08 '21 at 20:48
My bad, you need this for it: https://lxml.de/elementsoup.html – Iris D Jun 08 '21 at 20:51
This is the error: "bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: xml. Do you need to install a parser library?" – user16168208 Jun 08 '21 at 20:52
@user16168208 check out the parser I linked above. – Iris D Jun 08 '21 at 21:09
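
For anyone hitting the same FeatureNotFound error: Beautiful Soup's 'xml' feature depends on the lxml package, so installing lxml normally clears it. As a hedged sketch, the hypothetical helper below (extract_ids, not part of any library) tries Beautiful Soup first and falls back to the standard library's ElementTree when bs4 or lxml is missing. The sample bytes are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample; in practice this would be api_request.content.
SAMPLE = b"<eSearchResult><IdList><Id>111111</Id><Id>222222</Id></IdList></eSearchResult>"


def extract_ids(xml_bytes):
    """Return the text of every <Id> element, using bs4 + lxml when available."""
    try:
        from bs4 import BeautifulSoup, FeatureNotFound
    except ImportError:
        BeautifulSoup = None  # bs4 is not installed; use the fallback below

    if BeautifulSoup is not None:
        try:
            soup = BeautifulSoup(xml_bytes, "xml")  # raises FeatureNotFound without lxml
            return [tag.get_text() for tag in soup.find_all("Id")]
        except FeatureNotFound:
            pass  # lxml is missing; fall through to the standard library

    root = ET.fromstring(xml_bytes)
    return [el.text for el in root.iter("Id")]


ids = extract_ids(SAMPLE)
print(ids)
```

Either path should give the same list of IDs, so the rest of the code does not need to know which parser was used.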
