
I have the following xml file (taken from here:

<BioSampleSet>
   <BioSample submission_date="2011-12-01T13:31:02.367" last_update="2014-11-08T01:40:24.717" publication_date="2012-02-16T10:49:52.970" access="public" id="761094" accession="SAMN00761094">
   <Ids>
   </Ids>
   <Package display_name="Generic">Generic.1.0</Package>
   <Attributes>
      <Attribute attribute_name="Individual">PK314</Attribute>
      <Attribute attribute_name="condition">healthy</Attribute>
      <Attribute attribute_name="BioSampleModel">Generic</Attribute>
   </Attributes>
   <Status status="live" when="2014-11-08T00:27:24"/>
   </BioSample>
</BioSampleSet>

I need to access the text inside each Attribute element (a child of Attributes), alongside the value of its attribute_name attribute. So far I have managed to access the values of attribute_name:

import xml.etree.ElementTree as ET
from Bio import Entrez

Entrez.email = '#'

handle = Entrez.efetch(db="biosample", id="SAMN00761094", retmode="xml", rettype="full")
tree = ET.parse(handle)
root = tree.getroot()  # the <BioSampleSet> element; root[0] is the first <BioSample>
for attr in root[0].iter('Attribute'):
    name = attr.get('attribute_name')
    print(name)

This prints:

Individual
condition
BioSampleModel

How do I create a dict of the values of attribute_name and the text next to it?

My desired output is

attributes = {'Individual': 'PK314', 'condition': 'healthy', 'BioSampleModel': 'Generic'}
Saraha

1 Answer


Based strictly on the xml sample in the question, try something along these lines:

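import xml.etree.ElementTree as ET

bio = """[your xml sample]"""
doc = ET.fromstring(bio)
attributes = {}
# Map each Attribute element's attribute_name value to its text content
for item in doc.findall('.//Attributes//Attribute'):
    attributes[item.attrib['attribute_name']] = item.text
print(attributes)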

Output:

{'Individual': 'PK314', 'condition': 'healthy', 'BioSampleModel': 'Generic'}
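
If the response ever contains more than one BioSample, the same idea could be applied per sample. The following is only a sketch under that assumption: `SAMPLE_XML` is a trimmed copy of the XML from the question, and `attributes_by_sample` is a hypothetical helper name, not something provided by ElementTree or Biopython.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample from the question (illustrative only).
SAMPLE_XML = """<BioSampleSet>
   <BioSample access="public" id="761094" accession="SAMN00761094">
   <Attributes>
      <Attribute attribute_name="Individual">PK314</Attribute>
      <Attribute attribute_name="condition">healthy</Attribute>
      <Attribute attribute_name="BioSampleModel">Generic</Attribute>
   </Attributes>
   </BioSample>
</BioSampleSet>"""


def attributes_by_sample(xml_text):
    """Return {accession: {attribute_name: text}} for every BioSample.

    Hypothetical helper: assumes each BioSample has an 'accession'
    attribute, as in the sample above.
    """
    root = ET.fromstring(xml_text)
    result = {}
    for sample in root.iter("BioSample"):
        # One inner dict per sample, built from its Attribute elements.
        result[sample.get("accession")] = {
            attr.get("attribute_name"): attr.text
            for attr in sample.iter("Attribute")
        }
    return result


per_sample = attributes_by_sample(SAMPLE_XML)
print(per_sample)
```

For the single sample above, `per_sample["SAMN00761094"]` is the same dict as in the output shown earlier.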
Jack Fleeting
  • So you access the text with just `item.text`? What is this "field" even called in XML vocabulary? – Saraha Apr 22 '20 at 12:54
  • @Saraha Your first (for example) target element `<Attribute attribute_name="Individual">PK314</Attribute>` is a node. Its node name is `Attribute`; it has an attribute with an attribute name of `attribute_name`, which in turn has an attribute value of `Individual`. Finally, this node has a text child node. Welcome to the wonderful world of xml terminology! – Jack Fleeting Apr 22 '20 at 14:10
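
To tie the terminology above to ElementTree, here is a small sketch showing roughly how each term maps onto an `Element` object. The variable name `elem` is just illustrative; the element string is the first Attribute element from the question.

```python
import xml.etree.ElementTree as ET

# The first Attribute element from the question, parsed on its own.
elem = ET.fromstring('<Attribute attribute_name="Individual">PK314</Attribute>')

print(elem.tag)                    # node (element) name
print(elem.attrib)                 # all attributes as a dict of name -> value
print(elem.get("attribute_name"))  # value of one attribute, looked up by name
print(elem.text)                   # text content directly inside the element
```

Note that ElementTree exposes the text as a plain string on `elem.text` rather than as a separate child node object.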