0

I have a very large xml file produced from an application whose part of tree is as below:

XML Tree

There are several items under 'item' from 0 to 7. These names are always named as numbers it can range from 0 to any number. Each of these items will have multiple items all with same structure as per the above tree. Only item 0 to 7 is variable all other structure remains same. under I have a value <bbmds_questiontype>: which can be Multiple Choice or Matching or Essays.

What I need is to have a list the values of <mat_formattedtext>. ie. the output is supposed to be:

                <0>
            <bbmds_questiontype>Multiple Choice</bbmds_questiontype>
        <mat_formattedtext>This is first question </mat_formattedtext></0>
                <1>
            <bbmds_questiontype>Multiple Choice</bbmds_questiontype>
    <mat_formattedtext>This is second question </mat_formattedtext> </1>
                  <2>
            <bbmds_questiontype>Essay</bbmds_questiontype>
    <mat_formattedtext>This is first question </mat_formattedtext> </2>
....

I have tried several solution included xml tree, xmltodict all getting complicated as filters to be applied across different branches of children

import xmltodict
with open("C:/Users/SS/Desktop/moodlexml/00001_questions.dat") as fd:
    doc = xmltodict.parse(fd.read())
shortened=doc['questestinterop']['assessment']['section']['item'] # == u'an attribute'

Any advice will be appreciated to proceed further.

Shiva
  • 67
  • 1
  • 7
  • Have you tried using XPath? If the subtrees are always pairwise, you can go down the n-th subtree using index `NODENAME[n]` at the relevant node (where the branching happens) and skip nodes down until something matches using `//bbmds_questiontype`. – white Aug 04 '22 at 12:25
  • Please don't tell us the file is "very large". That could mean anything from 1Mb to 100Gb. It's also not at all clear that the size is relevant. – Michael Kay Aug 04 '22 at 12:34
  • It's not at all clear where you hit problems, and without knowing that, you're effectively asking people to write the whole code for you. Personally, I wouldn't dream of doing this using Python+DOM, it's going to be far easier in XSLT. – Michael Kay Aug 04 '22 at 12:37
  • Questions are encouraged to provide a [Minimal, Reproducible Example](/help/minimal-reproducible-example) – LMC Aug 04 '22 at 13:31

1 Answers1

0

Have you tried to use bs4 parsing, its simple

Check it out https://linuxhint.com/parse_xml_python_beautifulsoup/

ShockerSam
  • 9
  • 1
  • 4