I have this text:
text = """<?xml version="1.0"?><mainmodule><module1><heading>Lesson 01: Design Authorization</heading><subheading><item1>Learning Objectives</item1><item2>Choosing an Authorization Approach</item2><item3>Access Management Solution</item3></subheading></module1><module2><heading>Lesson 02: Design a Solution for Logging and Monitoring</heading><subheading><item1>Learning Objectives</item1><item2>Monitoring Tools</item2><item3>Azure Monitor Health and Availability Monitoring</item3><item4>Initiating Automated Response Using Action Groups</item4><item5>Configure and Manage Alerts</item5><item6>Demo Azure Logging and Monitoring</item6><item7>Demo Azure Alerts</item7><item8>Recap</item8></subheading></module2><module3><heading>Lesson 03: Design for High Availability</heading><subheading><item1>Learning Objectives</item1><item2>Architecture Best Practices for Reliability into Categories</item2><item3>Solution for Recovery in Different Regions</item3><item4>Solution for Azure Backup Management</item4><item5>Solution for Data Archiving and Retention</item5></subheading></module3></mainmodule>"""
I would like the output in this format:
output = [
    {
        'heading': 'Lesson 01: Design Authorization',
        'subheading': [
            {'subheading': 'Learning Objectives'},
            {'subheading': 'Choosing an Authorization Approach'},
            {'subheading': 'Access Management Solution'}
        ]
    },
    {
        'heading': 'Lesson 02: Design a Solution for Logging and Monitoring',
        'subheading': [
            {'subheading': 'Learning Objectives'},
            {'subheading': 'Monitoring Tools'},
            {'subheading': 'Azure Monitor Health and Availability Monitoring'},
            {'subheading': 'Initiating Automated Response Using Action Groups'},
            {'subheading': 'Configure and Manage Alerts'},
            {'subheading': 'Demo Azure Logging and Monitoring'},
            {'subheading': 'Demo Azure Alerts'},
            {'subheading': 'Recap'}
        ]
    },
    {
        'heading': 'Lesson 03: Design for High Availability',
        'subheading': [
            {'subheading': 'Learning Objectives'},
            {'subheading': 'Architecture Best Practices for Reliability into Categories'},
            {'subheading': 'Solution for Recovery in Different Regions'},
            {'subheading': 'Solution for Azure Backup Management'},
            {'subheading': 'Solution for Data Archiving and Retention'}
        ]
    }
]
The text inside each <heading> tag needs to go under "heading" in the output, and the text inside each "subheading" -> "item" tag needs to go under "subheading" within its respective "heading".
I am trying to solve this by creating a list of lists. This is what I have so far, but I have not been able to get it working. The target structure:
output = [
    {
        'heading': '',
        'subheading': [
            {
                'subheading': ''
            }
        ]
    }
]
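import re

# Raw strings keep \d from being treated as a string escape
heading_list = re.findall(r'<heading>.+?</heading>', text)
subheading_list = re.findall(r'<item\d+?>.+?</item\d+?>', text)

no_of_items = 0
# One inner list per heading (this is the line that misbehaves)
count_item = [[]] * len(heading_list)
for num, sub in enumerate(subheading_list):
    # A new <item1> (other than the very first) should start the next group
    if '<item1>' in sub and num != 0:
        no_of_items += 1
        count_item[no_of_items].append(sub)
    else:
        count_item[no_of_items].append(sub)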
I want to append each item to the inner list for its heading in count_item, but somehow every item ends up appended to every inner list. How can I solve this?
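A minimal sketch of what appears to be going on, and one possible way around it. `[[]] * n` does not create `n` separate lists; it repeats a reference to the same inner list `n` times, so an append through any index shows up at every index. Building the outer list with a comprehension creates a fresh inner list per heading. The shortened `text` below and the names `shared` and `index` are assumptions made for illustration, and the regexes are changed to capture only the tag contents.

```python
import re

# Shortened stand-in for the `text` in the question (assumed sample).
text = (
    "<mainmodule>"
    "<module1><heading>Lesson 01</heading><subheading>"
    "<item1>Learning Objectives</item1><item2>Access Management Solution</item2>"
    "</subheading></module1>"
    "<module2><heading>Lesson 02</heading><subheading>"
    "<item1>Learning Objectives</item1><item2>Monitoring Tools</item2>"
    "<item3>Recap</item3>"
    "</subheading></module2>"
    "</mainmodule>"
)

# [[]] * n repeats a reference to ONE list, so every slot is the same object.
shared = [[]] * 2
shared[0].append("x")
# Both slots now show the appended value because they are the same list.

# Capture groups make findall return the contents without the tags.
heading_list = re.findall(r"<heading>(.+?)</heading>", text)
# Each match is a (tag_name, content) tuple; \1 requires a matching close tag.
subheading_list = re.findall(r"<(item\d+)>(.+?)</\1>", text)

# A comprehension evaluates [] once per iteration: independent inner lists.
count_item = [[] for _ in heading_list]

index = 0
for num, (tag, content) in enumerate(subheading_list):
    # A new <item1> (other than the very first) starts the next heading's group.
    if tag == "item1" and num != 0:
        index += 1
    count_item[index].append(content)

# Assemble the structure requested in the question.
output = [
    {"heading": heading, "subheading": [{"subheading": s} for s in subs]}
    for heading, subs in zip(heading_list, count_item)
]
```

This keeps the regex approach from the question. Since the input is XML, a real parser such as the standard library's `xml.etree.ElementTree` would likely be more robust than regular expressions, but that is a separate design choice.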