
I have a defaultdict(list), i.e. a data structure of the following format:

1:[1,2,3,4,5]
2:[2,3,4]

I want to generate the following XML:

<html>
<page>
<src>1</src>
<links>
   <link>1</link>
   <link>2</link>
   ...
   <link>5</link>
</links>
</page>

<page>
<src>2</src>
<links>
   <link>2</link>
   <link>3</link>
   <link>4</link>
</links>
</page>
</html>

and then write the indented XML to a file.

frazman
  • Are the XML tags fixed? It looks more like HTML. – Ammar Apr 07 '14 at 19:13
  • Check out this answer: [http://stackoverflow.com/a/4470210/25097](http://stackoverflow.com/a/4470210/25097). ``lxml.builder.E`` is super-easy to use for this kind of thing. – Christian Aichinger Apr 07 '14 at 19:15
  • @unixer: yepp they are fixed – frazman Apr 07 '14 at 19:15
  • Speaking as someone who does not know how to use Python's XML libraries, your question seems straightforward to implement with two nested for loops, if you are reasonably sure the scope of this functionality won't grow. – Emilio M Bumachar Apr 07 '14 at 19:18

2 Answers


You can use BeautifulSoup:

from bs4 import Tag


d = {1: [1,2,3,4,5], 2: [2,3,4]}

root = Tag(name='html')
# Build one <page> element per dictionary entry
for key, values in d.items():
    page = Tag(name='page')
    src = Tag(name='src')
    src.string = str(key)
    page.append(src)

    # Collect the values as <link> children of a single <links> element
    links = Tag(name='links')
    for value in values:
        link = Tag(name='link')
        link.string = str(value)
        links.append(link)

    page.append(links)
    root.append(page)

print(root.prettify())

prints:

<html>
 <page>
  <src>
   1
  </src>
  <links>
   <link>
    1
   </link>
   <link>
    2
   </link>
   <link>
    3
   </link>
   <link>
    4
   </link>
   <link>
    5
   </link>
  </links>
 </page>
 <page>
  <src>
   2
  </src>
  <links>
   <link>
    2
   </link>
   <link>
    3
   </link>
   <link>
    4
   </link>
  </links>
 </page>
</html>
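Note that prettify() puts every text node on its own line, which differs from the layout asked for in the question, and the snippet above only prints the result. A minimal sketch of an alternative using the standard library's xml.etree.ElementTree, assuming Python 3.9+ (needed for ET.indent). The helper names build_tree and write_indented are made up for illustration:

```python
import xml.etree.ElementTree as ET


def build_tree(data):
    """Build <html> with one <page> (a <src> and its <links>) per entry."""
    root = ET.Element("html")
    for key, values in data.items():
        page = ET.SubElement(root, "page")
        ET.SubElement(page, "src").text = str(key)
        links = ET.SubElement(page, "links")
        for value in values:
            ET.SubElement(links, "link").text = str(value)
    return root


def write_indented(data, path):
    """Write the tree to `path` with two-space indentation."""
    root = build_tree(data)
    ET.indent(root, space="  ")  # assumption: Python 3.9 or newer
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)
```

Calling write_indented(d, "pages.xml") with the dictionary from the question (the file name is only an example) should keep each value inline, e.g. <link>1</link>, rather than spreading it over three lines.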
alecxe

You can also define a jinja2 template and render it:

from jinja2 import Template


data = {1:[1,2,3,4,5], 2:[2,3,4]}

html = """<html>
    {% for key, values in data.items() %}
        <page>
        <src>{{ key }}</src>
        <links>
            {% for value in values %}
               <link>{{ value }}</link>
            {% endfor %}
        </links>
        </page>
    {% endfor %}
</html>"""

template = Template(html)
print(template.render(data=data))

prints (whitespace-only lines left by the template tags omitted):

<html>
        <page>
        <src>1</src>
        <links>
               <link>1</link>
               <link>2</link>
               <link>3</link>
               <link>4</link>
               <link>5</link>
        </links>
        </page>
        <page>
        <src>2</src>
        <links>
               <link>2</link>
               <link>3</link>
               <link>4</link>
        </links>
        </page>
</html>
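As one of the comments on the question points out, the template's two nested loops can also be written directly in Python. A rough sketch using only the standard library: render_pages and write_pages are hypothetical names, and escape from xml.sax.saxutils is applied so that keys or values containing &, < or > still yield well-formed XML:

```python
from xml.sax.saxutils import escape


def render_pages(data, indent="  "):
    """Return the XML as a string, built with two nested loops."""
    lines = ["<html>"]
    for key, values in data.items():
        lines.append(f"{indent}<page>")
        lines.append(f"{indent * 2}<src>{escape(str(key))}</src>")
        lines.append(f"{indent * 2}<links>")
        for value in values:
            lines.append(f"{indent * 3}<link>{escape(str(value))}</link>")
        lines.append(f"{indent * 2}</links>")
        lines.append(f"{indent}</page>")
    lines.append("</html>")
    return "\n".join(lines) + "\n"


def write_pages(data, path):
    """Write the rendered XML to `path` as UTF-8 text."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(render_pages(data))
```

This keeps the indentation fully under your control, at the cost of building the markup by hand, so it is probably only worth it while the structure stays this small.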
alecxe