
I'm new to Python but am really enjoying the language so far.

I've been creating a bunch of complicated HTML5 elements and using the html5lib module.

When I loop through the elements in a paragraph I can print them out fine, but when I try to use bs4's insert method I only get every other element in the output, and I don't know why!

My Python:

    i = 0
    for gallery_elem in gallery_header_next_sibling:
        if ( gallery_elem.name.lower() == 'img' ):
            if ( i == 0 ):
                new_gallery = soup.new_tag( "div" )
                new_gallery[ "class" ] = "gallery"

            new_gallery_elem = soup.new_tag( "figure" )

            if ( gallery_elem.has_attr( "alt" ) ):
                new_gallery_cap = soup.new_tag( "figcaption" )
                new_gallery_cap.string = gallery_elem[ "alt" ]
                new_gallery_elem.insert( 2, new_gallery_cap )

            if ( gallery_elem.has_attr( "title" ) ):
                new_gallery_attribution = soup.new_tag( "dl" )
                new_gallery_attribution_dt = soup.new_tag( "dt" )
                new_gallery_attribution_dt.string = "Image owner:"
                new_gallery_attribution_dd = soup.new_tag( "dd" )
                new_gallery_attribution_dd.string = gallery_elem[ "title" ]
                new_gallery_attribution.insert( 0, new_gallery_attribution_dt )
                new_gallery_attribution.insert( 1, new_gallery_attribution_dd )

        new_gallery_elem.insert( 1, new_gallery_attribution )
        new_gallery_elem.insert( 1, gallery_elem )
        i = i + 1

    new_gallery_elem.insert( 1, gallery_elem )

The HTML:

<img alt="Caption One." src="img/orange.jpg" title="Attribution One."/>
<img alt="Caption Two." src="img/red.jpg" title="Attribution Two."/>
<img alt="Caption Three." src="img/urban.jpg" title="Attribution Three."/>
<img alt="Caption Four." src="img/brolly.jpg" title="Attribution Four."/>
<img alt="Caption Five." src="img/tomy.jpg" title="Attribution Five."/>

The output:

<figure><figcaption>Caption One.</figcaption><img alt="Caption One." src="img/orange.jpg" title="Attribution One."/><dl><dt>Image owner:</dt><dd>Attribution One.</dd></dl></figure>
<figure><figcaption>Caption Three.</figcaption><img alt="Caption Three." src="img/urban.jpg" title="Attribution Three."/><dl><dt>Image owner:</dt><dd>Attribution Three.</dd></dl></figure>
<figure><figcaption>Caption Five.</figcaption><img alt="Caption Five." src="img/tomy.jpg" title="Attribution Five."/><dl><dt>Image owner:</dt><dd>Attribution Five.</dd></dl></figure>

If I yank out the following line I get all five elements. Does anyone have any sort of inkling as to what I'm doing wrong?

new_gallery_elem.insert( 1, gallery_elem )
Flowdeeps

1 Answer


So after some experimenting I found that storing the elements I needed in a list, and then retrieving them from that list instead of editing the soup live while looping over it, solved my problem.
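A likely explanation, assuming the loop in the question iterates over the live children of a tag: bs4's insert moves a tag that is already in the tree, so each img is pulled out of the collection being looped over and the iterator steps past the next one. The sketch below is only a plain-list analogy (the names images, visited_live and visited_snapshot are made up for illustration, and no bs4 is involved), but it shows the same every-other-element pattern and why a copy avoids it.

```python
# Plain-list analogy, not bs4 itself: removing the current item while
# iterating shifts the remaining items left, so the iterator skips one.
images = ["one", "two", "three", "four", "five"]

visited_live = []
for item in images:
    visited_live.append(item)
    images.remove(item)  # mutates the list that is being iterated

# Iterating over a copy leaves the iteration order untouched,
# even though the original list is still emptied along the way.
images = ["one", "two", "three", "four", "five"]

visited_snapshot = []
for item in list(images):
    visited_snapshot.append(item)
    images.remove(item)

print(visited_live)      # ['one', 'three', 'five']
print(visited_snapshot)  # ['one', 'two', 'three', 'four', 'five']
```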

Once I had the objects created and stored I could add them back into the parent element I'd previously created and inserted into the soup.
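Roughly, the approach can be sketched like this. It is a trimmed-down version rather than my exact code: the wrapper div with id "source", the variable names, and the use of the built-in "html.parser" (instead of html5lib) are assumptions made to keep the example self-contained, and the dl attribution part is left out for brevity.

```python
from bs4 import BeautifulSoup

# Hypothetical input: the images sit directly inside a wrapper div.
html = (
    '<div id="source">'
    '<img alt="Caption One." src="img/orange.jpg" title="Attribution One."/>'
    '<img alt="Caption Two." src="img/red.jpg" title="Attribution Two."/>'
    '<img alt="Caption Three." src="img/urban.jpg" title="Attribution Three."/>'
    '</div>'
)
soup = BeautifulSoup(html, "html.parser")
source = soup.find("div", id="source")

new_gallery = soup.new_tag("div")
new_gallery["class"] = "gallery"

# Take a snapshot of the images first, so that moving them around
# does not disturb the collection we are looping over.
images = list(source.find_all("img"))

for img in images:
    figure = soup.new_tag("figure")
    if img.has_attr("alt"):
        caption = soup.new_tag("figcaption")
        caption.string = img["alt"]
        figure.append(caption)
    figure.append(img)  # moves the img out of its old parent into the figure
    new_gallery.append(figure)

# Put the finished gallery back into the original parent.
source.append(new_gallery)

figure_count = len(new_gallery.find_all("figure"))
print(figure_count)  # 3
```

With the snapshot in place, every image ends up in its own figure instead of every other one.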

I hope that solves some premature baldness for someone else…

Flowdeeps