When I prettify a soup, I am trying to get this:

<tag attr="val" />

Instead of this:

<tag attr="val"></tag>

I checked the bs4.formatter code and didn't find an option that covers this:

def __init__(
            self, language=None, entity_substitution=None,
            void_element_close_prefix='/', cdata_containing_tags=None,
            empty_attributes_are_booleans=False, indent=1,
    ):

How can I achieve this? Thanks

I tried the new_tag options and the bs4.formatter options.

1 Answer

I'm not sure why you'd want to do this, since bs4 already produces valid HTML and this would interfere with that, but you could use this function:

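import re
from bs4 import BeautifulSoup

# pFormatter defaults to 'minimal', prettify()'s own default; None would disable entity escaping
def closeVoidElements(html, voidEls=None, parser=None, pFormatter='minimal'):
    if not isinstance(voidEls, list):
        voidEls = [
            'area', 'base', 'br', 'col', 'command', 'embed', 'wbr', 'img',
            'input', 'keygen', 'link', 'meta', 'param', 'source', 'track', 'hr'
        ] # void elements from https://www.w3.org/TR/2011/WD-html-markup-20110113/syntax.html#syntax-elements

    html = BeautifulSoup(str(html), parser)
    # keep only the void elements that actually occur in the document
    if voidEls:
        voidEls = {t.name for t in html.find_all(voidEls)}
    html = html.prettify()

    # match whole tag names only, so renaming <col> leaves <colgroup> alone
    names = '|'.join(re.escape(ve) for ve in voidEls)
    # rename <br/> to <br_x/> so the second parse treats it as an ordinary tag
    if names:
        html = re.sub(rf'(</?)({names})(?=[\s/>])', r'\1\2_x', html)
    html = BeautifulSoup(html, parser).prettify(formatter=pFormatter)
    # restore the original names, which now have explicit closing tags
    if names:
        html = re.sub(rf'(</?)({names})_x(?=[\s/>])', r'\1\2', html)
    return html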

Call it as closeVoidElements(soup) instead of soup.prettify(). It temporarily renames void (self-closing) tags, for example br to br_x, so that bs4 no longer recognizes them as such, then parses and prettifies the result before changing the names back, which leaves each one with an explicit closing tag.
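
To see why the renaming has that effect, here is a minimal sketch, assuming the html.parser backend. The name br_x is only the placeholder used above and means nothing special to bs4. bs4 writes a tag as `<br/>` only when its name is on the builder's list of void elements, so an unknown name gets a full open/close pair:

```python
from bs4 import BeautifulSoup

# <br> is a known void element; <br_x> is an unknown name that bs4
# treats like any ordinary tag.
soup = BeautifulSoup("<p>one<br>two<br_x></br_x></p>", "html.parser")

known = soup.find("br")
unknown = soup.find("br_x")

print(known)    # <br/>
print(unknown)  # <br_x></br_x>

# Renaming the placeholder back yields the explicit closing tag.
restored = str(unknown).replace("br_x", "br")
print(restored)  # <br></br>
```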

Beautiful Soup 3 had a selfClosingTags argument for XML, but bs4 no longer recognizes it.
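
If the goal is the output shown in the question (an empty, non-void tag written in self-closing form), one possible sketch is to set each childless tag's can_be_empty_element attribute, which bs4 consults when serializing a tag with no contents. The helper name self_close_empty_tags is hypothetical and html.parser is an assumed backend. Note that bs4 writes `<tag attr="val"/>` without a space before the slash.

```python
from bs4 import BeautifulSoup

def self_close_empty_tags(soup):
    """Mark every childless tag so bs4 serializes it in <tag/> form."""
    for el in soup.find_all(True):
        if not el.contents:
            el.can_be_empty_element = True
    return soup

soup = self_close_empty_tags(
    BeautifulSoup('<tag attr="val"></tag>', "html.parser")
)
print(soup.prettify())
```

In an HTML document this would also collapse elements such as an empty div or script, which browsers do not read as self-closing, so it is probably safest for XML-style output.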

Driftr95