I am trying to change the indentation of the HTML returned by Beautiful Soup's prettify():

import re

def prettify_with_indent(prettified_str, indent_width=4):
    # continuous whitespace starting from a new line
    r = re.compile(r'^(\s*)', re.MULTILINE)

    # \1 is the first capturing group, i.e. all continuous whitespace starting from a newline.
    # replace whitespace from standard prettify with proper indents
    return r.sub(r'\1' * indent_width, prettified_str)

This works well for compact HTML without blank lines. However, as soon as the input contains a blank line, that newline gets multiplied by indent_width. How can I prevent that?

prettified_str = r"""<p>
 some text
</p>
"""

print(prettify_with_indent(prettified_str))

Result:

<p>
    some text
</p>

Example that multiplies newlines:

prettified_str_extra_newlines = r"""<p>
 some text

</p>
"""

print(prettify_with_indent(prettified_str_extra_newlines))

Result:

<p>
    some text




</p>
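The four blank lines in this result can be traced by listing what the pattern captures. In Python's re module, \s matches newlines as well as spaces and tabs, so on the blank line ^(\s*) captures the newline itself, and the replacement then repeats it indent_width (here 4) times. A small diagnostic sketch; the names pattern, sample and captures are illustrative only:

```python
import re

# Same pattern as in prettify_with_indent.
pattern = re.compile(r'^(\s*)', re.MULTILINE)

# The "extra newline" example: note the blank line before </p>.
sample = "<p>\n some text\n\n</p>\n"

# Keep only the non-empty captures: the one-space indent of " some text"
# and, because \s also matches "\n", the newline of the blank line.
captures = [m.group(1) for m in pattern.finditer(sample) if m.group(1)]
print(captures)  # [' ', '\n']
```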

43Tesseracts

  • You need to match *horizontal* whitespace, `r'^([^\S\r\n]*)'` or `r'^([^\S\r\n]+)'` – Wiktor Stribiżew Jan 02 '20 at 23:19
  • Times indent width? So a string of 20 spaces with a repetition of 6 would result in an indent of 120 spaces. That doesn't sound very nicely proportioned. –  Jan 02 '20 at 23:39
  • Also, I'm assuming this HTML prettifier knows how to handle whitespace inside tags, and that it handles whitespace inside formatted content correctly. –  Jan 02 '20 at 23:43
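Following the first comment's suggestion, here is a minimal sketch of the question's function with only the pattern changed to match horizontal whitespace. The class [^\S\r\n] means "any whitespace character except \r and \n", so a match can never extend past the end of a line; on a blank line the group captures an empty string and the line is left untouched. The variable names tight and loose are illustrative only:

```python
import re

def prettify_with_indent(prettified_str, indent_width=4):
    # Leading *horizontal* whitespace on each line: [^\S\r\n] excludes
    # \r and \n, so the capture never swallows a blank line's newline.
    pattern = re.compile(r'^([^\S\r\n]*)', re.MULTILINE)
    # Repeat the captured indent indent_width times; unindented and
    # blank lines capture '' and therefore stay as they are.
    return pattern.sub(r'\1' * indent_width, prettified_str)

tight = "<p>\n some text\n</p>\n"
loose = "<p>\n some text\n\n</p>\n"

print(prettify_with_indent(tight))
print(prettify_with_indent(loose))
```

Since prettify() indents by one space per nesting level by default, repeating the captured run gives indent_width spaces per level. The + variant from the same comment skips the zero-width matches on unindented lines and produces the same output.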