I am trying to change the indentation of the HTML returned by Beautiful Soup's prettify():
import re
def prettify_with_indent(prettified_str, indent_width=4):
    # contiguous whitespace at the start of each line
    r = re.compile(r'^(\s*)', re.MULTILINE)
    # \1 is the first capturing group, i.e. all contiguous whitespace at the start of a line.
    # repeat it indent_width times to replace the standard prettify indent with a wider one
    return r.sub(r'\1' * indent_width, prettified_str)
This works well for compact HTML that contains no blank lines. However, as soon as a blank line appears, its newline gets multiplied as well. How can I prevent that?
Example that works:
prettified_str = r"""<p>
some text
</p>
"""
print(prettify_with_indent(prettified_str))
Result:
<p>
some text
</p>
Example that multiplies newlines:
prettified_str_extra_newlines = r"""<p>
some text
</p>
"""
print(prettify_with_indent(prettified_str_extra_newlines))
Result:
<p>
some text
</p>
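A likely explanation is that `\s` also matches newline characters, so under re.MULTILINE the newline of a blank line is captured into the group and repeated along with the indent. Below is a minimal sketch of one possible fix. It assumes prettify() indents with spaces only (its default behaviour), and the sample string is a hypothetical input made up for illustration:

```python
import re

def prettify_with_indent(prettified_str, indent_width=4):
    # Match only literal spaces at the start of each line. Unlike \s, a
    # space cannot match a newline, so blank lines are never captured.
    leading_spaces = re.compile(r'^( *)', re.MULTILINE)
    # Repeat the captured indent indent_width times.
    return leading_spaces.sub(r'\1' * indent_width, prettified_str)

# Hypothetical input: one-space indent plus two blank lines.
sample = "<p>\n\n some text\n\n</p>\n"
print(prettify_with_indent(sample))
```

If tabs could also appear in the indentation, the character class `[ \t]*` in place of ` *` would be a reasonable variation.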