I have a script that adds classes to heading tags using Beautiful Soup.
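#!/usr/bin/env python
from bs4 import BeautifulSoup

# Parse the template with Python's built-in HTML parser
with open('test.html') as html_doc:
    soup = BeautifulSoup(html_doc, 'html.parser')

# Append the new class to every <h1> that already has a class attribute
heading_tags = soup.find_all('h1')
for tag in heading_tags:
    tag['class'].append('new-class')

# Write the modified document back to the same file
with open('test.html', 'w') as html_doc:
    html_doc.write(soup.prettify())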
This works, but I would like to preserve the file's original whitespace when writing it back. For example, this Django template:
<div class="something">
    <div class="else">
        <h1 class="original-class">Test</h1>
        {% if request.foo == 'bar' %}
        {{ line.get_something }}
        {% else %}
        {{ line.get_something_else }}
    </div>
</div>
Becomes:
<div class="something">
 <div class="else">
  <h1 class="original-class new-class">
   Test
  </h1>
  <!-- The formatting is off here: -->
  {% if request.foo == 'bar' %}
        {{ line.get_something }}
        {% else %}
        {{ line.get_something_else }}
 </div>
</div>
I also tried using soup.encode() rather than soup.prettify(). This preserves the Django template code, but flattens the HTML structure.
Is it possible to preserve the original file's whitespace when writing to a file with Beautiful Soup?
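
For comparison, here is a minimal sketch of a fallback that sidesteps serialization entirely. It is not a Beautiful Soup feature: it edits only the opening h1 tags with a regular expression and leaves every other character of the file untouched. The function name add_class_to_h1 is hypothetical, and the sketch assumes lowercase h1 tags, double-quoted class attributes, and no ">" inside attribute values.

```python
import re

# Assumption: opening tags look like <h1 ...> with no ">" inside attribute values.
H1_OPEN = re.compile(r'<h1\b([^>]*)>')
# Assumption: the class attribute is double-quoted; the lookbehind skips data-class etc.
CLASS_ATTR = re.compile(r'(?<![\w-])class="([^"]*)"')


def add_class_to_h1(html, new_class):
    """Return html with new_class added to every <h1>, leaving all other text as-is."""

    def patch(match):
        attrs = match.group(1)
        if CLASS_ATTR.search(attrs):
            # Extend the existing class list in place.
            attrs = CLASS_ATTR.sub(
                lambda m: 'class="{} {}"'.format(m.group(1), new_class),
                attrs,
                count=1,
            )
        else:
            # No class attribute yet: add one in front of the other attributes.
            attrs = ' class="{}"{}'.format(new_class, attrs)
        return '<h1{}>'.format(attrs)

    return H1_OPEN.sub(patch, html)


if __name__ == '__main__':
    sample = (
        '<div class="else">\n'
        '    <h1 class="original-class">Test</h1>\n'
        "    {% if request.foo == 'bar' %}\n"
        '</div>\n'
    )
    print(add_class_to_h1(sample, 'new-class'))
```

Because nothing is parsed and re-serialized, the indentation and the Django template tags come through byte for byte. The trade-off is that a regular expression does not understand HTML, so this only suits markup that meets the assumptions above.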