I'm trying to write a BeautifulSoup object to a file after appending something to it. What I append is a div containing HTML/JavaScript from Plotly's to_html() function, which gives me a chart in HTML form. I narrowed the problem down to the following code:

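from bs4 import BeautifulSoup

html_outline = """<html>
    <head></head>
    <body>
        <p>Hello World!</p>
        <div></div>
    </body>
</html>"""
soup = BeautifulSoup(html_outline, "html.parser")

# Placeholder: the real argument is the string returned by Plotly's to_html().
soup.div.append({plotly HTML/JavaScript})

# write() needs a string; str(soup) and soup.prettify() both work here,
# but the appended markup comes out escaped.
with open("path/to/file", "w") as file_writer:
    file_writer.write(str(soup))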

Inside the write call, I've tried several ways of converting the soup object to a string, such as str(soup), soup.prettify(), and others I've forgotten. Those do write to the file, but the angle brackets ("<>") in the Plotly HTML I insert become HTML entities (I believe that's what they're called), so a

<div>

becomes:

&lt;div&gt;

inside the file I write to. Note that only the angle brackets in the HTML I appended to the soup object turn into HTML entities; the html, head, and body tags keep their proper angle brackets.

My question is: how can I convert the soup object directly into a string that has proper angle brackets and no HTML entities?

I could write a function that scans the file for those HTML entities and replaces them with proper angle brackets, but I'm hoping there's a better solution before I do that. I searched for this problem several times, but nothing came up.

I asked this question previously and it was marked as a duplicate, but the linked question didn't help because it was about adding empty tags. Here I'm appending a whole existing div, with JavaScript and other content, to my soup object.

Thanks in advance!

sping

1 Answer

I found that bs4's .prettify() function works if I change the formatter to None. So the line of code that writes the HTML to the file becomes:

file_writer.write(soup.prettify(formatter=None))
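
To illustrate the difference, here is a minimal sketch comparing the default formatter with formatter=None. The appended string "<div>chart</div>" is a hypothetical stand-in for the real Plotly output, not the actual markup:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<body><div></div></body>", "html.parser")

# Hypothetical stand-in for the Plotly markup, appended as a plain string.
soup.div.append("<div>chart</div>")

# The default formatter escapes angle brackets found in text nodes.
default_output = soup.prettify()

# formatter=None skips escaping entirely, so the text is written verbatim.
raw_output = soup.prettify(formatter=None)

print(default_output)
print(raw_output)
```

With formatter=None, nothing in the document is escaped, so any genuine text that contains "<" or "&" is also written as-is.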

This isn't best practice: according to bs4's docs, formatter=None may generate invalid HTML/XML. My reading of the docs was that HTML entities are converted to Unicode characters by default, so I'm not sure why that didn't happen for me. I'm no longer in urgent need of a solution, but I'm posting this in case someone finds it useful in the future. Hopefully someone can offer a better solution, though!
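
One possible alternative that keeps the default formatter is to parse the Plotly markup before appending it. This is a sketch, on the understanding that append() stores a plain string as a text node, and text nodes are what get escaped on output. The plotly_html value below is a hypothetical stand-in for the string to_html() returns:

```python
from bs4 import BeautifulSoup

html_outline = "<html><head></head><body><p>Hello World!</p><div></div></body></html>"

# Hypothetical stand-in for the string returned by Plotly's to_html().
plotly_html = '<div id="chart"><script>var x = 1;</script></div>'

# Appending a plain str creates a text node, which is escaped on output.
escaped = BeautifulSoup(html_outline, "html.parser")
escaped.div.append(plotly_html)
escaped_output = str(escaped)

# Parsing the fragment first yields Tag objects, which are written as markup.
soup = BeautifulSoup(html_outline, "html.parser")
fragment = BeautifulSoup(plotly_html, "html.parser")
for node in list(fragment.contents):  # copy the list: append() moves each node
    soup.div.append(node)
parsed_output = str(soup)

print(escaped_output)
print(parsed_output)
```

Because the appended nodes are real tags, str(soup) or soup.prettify() can then be written to the file without passing formatter=None.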

sping