
I'm using Selenium for functional testing of a Django application and thought I'd try html5lib as a way of validating the HTML output. One of the validations is that the page starts with a <!DOCTYPE ...> declaration.

The unit-test checks using response.content.decode() all worked fine and correctly flagged errors, but I found that Selenium's driver.page_source output starts with an <html> tag instead. I have double-checked that I'm using the correct template by modifying the title and making sure the change is reflected in page_source. The newline and indentation between the <html> tag and the <head> tag are also missing.

This is what the first few lines look like in the Firefox browser.

<!DOCTYPE html>
<html>
    <head>
        <title>NetLog</title>
    </head>

Here's the Python code.

self.driver.get(f"{self.live_server_url}/netlog/")
print(self.driver.page_source)

And here are the first few lines of the printed output when run under the Firefox WebDriver.

<html><head>
        <title>NetLog</title>
    </head>

The page body looks fine, but the newline is also missing between </body> and </html>. Is this expected behaviour? I suppose I could just stuff the DOCTYPE declaration in front of the string as a workaround, but I would prefer to have it behave as intended.
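A minimal sketch of that workaround, assuming (as observed above) that page_source under the Firefox driver never includes the doctype. Rather than hard-coding <!DOCTYPE html>, it asks the browser for document.doctype through execute_script, so a page that was served without a doctype still fails the html5lib strict check. The helper names format_doctype, add_doctype and page_source_with_doctype are hypothetical, not part of Selenium or html5lib.

```python
import re
from typing import Any, Optional

# Matches a doctype at the start of the string (case-insensitive, leading
# whitespace allowed), in case some driver does include one.
_DOCTYPE_RE = re.compile(r"\s*<!doctype\b", re.IGNORECASE)

# Runs in the page via WebDriver's execute_script; document.doctype is null
# when the served HTML had no doctype, which comes back to Python as None.
_DOCTYPE_JS = """
const dt = document.doctype;
return dt ? {name: dt.name, publicId: dt.publicId, systemId: dt.systemId} : null;
"""


def format_doctype(info: Optional[dict]) -> str:
    """Rebuild '<!DOCTYPE ...>' from document.doctype's fields ('' if there is none)."""
    if not info:
        return ""
    text = f"<!DOCTYPE {info['name']}"
    public_id = info.get("publicId") or ""
    system_id = info.get("systemId") or ""
    if public_id:
        text += f' PUBLIC "{public_id}"'
        if system_id:
            text += f' "{system_id}"'
    elif system_id:
        text += f' SYSTEM "{system_id}"'
    return text + ">"


def add_doctype(page_source: str, doctype_info: Optional[dict]) -> str:
    """Prepend the document's real doctype unless page_source already starts with one."""
    if _DOCTYPE_RE.match(page_source):
        return page_source
    doctype = format_doctype(doctype_info)
    return f"{doctype}\n{page_source}" if doctype else page_source


def page_source_with_doctype(driver: Any) -> str:
    """driver is assumed to be a Selenium WebDriver (e.g. self.driver in the test)."""
    return add_doctype(driver.page_source, driver.execute_script(_DOCTYPE_JS))
```

In the test, page_source_with_doctype(self.driver) would then be the string handed to html5lib instead of self.driver.page_source. This only restores the doctype; the whitespace differences most likely come from the browser's parsed DOM and are not undone.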

Chris

Deepstop
  • I think this is a known issue; page_source won't show that. Why do you want that comment anyway? – cruisepandey Aug 25 '21 at 14:46
  • I wanted it because the `strict=True` validation in html5lib expects it. If that's how it works, I guess I'll just use the hack I mentioned. – Deepstop Aug 25 '21 at 14:54
