I'm trying to scrape an HTML form using RoboBrowser with Python 3.4. I use the default HTML parser:
self._browser = RoboBrowser(history=True, parser="html.parser")
It works fine for well-formed web pages, but now I have to parse a page with malformed markup. Here is the HTML fragment:
<form method="post" action="decide.php?act=submit_advance">
<table class="td_advanced">
<tr class="td_advance">
<td colspan="4" class="td_advance"></strong><br></td>
<td colspan="3" class="td_left">Case sensitive:<br><br></td>
<td><input type="checkbox" name="case_sensitive" /><br><br></td>
[...]
</form>
The closing </strong> tag has no matching opening tag. This error prevents the parser from reading the inputs that follow it:
form = self._browser.get_form()
print(form)
>>> <RoboForm>
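To narrow down where the fields disappear, here is a minimal stdlib-only check. It is a sketch: the class name TagLogger is made up for illustration, and the fragment is the one above minus the elided [...] part. It feeds the markup to Python's own html.parser tokenizer and records what it reports. If the tokenizer still sees the checkbox after the stray </strong>, the fields are probably being lost when the tokens are assembled into a tree, not when the markup is read. This assumes RoboBrowser passes the page to BeautifulSoup with the parser named in the constructor.

```python
from html.parser import HTMLParser

# The form fragment from the question, including the stray </strong> tag.
FRAGMENT = """<form method="post" action="decide.php?act=submit_advance">
<table class="td_advanced">
<tr class="td_advance">
<td colspan="4" class="td_advance"></strong><br></td>
<td colspan="3" class="td_left">Case sensitive:<br><br></td>
<td><input type="checkbox" name="case_sensitive" /><br><br></td>
</form>"""


class TagLogger(HTMLParser):
    """Hypothetical helper: records tag events and the names of <input> elements."""

    def __init__(self):
        super().__init__()
        self.events = []       # ("start" | "end", tag name) in document order
        self.input_names = []  # name attribute of each <input> seen

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))
        if tag == "input":
            # attrs is a list of (name, value) pairs
            self.input_names.append(dict(attrs).get("name"))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


logger = TagLogger()
logger.feed(FRAGMENT)
logger.close()

# The stray closing tag is reported as an ordinary end-tag event...
print(("end", "strong") in logger.events)
# ...and the checkbox that follows it is still tokenized.
print(logger.input_names)
```

Running it prints True and then ['case_sensitive'], so the stray tag on its own does not stop the tokenizer from reaching the input.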
Any suggestions?