I am trying to write an HTML parser, but during testing I do not want to query the website every time, so I saved the page locally as an HTML file.
For reading I use:
urltext = urllib.request.urlopen(urlfile).read().decode("utf-8")
Reading directly from the website gives me a correct string to parse, but when I open the file from my local PC it seems to be decoded wrongly:
<span id="line845"></span> </span><span><<span class="start-tag">h2</span> <span class="attribute-name">class</span>="<a class="attribute-value">article-title</a>"></span><span>
<span id="line846"></span> </span><span><<span class="start-tag">span</span> <span class="attribute-name">class</span>="<a class="attribute-value">headline-intro</a>"></span><span>Intro:</span><span></<span class="end-tag">span</span>></span><span> </span><span><<span class="start-tag">span</span> <span class="attribute-name">class</span>="<a class="attribute-value">headline</a>"></span><span>Main text</span><span></<span class="end-tag">span</span>></span><span></span><span></<span class="end-tag">h2</span>></span><span>
Originally it should look like this:
<h2 class="article-title">
<span class="headline-intro">Intro:</span> <span class="headline">Main Text</span></h2>
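To make the setup reproducible, here is a minimal sketch of the read step. It assumes the local copy is opened through a file:// URL; the helper name read_html and the temporary file page.html are hypothetical and used only for illustration, they are not taken from my actual script.

```python
import pathlib
import tempfile
import urllib.request


def read_html(url: str) -> str:
    """Fetch a URL (http://, https:// or file://) and decode the body as UTF-8."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


# Hypothetical stand-in for the saved page: the expected markup from above.
snippet = (
    '<h2 class="article-title">\n'
    '<span class="headline-intro">Intro:</span> '
    '<span class="headline">Main Text</span></h2>\n'
)

with tempfile.TemporaryDirectory() as tmp:
    local_file = pathlib.Path(tmp) / "page.html"
    # Write raw bytes so the file content is exactly the UTF-8 encoded snippet.
    local_file.write_bytes(snippet.encode("utf-8"))
    # as_uri() turns the absolute path into a file:// URL that urlopen accepts.
    urltext = read_html(local_file.as_uri())

print(urltext)
```

Read this way, the string is exactly the content of the saved file, decoded as UTF-8.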
Any ideas what I am doing wrong?
Thanks
Kev