
I run a bs4 program on Python 2.7 and it works flawlessly, but I have a problem when I run it with Python 3. I am using an up-to-date version of bs4 for both. The file I am parsing is HTML, and I noticed the error occurs on a tag. Is there a supporting module I need to update, such as lxml?

Code:

import os
from bs4 import BeautifulSoup

# `directory` and `file` come from the surrounding script
with open(os.path.join(directory, file)) as data:
    soup = BeautifulSoup(data, 'html.parser')

Here is the error:

...
  File "C:\Anaconda3\lib\html\parser.py", line 174, in error
    raise HTMLParseError(message, self.getpos())
html.parser.HTMLParseError: unknown status keyword 'NKXE' in marked section, at line 318, column 49

Always appreciate the help!

TChi
  • Check whether your file handle is actually reading the file. – Narendra Mar 05 '18 at 18:42
  • Not sure what you are asking. Do you mean the type of the "data" object? In Python 2.7 it's . – TChi Mar 05 '18 at 18:47
  • In Python 3 it's a TextIOWrapper: <_io.TextIOWrapper name='C:\\Users\\....txt' mode='r' encoding='cp1252'> – TChi Mar 05 '18 at 18:48
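The comments show a difference in the `data` object: in Python 3, `open()` in text mode returns an `_io.TextIOWrapper` that decodes the file with the platform default (cp1252 here), whereas Python 2's `open()` returned a file object that reads byte strings. A minimal sketch, assuming you want to rule out decoding as a factor: read the file in binary mode and hand the bytes to BeautifulSoup, which then detects the encoding itself. The helper `read_html_bytes` and the temporary demo file are hypothetical.

```python
import os
import tempfile


def read_html_bytes(path):
    """Hypothetical helper: read the file as raw bytes, not locale-decoded text."""
    with open(path, "rb") as f:
        return f.read()


# Demo with a hypothetical temporary file containing non-ASCII UTF-8 text.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "page.html")
    with open(path, "wb") as f:
        f.write("<p>caf\u00e9</p>".encode("utf-8"))
    raw = read_html_bytes(path)
    print(type(raw).__name__, raw)  # prints: bytes b'<p>caf\xc3\xa9</p>'
```

You would then call `BeautifulSoup(read_html_bytes(path), 'html.parser')`. This is not confirmed to fix the marked-section error, which is raised by the parser itself, but it removes the cp1252 decoding step from the picture.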

1 Answer


See if installing html5lib:

pip install html5lib

and then passing 'html5lib' as the parser, like this, fixes the issue:

import os
from bs4 import BeautifulSoup

# `directory` and `file` come from the surrounding script
with open(os.path.join(directory, file)) as data:
    soup = BeautifulSoup(data, 'html5lib')

This has worked for me.
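On the question of supporting modules: BeautifulSoup's second argument names the parser, and "html5lib" and "lxml" only work if those packages are installed, while "html.parser" ships with Python. A minimal sketch, assuming you want the script to prefer a more lenient parser whenever one is available; the helper `choose_parser` and its preference order are hypothetical, not part of bs4.

```python
import importlib.util

# Parser names BeautifulSoup accepts as its second argument, in an assumed
# order of preference: html5lib (lenient), lxml (fast), html.parser (built in).
PREFERRED_PARSERS = ("html5lib", "lxml", "html.parser")


def choose_parser(preferred=PREFERRED_PARSERS):
    """Return the first parser name whose backing module is installed."""
    for name in preferred:
        if name == "html.parser":
            # Part of the standard library, so always available.
            return name
        if importlib.util.find_spec(name) is not None:
            return name
    # Fall back to the built-in parser if nothing else matched.
    return "html.parser"


print(choose_parser())
```

With it, the call becomes `soup = BeautifulSoup(data, choose_parser())`. Keep in mind that different parsers can build slightly different trees from the same malformed HTML, so results may vary depending on which one is picked.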

paul41