I'm trying to make a web crawler in Python using HTMLParser and urllib3. Currently I have two different import problems. The first is with this code:
import html.parser
import urllib

urlText = []

#Define HTML Parser
class parseText(HTMLParser.HTMLParser):
    def handle_data(self, data):
        if data != '\n':
            urlText.append(data)

#Create instance of HTML parser
lParser = parseText()
thisurl = "http://www-rohan.sdsu.edu/~gawron/index.html"
#Feed HTML file into parser
lParser.feed(urllib.urlopen(thisurl).read())
lParser.close()
for item in urlText:
    print(item)
With this code, the Visual Studio error box shows:
name 'HTMLParser' is not defined
The second version is exactly the same, but with import HTMLParser instead of import html.parser:
import HTMLParser
import urllib

urlText = []

#Define HTML Parser
class parseText(HTMLParser.HTMLParser):
    def handle_data(self, data):
        if data != '\n':
            urlText.append(data)

#Create instance of HTML parser
lParser = parseText()
thisurl = "http://www-rohan.sdsu.edu/~gawron/index.html"
#Feed HTML file into parser
lParser.feed(urllib.urlopen(thisurl).read())
lParser.close()
for item in urlText:
    print(item)
This one returns the error:
No module named 'markupbase'
I'm losing my mind with the packages. Does anyone know a fix or see the problem? P.S. I'm running this in Visual Studio 2016 with Python 3.5.
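For comparison, here is a minimal sketch of how the same parser might look using only Python 3 module names. In Python 3 the HTMLParser class lives in html.parser and urlopen lives in urllib.request, so the Python 2 names used above are not available. The names TextCollector, extract_text and fetch_text are hypothetical, decoding the response as UTF-8 is an assumption about the page, and the sample string stands in for the real page so the snippet runs without a network connection.

```python
from html.parser import HTMLParser   # Python 3 location of the HTMLParser class
from urllib.request import urlopen   # Python 3 location of urlopen


class TextCollector(HTMLParser):
    """Collects text nodes, skipping bare newlines (hypothetical name)."""

    def __init__(self):
        super().__init__()
        self.chunks = []  # per-instance list instead of a module-level global

    def handle_data(self, data):
        # Same filter as in the code above: drop lone newline nodes
        if data != '\n':
            self.chunks.append(data)


def extract_text(html):
    """Feed an HTML string to the parser and return the collected text."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.chunks


def fetch_text(url):
    """Download a page and extract its text (defined here, not called below)."""
    with urlopen(url) as response:
        # read() returns bytes in Python 3; UTF-8 is an assumed encoding
        html = response.read().decode("utf-8", errors="replace")
    return extract_text(html)


# Offline demonstration with an inline sample page
sample = "<html><body><h1>Title</h1>\n<p>Hello</p></body></html>"
for item in extract_text(sample):
    print(item)
```

Calling fetch_text(thisurl) with the URL from the code above would run the same parser against the live page, assuming the server is reachable.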