I am writing an HTMLParser-based parser in Python that will process a web page downloaded from the internet.
Here is my code:
import urllib2
import HTMLParser

class Parser(HTMLParser.HTMLParser):
    ...

parser = Parser()
httpRequest = urllib2.Request("http://www......")
pageContent = urllib2.urlopen(httpRequest)
while True:
    htmlTextPortion = pageContent.read()
    if not htmlTextPortion:
        break  # empty string means the response is exhausted
    parser.feed(htmlTextPortion)
parser.close()
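As a side note, I did check that the parsing half of this works incrementally: HTMLParser.feed() can be called repeatedly with partial data, and it buffers an incomplete tag at the end of one chunk until the next chunk arrives. A minimal standalone sketch (the TagCollector class and the chunk size are just for illustration):

```python
try:
    from HTMLParser import HTMLParser   # Python 2
except ImportError:
    from html.parser import HTMLParser  # Python 3

class TagCollector(HTMLParser):
    """Illustrative parser that records every start tag it sees."""
    def __init__(self):
        HTMLParser.__init__(self)
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

html = "<html><body><p>hello</p></body></html>"
parser = TagCollector()
# Feed in 7-character chunks, deliberately splitting tags mid-way;
# feed() buffers the incomplete fragment until more data arrives.
for i in range(0, len(html), 7):
    parser.feed(html[i:i + 7])
parser.close()
print(parser.tags)  # → ['html', 'body', 'p']
```

So the remaining question is only about the download side, not the parser.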
My question is: will the 'read' call block until the whole HTML page has been downloaded, or will it return chunks of the page as they are loaded?
This is important to me because I need to start processing the web page as soon as possible, not wait until the download finishes.
I have heard that the pycurl library supports streaming, but do I really need to switch to pycurl, or can I get the same functionality with urllib2?
Many thanks...