
Hi everyone. I am new to Python and am trying to use the html.parser module. I want to scrape https://www.mcdelivery.com.pk/pk/browse/menu.html and fetch the URLs, deal names, and prices, which sit inside li tags. After fetching the URLs, I want to append them to the base URL and fetch the deals and prices from those pages too.

import urllib.request
import urllib.parse
import re
from html.parser import HTMLParser

url = 'https://www.mcdelivery.com.pk/pk/browse/menu.html'
values = {'daypartId': '1', 'catId': '1'}
data = urllib.parse.urlencode(values)
data = data.encode('utf-8')  # data should be bytes
req = urllib.request.Request(url, data)
resp = urllib.request.urlopen(req)
respData = resp.read()
list1 = re.findall(r'<div class="product-cost"(.*?)</div>', str(respData))
for eachp in list1:
    print(eachp)

I was using a regex to grab the class, but it failed. Now I am trying to figure out how to do it with html.parser. I know the job gets easier with BeautifulSoup and Scrapy, but I am trying to do it with bare Python, so please skip third-party libraries. I really need help; I'm stuck.

html.parser code (updated):

from html.parser import HTMLParser
import urllib.request
import html.parser
# Import HTML from a URL
url = urllib.request.urlopen(
    "https://www.mcdelivery.com.pk/pk/browse/menu.html")
html = url.read().decode()
url.close()


class MyParser(html.parser.HTMLParser):
    def __init__(self, html):
        self.matches = []
        self.match_count = 0
        super().__init__()

    def handle_data(self, data):
        self.matches.append(data)
        self.match_count += 1

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if attrs.get("product-cost"):
                self.handle_data()
            else:
                return

parser = MyParser(html)
parser.feed(html)

for item in parser.matches:
    print(item)

  • Python has an [html.parser](https://docs.python.org/3/library/html.parser.html) module in the standard library. Feel free to use it, until you reach the conclusion that you want to use BeautifulSoup. – Amitai Irron Jun 10 '20 at 22:12

1 Answer

Here's a good start that might require specific tuning:

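import html.parser


class MyParser(html.parser.HTMLParser):
    """Collect the text found inside <div class="product-cost"> elements."""

    def __init__(self):
        super().__init__()
        self.matches = []
        self.match_count = 0
        self.depth = 0  # greater than 0 while inside a product-cost div

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        # "product-cost" is a value of the class attribute, not an attribute name
        classes = (dict(attrs).get("class") or "").split()
        if self.depth or "product-cost" in classes:
            self.depth += 1  # also count nested divs so the end tags balance

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # The parser calls this itself; keep only text inside a matching div
        if self.depth and data.strip():
            self.matches.append(data.strip())
            self.match_count += 1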

The usage is along the lines of

request_html = the_request_method(url, ...)

parser = MyParser()
parser.feed(request_html)

for item in parser.matches:
    print(item)
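
The question also asks for the URLs inside the li tags, joined onto the base URL. A minimal sketch of that step follows, assuming the links are ordinary `<a href="...">` elements nested inside `<li>` elements with explicit closing tags. The `LinkParser` name and the sample markup are hypothetical and not taken from the real page, so this will likely need tuning against the actual HTML.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://www.mcdelivery.com.pk/pk/browse/menu.html"


class LinkParser(HTMLParser):
    """Collect absolute URLs of <a href> elements nested inside <li>."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.li_depth = 0  # greater than 0 while inside an <li>
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.li_depth += 1
        elif tag == "a" and self.li_depth:
            href = dict(attrs).get("href")
            if href:
                # urljoin resolves a relative href against the base URL
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag == "li" and self.li_depth:
            self.li_depth -= 1


# Hypothetical markup standing in for the downloaded page
sample = (
    '<ul><li><a href="menu.html?catId=2">Deals</a></li></ul>'
    '<a href="/about">About</a>'
)
link_parser = LinkParser(BASE_URL)
link_parser.feed(sample)

for link in link_parser.links:
    print(link)
```

Each collected URL can then be fetched with `urllib.request.urlopen` and its page fed to a fresh parser instance to pull out the deals and prices.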
  • `request_html = the_request_method(url, https://www.mcdelivery.com.pk/pk/browse/menu.html)` Do I have to give the URL this way? –  Jun 10 '20 at 22:50
  • `request_html = the_request_method(url, ...)` What should be added after `url` in `(url, ...)`? –  Jun 10 '20 at 23:02
  • `url = urllib.request.urlopen("https://www.mcdelivery.com.pk/pk/browse/menu.html"); html = url.read().decode(); url.close()` Can I fetch the URL this way? –  Jun 10 '20 at 23:44
  • Right, yes. Take `html` and feed it to the parser. –  Jun 10 '20 at 23:46
  • I am updating the code, I am having an error. Please help me. The error is `line 11, in class MyParser(html.parser.HTMLParser): AttributeError: 'str' object has no attribute 'parser'` –  Jun 11 '20 at 00:26
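
That AttributeError most likely comes from name shadowing: in the updated code, `html = url.read().decode()` rebinds the name `html` from the imported module to a string, so the later `html.parser.HTMLParser` lookup runs against a `str`. A minimal sketch of one way around it is to import `HTMLParser` directly and store the page under a different name. The names `DataCollector`, `fetch`, and `page_source`, and the sample markup with its price, are hypothetical.

```python
import urllib.request
from html.parser import HTMLParser  # import the class so no html.* lookup is needed


class DataCollector(HTMLParser):
    """Tiny stand-in parser that records every non-blank text node."""

    def __init__(self):
        super().__init__()
        self.matches = []

    def handle_data(self, data):
        if data.strip():
            self.matches.append(data.strip())


def fetch(url):
    """Download a page and return it as text, falling back to UTF-8."""
    with urllib.request.urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


# Hypothetical markup so the sketch runs without network access. In real use:
# page_source = fetch("https://www.mcdelivery.com.pk/pk/browse/menu.html")
page_source = '<div class="product-cost">Rs. 250</div>'

collector = DataCollector()
collector.feed(page_source)  # page_source, not html, so the module is not shadowed

for item in collector.matches:
    print(item)
```

Renaming the variable alone (keeping `import html.parser`) would also avoid the error; importing the class directly just makes the accident harder to repeat.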