
I am using Python Scrapy to scrape a website. The page URL has the form http://www.cuponation.in/myntra-coupons#voucher-13537, so it contains a '#'. When I use this URL as a start_url, Scrapy ignores the part after the #.

Is there a way I can scrape the full URL, including the part after the #, using Python Scrapy?

Artjom B.
user2129794

1 Answer


It is normal for the part after # to be ignored when scraping. That part is the URL fragment: it is not sent to the server, and in a browser it usually just jumps to the element on the page whose id is 'voucher-13537' (often a <div> tag). That's all it means. So once you have scraped the page, you should look for something similar to:

<div id="voucher-13537"> 

and that is what you'd be looking for.
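As a rough illustration of that idea, here is a minimal sketch using only the Python standard library. It splits the fragment off the URL and then searches some HTML for the element with that id. The `IdFinder` class and the `sample_html` string are made up for this example; in a real spider the HTML would be the downloaded response body.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class IdFinder(HTMLParser):
    """Hypothetical helper: remembers the tag name of the first element
    whose id attribute equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.found_tag = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if self.found_tag is None and dict(attrs).get("id") == self.target_id:
            self.found_tag = tag


url = "http://www.cuponation.in/myntra-coupons#voucher-13537"

# Separate the part that is actually requested from the fragment.
page_url, fragment = urldefrag(url)

# Stand-in for the downloaded page (assumed markup, not the real site).
sample_html = '<html><body><div id="voucher-13537">Sample voucher</div></body></html>'

finder = IdFinder(fragment)
finder.feed(sample_html)
print(page_url, fragment, finder.found_tag)
```

Inside a Scrapy callback, the equivalent lookup would most likely be something like `response.xpath('//*[@id="voucher-13537"]')`, assuming the page really does contain an element with that id.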

As for parsing HTML, if you don't already use it, I would suggest looking into the BeautifulSoup4 module.

Rohit