How to scrape a URL containing # using Python Scrapy

Question

I am using Python Scrapy to scrape a website. The page URL has the form http://www.cuponation.in/myntra-coupons#voucher-13537, so it contains a '#'. When I use this URL as a start URL, Scrapy ignores the part after the '#'.

Is there a way to scrape the full URL, including the '#', using Python Scrapy?

`#` is just going to take you to a particular place on the page. — BrenBarn, Jun 13 '14 at 08:06

score 3 · Accepted Answer · answered Jun 13 '14 at 08:27

It is normal for the part after # to be ignored when scraping. That part is the URL fragment: it is not sent to the server, and it only tells the browser to jump to the element on the page whose id equals 'voucher-13537' (often a <div> tag). So once you have scraped the page, you should look for something similar to:

<div id="voucher-13537">

and that is what you'd be looking for.
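As a rough illustration of the idea above, here is a minimal sketch that splits the fragment off the URL and then searches the HTML for the element carrying that id. It uses only the standard library; the `split_fragment` helper, the `IdFinder` class, and the `sample_html` string are made up for this example and stand in for the real page, which is assumed (not verified) to contain such an element.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


def split_fragment(url):
    """Return (url_without_fragment, fragment) for the given URL."""
    base, fragment = urldefrag(url)
    return base, fragment


class IdFinder(HTMLParser):
    """Record the tag name of the first element whose id matches target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.found_tag = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the start tag
        if self.found_tag is None and dict(attrs).get("id") == self.target_id:
            self.found_tag = tag


url = "http://www.cuponation.in/myntra-coupons#voucher-13537"
base, fragment = split_fragment(url)

# Hypothetical stand-in for the downloaded page body
sample_html = '<html><body><div id="voucher-13537">10% off</div></body></html>'

finder = IdFinder(fragment)
finder.feed(sample_html)
print(base, fragment, finder.found_tag)
```

Inside a Scrapy callback, the equivalent lookup would presumably be something like `response.xpath('//*[@id="voucher-13537"]')`, with the id taken from the fragment you split off before making the request.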

As for parsing HTML files, if you don't already use it, I would suggest looking into the BeautifulSoup4 module.

Rohit

+1 for mentioning BeautifulSoup4 which might be more appropriate for the OP – tumultous_rooster Oct 26 '14 at 21:01
