How to scrape data from a site if there is some sort of loop of links opening down the page?

Question

Here is the link. When you click the first link ("Accessories and Fluids"), a new table opens on the same page containing further links, and clicking one of those links opens a table of data. The problem is that the first-level links have the same XPath as the second-level links, although their URLs differ. How can I differentiate between the two sets of links so that I can extract the tables?

This XPath returns only the first link when you move from the previous page to this one:

sp_half = response.xpath('//li[@class="tab pane first"]/a/@href').extract_first()

while this one returns every link on the page, including the second-level links:

urls = response.xpath('//li/a/@href').extract()

The second XPath returns the required URLs along with a lot of extra links. I'm using Scrapy to do this. Is there any way to tell the first-level URLs from the second-level URLs so that I can extract the table?

score 1 · Answer 1 · answered Apr 22 '18 at 12:37


You don't need to extract the links in the first XPath. You can select each element with the class "tab pane first", as in the first line of the code below, and then extract the link from inside each one with a simple for loop.

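# Select every element whose class attribute is exactly "tab pane first"
links = response.xpath('//*[@class="tab pane first"]')
for link in links:
    # Collect the href of each direct <a> child of the selected element
    a_link = link.xpath('./a/@href').extract()
    yield {'Category Link': a_link}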

answered Apr 22 '18 at 12:37

Land Owner


Thanks. The first XPath already does the same; it only needs extract_first replaced with extract. The question was how to get only the remaining URLs, the ones that appear after clicking one of the first URLs. I hope you understand – Danyal Mughal Apr 22 '18 at 13:05
Thank you so much. I've resolved the issue by measuring the length of both XPath results and taking the difference, which leads to the second-level links. Would you please take a look at my other question? https://stackoverflow.com/questions/49955430/how-to-turn-display-from-none-to-block-in-scrapy/49955604#49955604 – Danyal Mughal Apr 22 '18 at 17:33
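
The difference idea from the last comment might look roughly like the sketch below. It is not the commenter's actual code, which was not posted: it filters by membership rather than by list length, and the helper name second_level_links and the sample hrefs are hypothetical. In a spider, the two lists would presumably come from the two XPath queries in the question, both called with .extract().

```python
def second_level_links(all_links, first_level_links):
    """Return the links in all_links that are not first-level links.

    Order is preserved and duplicates are dropped.
    """
    seen = set(first_level_links)
    result = []
    for href in all_links:
        if href not in seen:
            seen.add(href)  # avoid returning the same href twice
            result.append(href)
    return result


# Hypothetical hrefs standing in for the output of the two XPath queries.
first = ["/cat/accessories", "/cat/brakes"]
everything = [
    "/cat/accessories",
    "/cat/brakes",
    "/cat/accessories/oil",
    "/cat/accessories/wipers",
]
remaining = second_level_links(everything, first)
```

Note that any unrelated li/a links on the page (navigation, footer) would still end up in the result, so a narrower XPath for the second query may also be needed.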
