1

I want to follow the links whose text contains "Next", for example:

<div id="pagination">
<a href="#" > 1 </a>
<a href="#" > 2 </a>
<a href="#" > 3 </a>
<a href="#" > Next </a>
</div>

How can I do that in Scrapy? It is also the last link in the selection.

user1858027

1 Answer

1

Create a class extending `BaseSgmlLinkExtractor` and give it a `process_value` callable, as shown in the docs.

Sushant Gupta
  • Can you give me an example of how to use that with my data? I didn't get how to use the link extractor. – user1858027 Dec 14 '12 at 06:19
  • Yup, sure. I have an exam in an hour or so, so I can get back to you in a few hours. Until then, can you show us what you have tried? Here on SO, we prefer answering questions where the asker has attempted a solution and is facing a specific problem. You can edit your question and add the code you have tried. – Sushant Gupta Dec 14 '12 at 06:34
  • Just give me the selector to use to find that Next link. I am not able to figure out the selector. I have used BeautifulSoup for that for now, but I was looking for the Scrapy way to find it. – user1858027 Dec 14 '12 at 07:30
  • For `process_value`, you'll need a regular expression that matches the ` Next ` line. Try creating a multi-line string containing the HTML above and testing out regular expressions in the Python interpreter. This link should be useful: http://www.regular-expressions.info/ – Talvalin Dec 14 '12 at 16:54
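
Following the last comment's suggestion, here is a minimal sketch of testing such a regular expression against the question's HTML in plain Python. It uses only the standard `re` module, not Scrapy itself. The names `HTML`, `NEXT_LINK_RE` and `find_next_href` are made up for illustration, and the opening `<div>` tag is assumed to be properly closed. How the pattern is then wired into a link extractor's `process_value` is left to the Scrapy docs.

```python
import re

# The pagination markup from the question (assumption: the opening
# <div> tag is closed, which the snippet in the question omits).
HTML = """
<div id="pagination">
<a href="#" > 1 </a>
<a href="#" > 2 </a>
<a href="#" > 3 </a>
<a href="#" > Next </a>
</div>
"""

# Match an <a> tag whose visible text is "Next" (surrounding whitespace
# and letter case are ignored) and capture the value of its href.
NEXT_LINK_RE = re.compile(
    r'<a\s+[^>]*href="([^"]*)"[^>]*>\s*Next\s*</a>',
    re.IGNORECASE,
)


def find_next_href(html):
    """Return the href of the "Next" link, or None if there is none."""
    match = NEXT_LINK_RE.search(html)
    return match.group(1) if match else None


print(find_next_href(HTML))
```

If a selector is preferred over a regular expression, an XPath along the lines of `//div[@id="pagination"]/a[contains(., "Next")]/@href` expresses the same condition. That expression is untested against the real page.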