I'm trying to write a regular expression that matches URLs like "example.com/page/200/".

Here's what I've done so far:

rules = (Rule (SgmlLinkExtractor(
  allow=("//page/\d+",),
  restrict_xpaths=('xxxxx',)),
  callback="details", follow= True),
)

Could anyone give me a solution? Thanks.

David Guyon
Suresh

1 Answer

Your pattern has an extra slash, and it should be a raw string so that \d reaches the regex engine unchanged. Also, since there is only a single expression, you don't need to pass a tuple to allow:

rules = (Rule(SgmlLinkExtractor(allow=r"/page/\d+", restrict_xpaths=('xxxxx',)),
              callback="details", follow=True),)
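
To see why the extra slash matters, here is a minimal sketch using only the standard re module. It assumes the link extractor searches each pattern against the absolute URL, and that the page in the question resolves to http://example.com/page/200/ (the scheme is an assumption; the question only gives the host and path).

```python
import re

# Assumed absolute form of the URL from the question.
url = "http://example.com/page/200/"

# Pattern from the question: requires two consecutive slashes before "page".
double_slash = re.compile(r"//page/\d+")
# Corrected pattern: a single slash before "page".
single_slash = re.compile(r"/page/\d+")

# search() looks for the pattern anywhere in the string.
matched_original = double_slash.search(url) is not None
matched_fixed = single_slash.search(url) is not None

print(matched_original, matched_fixed)
```

The only "//" in the URL follows "http:", so the original pattern never finds "//page/", while the single-slash version matches "/page/200".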
alecxe