Extract URL where text matches a regex - with XPath 1.0

Question

I would like to extract the URLs of links of this type (the link text is a number with any number of digits, and the href is arbitrary text) using an XPath expression in Scrapy.

I could think of something like

HtmlXPathSelector(response).select('//a[matches(text(),"\d+")]/@href')

However it appears that XPath 2.0 isn't supported and I can't use regex.

The best single-line solution I could find was from this question: xpath expression for regex-like matching? Is there a better way to achieve this in Scrapy?

score 3 · Accepted Answer · answered Jun 19 '11 at 15:07

.select('//a[. != "" and translate(., "0123456789", "") = ""]/@href')
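The predicate works because XPath 1.0 translate() deletes every character of its second argument that has no counterpart in the third, so a non-empty string made only of digits collapses to "". Below is a minimal pure-Python sketch of that logic, for illustration only: xpath_translate and is_digits_only are hypothetical helper names, not part of Scrapy or any XPath library.

```python
def xpath_translate(value: str, src: str, dst: str) -> str:
    """Emulate XPath 1.0 translate(value, src, dst) (hypothetical helper)."""
    mapping = {}
    for i, ch in enumerate(src):
        if ch in mapping:
            continue  # the first occurrence of a character in src wins
        # Characters of src with no counterpart in dst are deleted (None).
        mapping[ch] = dst[i] if i < len(dst) else None
    out = []
    for ch in value:
        if ch not in mapping:
            out.append(ch)           # not listed in src: keep unchanged
        elif mapping[ch] is not None:
            out.append(mapping[ch])  # listed in src: replace
    return "".join(out)


def is_digits_only(text: str) -> bool:
    """Mirror the predicate [. != "" and translate(., "0123456789", "") = ""]."""
    return text != "" and xpath_translate(text, "0123456789", "") == ""


if __name__ == "__main__":
    for sample in ["12345", "", "12a", " 12 "]:
        print(repr(sample), is_digits_only(sample))
```

Note that surrounding whitespace makes the check fail, just as in the XPath expression; wrapping the text in normalize-space() in the XPath would be one way to tolerate it.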

Tomalak

+1 smart answer. @Tomalak how about exactly matching a string while ignoring case in XPath 1.0? E.g. if my string is "Next", I would use something like (r"^next$", re.I). How would I do that without regex? – user Jun 20 '11 at 06:34
@buffer: That's been asked before, several times. Just search for it. ;) - BTW, another variant would be `... and string(number(.)) != 'NaN'`, but that would accept numerical notations beyond "digits only". – Tomalak Jun 20 '11 at 06:49
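For the case-insensitive question above, the usual XPath 1.0 workaround is again translate(): map the uppercase letters to lowercase and compare against a lowercase literal. A small sketch, assuming ASCII-only link text; NEXT_LINK_XPATH and matches_ignoring_case are hypothetical names, and the Python function only mirrors what the XPath predicate computes.

```python
import string

UPPER = string.ascii_uppercase
LOWER = string.ascii_lowercase

# XPath 1.0 expression: lowercase the link text with translate(), then
# compare it to the lowercase literal "next" (an exact, whole-string match).
NEXT_LINK_XPATH = '//a[translate(., "%s", "%s") = "next"]/@href' % (UPPER, LOWER)


def matches_ignoring_case(text: str, target: str) -> bool:
    """Mirror translate(., UPPER, LOWER) = target for ASCII letters only."""
    lowered = text.translate(str.maketrans(UPPER, LOWER))
    return lowered == target


if __name__ == "__main__":
    print(NEXT_LINK_XPATH)
    for sample in ["Next", "NEXT", "Next page"]:
        print(repr(sample), matches_ignoring_case(sample, "next"))
```

Non-ASCII letters are not folded by this mapping; they would have to be added to both translate() arguments explicitly.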
