I am trying to write an XPath expression that extracts the relative URLs used in @href or @src attributes, that is, URLs that do not start with http:// or https://.

I tried the expression below, but it does not work:

//*[not(starts-with(@src, 'https:')) and not(starts-with(@href, 'https:'))]

Example node:

<script async="" src="//d.impactradius-event.com/A2421746-f56c-44ad-9e09-bcf28112e9951.js"></script>

I want to extract the src URL from this node. Can someone please help? Thanks.

1 Answer

You can try the following XPath 1.0 expression. For each of the two attributes it excludes values that start with http: or https:, and then it merges the two result sets with the | (union) operator.

//*[not(starts-with(@src, 'https:')) and not(starts-with(@src, 'http:'))]/@src | //*[not(starts-with(@href, 'https:')) and not(starts-with(@href, 'http:'))]/@href

This expression could be simplified with regular expressions, but XPath 1.0 does not support them; the matches() function is only available from XPath 2.0 onward.
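
To check the behaviour, here is a minimal sketch that runs both expressions with Python and the third-party lxml package (an assumption; any XPath 1.0 engine should behave the same way). The sample document and the names SAMPLE, relative_urls and question_matches are hypothetical; only the <script> node is taken from the question.

```python
from lxml import etree  # third-party package, assumed installed (pip install lxml)

# Hypothetical sample document; only the first <script> node comes from the question.
SAMPLE = """
<html>
  <head>
    <script async="" src="//d.impactradius-event.com/A2421746-f56c-44ad-9e09-bcf28112e9951.js"></script>
    <script src="https://example.com/app.js"></script>
    <link href="/css/site.css"/>
  </head>
  <body>
    <a href="http://example.com/">absolute</a>
    <a href="about.html">relative</a>
    <p>no src or href here</p>
  </body>
</html>
"""

# Expression from the question: it selects elements, not attribute values.
QUESTION_EXPR = (
    "//*[not(starts-with(@src, 'https:')) and not(starts-with(@href, 'https:'))]"
)

# Expression from the answer: the trailing /@src and /@href steps select attributes.
ANSWER_EXPR = (
    "//*[not(starts-with(@src, 'https:')) and not(starts-with(@src, 'http:'))]/@src"
    " | "
    "//*[not(starts-with(@href, 'https:')) and not(starts-with(@href, 'http:'))]/@href"
)


def relative_urls(markup: str) -> list[str]:
    """Return the @src and @href values that do not start with http: or https:."""
    root = etree.fromstring(markup)
    return [str(value) for value in root.xpath(ANSWER_EXPR)]


def question_matches(markup: str) -> list[str]:
    """Return the tag names of the elements matched by the question's expression."""
    root = etree.fromstring(markup)
    return [element.tag for element in root.xpath(QUESTION_EXPR)]


if __name__ == "__main__":
    print("answer expression:", relative_urls(SAMPLE))
    print("question expression:", question_matches(SAMPLE))
```

With this sample, the question's expression also matches elements such as <p> and <html> that carry neither attribute, as well as the <a> whose href starts with http:, whereas the answer's expression yields only the three non-http(s) attribute values.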

zx485
  • Won't this return all elements that don't have an `@src` or `@href` attribute? – Michael Kay Jul 07 '23 at 17:32
  • Good question. I tested it to make sure, and the answer is no. On their own, the two predicates do match every element that lacks an `@src` or `@href` attribute, but the trailing `/@src` and `/@href` steps then select only those attributes, so an element without them contributes nothing. The whole expression can therefore only ever return `@src` and `@href` attributes. – zx485 Jul 07 '23 at 18:45