
I'm a bit of a novice when it comes to regex and I've run into some issues. I'm trying to get the text that's inside the href of a link.

This is what I've come up with so far:

/\w+(?=")/g

And these are the strings I'm testing it on:

<a target="_blank" href="fdsfsd">fdsfs</a>
<a href="mdosfsd"></a>
<link href="f89sdfsd" />

Right now it returns any text that's inside double quotes, but I don't know how to narrow the match so that it only applies when the quoted text is the value of an href, and that href is part of an <a> tag.

user1915308

2 Answers


You can use Element.getAttribute(). It is documented on the Mozilla Developer Network.

Here's an example:

var attribute = element.getAttribute(attributeName);

Also note: it's bad practice to parse HTML using regular expressions. See the question "RegEx match open tags except XHTML self-contained tags".
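To make the getAttribute() suggestion concrete for the three test strings in the question, here is a minimal sketch. The helper name getAnchorHrefs is made up for illustration and is not part of any API. It assumes a browser environment, where querySelectorAll and DOMParser are available, and it uses the selector a[href] so that <link href="..."> is ignored.

```javascript
// Hypothetical helper (not from the original answer): collects the href of
// every <a> element under `root`, which can be `document` or any element.
function getAnchorHrefs(root) {
  // 'a[href]' selects only <a> elements that carry an href attribute,
  // so <link href="..."> is skipped.
  var anchors = root.querySelectorAll('a[href]');
  return Array.from(anchors, function (el) {
    return el.getAttribute('href');
  });
}

// In a browser, markup held in a string can be parsed first with DOMParser.
// The guard keeps this snippet harmless where DOMParser does not exist.
if (typeof DOMParser !== 'undefined') {
  var html = '<a target="_blank" href="fdsfsd">fdsfs</a>' +
             '<a href="mdosfsd"></a>' +
             '<link href="f89sdfsd" />';
  var doc = new DOMParser().parseFromString(html, 'text/html');
  console.log(getAnchorHrefs(doc)); // ["fdsfsd", "mdosfsd"]
}
```

If the links are already on the page, getAnchorHrefs(document) should be enough and no string parsing is needed.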

Karl Taylor

A solution purely using regex, although generally not advisable (as discussed above), is:

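var re = /href="[^"]*"/gi,
    // match() returns null when nothing matches, so fall back to an empty array
    extracted = (yourText.match(re) || []).map(v => v.slice(6, -1)); // drop the leading href=" and the trailing "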

Note that this is flawed in many ways. For instance, what if the href is defined using single quotes (') instead of double quotes (")? What if there is whitespace around the =? Or what about a false-positive attribute such as not-an-href="..."?

This solution should only be used in simple scenarios, where full robustness against odd edge cases like this is not an issue.
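For the specific edge cases listed above (single quotes, whitespace around the =, look-alike attribute names, and restricting the match to <a> tags), a somewhat more tolerant pattern could look like the sketch below. The function name extractAnchorHrefs is hypothetical, and this is still not a real HTML parser: for example, it can be fooled by href-like text inside another attribute's quoted value.

```javascript
// Hypothetical helper: extracts href values from <a> tags only, using a regex.
function extractAnchorHrefs(text) {
  // <a\b       tag name "a" followed by a word boundary, so <abbr> and <link> don't match
  // [^>]*?     any other attributes before href, staying inside the same tag
  // \shref     href preceded by whitespace, so not-an-href="..." is skipped
  // \s*=\s*    optional whitespace around the equals sign
  // (["'])     opening quote (single or double), captured as group 1
  // (.*?)\1    the value (group 2), up to the same kind of quote that opened it
  var re = /<a\b[^>]*?\shref\s*=\s*(["'])(.*?)\1/gi;
  var hrefs = [];
  var m;
  while ((m = re.exec(text)) !== null) {
    hrefs.push(m[2]);
  }
  return hrefs;
}

var sample = [
  '<a target="_blank" href="fdsfsd">fdsfs</a>',
  '<a href="mdosfsd"></a>',
  '<link href="f89sdfsd" />'
].join('\n');

console.log(extractAnchorHrefs(sample)); // ["fdsfsd", "mdosfsd"]
```

The capture group plus exec() loop replaces the slice(6, -1) trick, since the prefix length is no longer fixed once whitespace and other attributes are allowed.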

Tom Lord
zhibirc