Extracting the text from the href of an anchor tag using regex and javascript

Question

I'm kind of a novice when it comes to regex, and I've stumbled upon some issues. I'm trying to get the text inside the href of a link.

This is what I've come up with so far:

/\w+(?=")/g

And these are the strings I'm testing it on:

<a target="_blank" href="fdsfsd">fdsfs</a>
<a href="mdosfsd"></a>
<link href="f89sdfsd" />

Right now it returns any text that's inside double quotes, but I don't know how to select only the quoted text that belongs to an href, and only when that href is part of an <a> tag.
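Running both the original pattern and a tag-restricted one over the three test strings shows the problem and one possible fix. The restricted pattern is an illustrative sketch, not taken from this thread: it assumes double-quoted values, an `href` preceded by whitespace, and a well-formed opening `<a` tag, and it reads the value from capture group 1.

```javascript
// The three test strings from the question, joined into one input.
const input = [
  '<a target="_blank" href="fdsfsd">fdsfs</a>',
  '<a href="mdosfsd"></a>',
  '<link href="f89sdfsd" />',
].join('\n');

// Original pattern: any run of word characters followed by a double quote.
// It also picks up the target value and the href of the <link> tag.
const anyQuoted = input.match(/\w+(?=")/g);

// Sketch: only match href="..." inside an opening <a ...> tag.
// <a\b      -> tag name is exactly "a" (not <abbr>, <link>, ...)
// [^>]*?    -> any other attributes, without leaving the tag
// \shref="  -> href must follow whitespace, so data-href is skipped
// ([^"]*)   -> capture group 1: the attribute value
// flags g,i -> every occurrence, case-insensitive tag/attribute names
const anchorHrefs = Array.from(
  input.matchAll(/<a\b[^>]*?\shref="([^"]*)"/gi),
  (m) => m[1]
);

console.log(anyQuoted);   // → ['_blank', 'fdsfsd', 'mdosfsd', 'f89sdfsd']
console.log(anchorHrefs); // → ['fdsfsd', 'mdosfsd']
```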

If you still want to use regex: [`href=(['"])(.*?)\1`](https://regex101.com/r/sU8aN3/1) and take the second captured group — Tushar, Jul 20 '16 at 13:53
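Tushar's suggestion can be sketched in JavaScript as follows. Group 1 captures whichever quote character opened the value, the backreference `\1` requires the same character to close it, and group 2 holds the text itself. The sample input is made up for illustration.

```javascript
// Hypothetical input mixing double- and single-quoted href values.
const html = `<a href="one">1</a> <a href='two'>2</a>`;

// (['"]) -> group 1: the opening quote, either " or '
// (.*?)  -> group 2: the value, matched lazily
// \1     -> the closing quote must be the same character as group 1
const re = /href=(['"])(.*?)\1/g;

// matchAll yields one match array per occurrence; keep group 2 only.
const values = Array.from(html.matchAll(re), (m) => m[2]);

console.log(values); // → ['one', 'two']
```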
You don't understand: I want to do it only with regex, no **JS functions** or **jQuery selectors**. — user1915308, Jul 20 '16 at 13:54
But this returns **href="text"**; I want to get only the text. I've already gone down this road, which is why my regex starts with `\w+` — user1915308, Jul 20 '16 at 13:55
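If the goal is for the whole match to be just the text, a lookbehind can require `href="` before the match without including it in the result. A minimal sketch, assuming an engine with ES2018 lookbehind support and double-quoted values; note that, unlike a tag-restricted pattern, it still matches the `<link>` href.

```javascript
const input = [
  '<a target="_blank" href="fdsfsd">fdsfs</a>',
  '<a href="mdosfsd"></a>',
  '<link href="f89sdfsd" />',
].join('\n');

// (?<=href=") -> must be preceded by href=", but that prefix is not part of the match
// [^"]*       -> the value: everything up to the next double quote
const values = input.match(/(?<=href=")[^"]*/g);

console.log(values); // → ['fdsfsd', 'mdosfsd', 'f89sdfsd']
```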
Are you running JavaScript on a live page being rendered or are you statically looking at an HTML file? — gcampbell, Jul 20 '16 at 13:56
From which language do you plan on executing this regex? — castis, Jul 20 '16 at 13:58
@ShudhanshShekhar Okay, look at it this way; maybe the example I gave is not good. Let's say you get a string like this: "This is too long so comment='dfdsfsd', comment='348958345', comment='fg908fdgkdf'", and I want to get everything that's inside each **comment**. I hope it makes more sense now — user1915308, Jul 20 '16 at 14:02
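The same idea carries over to the `comment='...'` example. A minimal sketch, assuming single-quoted values with no escaped quotes inside them; the first variant uses a lookbehind so each match is only the value, the second reads capture group 1 instead.

```javascript
// The example string from the comment above.
const text = "This is too long so comment='dfdsfsd', comment='348958345', comment='fg908fdgkdf'";

// Lookbehind variant: the match itself is only the quoted value.
const viaLookbehind = text.match(/(?<=comment=')[^']*/g);

// Capture-group variant: avoids lookbehind; the value is in group 1.
const viaGroups = Array.from(text.matchAll(/comment='([^']*)'/g), (m) => m[1]);

console.log(viaLookbehind); // → ['dfdsfsd', '348958345', 'fg908fdgkdf']
console.log(viaGroups);     // → ['dfdsfsd', '348958345', 'fg908fdgkdf']
```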

score 1 · Answer 1 · edited May 23 '17 at 11:58


You can use Element.getAttribute(). Read about it on the Mozilla Developer Network (MDN).

Here's an example:

var attribute = element.getAttribute(attributeName);

Also note: it's bad practice to parse HTML using regular expressions. See the Stack Overflow question "RegEx match open tags except XHTML self-contained tags".

edited May 23 '17 at 11:58 by Community
answered Jul 20 '16 at 13:56 by Karl Taylor

You *can* use regex to parse HTML if you're doing something like a find and replace in your editor, or working with a limited set of data. – gcampbell Jul 20 '16 at 13:58
I know how to get the content of an attribute using JS functions or jQuery selectors; that's not the point here – user1915308 Jul 20 '16 at 13:58
Why do you want to get it using regex? – Karl Taylor Jul 20 '16 at 14:00
@user1915308 `/href="(.+)"/` – Karl Taylor Jul 20 '16 at 14:02
@KarlTaylor Like I said above, this will return href="test"; I only want the test inside the href, and to check that the href is part of an <a> and not, let's say, a <link> – user1915308 Jul 20 '16 at 14:03
The first capture group of the regex I provided matches `test`; check the match information. You can use String.prototype.match() to work with your match: https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/String/match – Karl Taylor Jul 20 '16 at 14:08
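To illustrate the point about match information: without the `g` flag, `String.prototype.match()` returns an array whose index 0 is the whole match (`href="test"`) and index 1 is the first capture group (`test`). The greedy `.+` can also run past the closing quote when another attribute follows; the lazy `.+?` below is an adjustment for illustration, not part of the original comment, and the sample strings are hypothetical.

```javascript
const simple = '<a href="test">link</a>';
const m = simple.match(/href="(.+)"/);
console.log(m[0]); // → href="test"   (whole match)
console.log(m[1]); // → test          (capture group 1)

// With a second attribute, greedy .+ runs to the LAST double quote.
const twoAttrs = '<a href="test" title="x">link</a>';
const greedy = twoAttrs.match(/href="(.+)"/)[1];
// Lazy .+? stops at the FIRST closing quote instead.
const lazy = twoAttrs.match(/href="(.+?)"/)[1];

console.log(greedy); // → test" title="x
console.log(lazy);   // → test
```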
OK, say "fn" is the function you are using for the regex match; then `var pre_match = fn(/href="(.+)"/, string); var last_match = fn(/\w+/, pre_match);` – Shudhansh Shekhar Jul 20 '16 at 14:10
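The two-pass idea can be tried out with a concrete stand-in for `fn`. The helper below is hypothetical and assumes `fn` returns the whole match (index 0) or `null`. Under that assumption a second pass with `/\w+/` returns `href`, the first run of word characters, so this sketch also shows a second pass using the asker's original `/\w+(?=")/`.

```javascript
// Hypothetical stand-in for "fn": returns the whole first match, or null.
function fn(re, str) {
  const m = str.match(re);
  return m ? m[0] : null;
}

const string = '<a target="_blank" href="fdsfsd">fdsfs</a>';

// Pass 1: narrow the input down to the href attribute.
const preMatch = fn(/href="(.+)"/, string);  // 'href="fdsfsd"'

// Pass 2 as written in the comment picks up the attribute name.
const naive = fn(/\w+/, preMatch);           // 'href'

// Pass 2 with the asker's pattern: word characters followed by a quote.
const lastMatch = fn(/\w+(?=")/, preMatch);  // 'fdsfsd'
```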

score 0 · Answer 2 · edited Jul 20 '16 at 15:47

A solution purely using regex, although generally not advisable (as discussed above), is:

var re = /href="[^"]*"/gi,
    extracted = (yourText.match(re) || []).map(v => v.slice(6, -1));
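Applied to the question's three test strings, the approach looks like this. `slice(6, -1)` drops the six characters of `href="` and the closing quote. This sketch guards with `|| []` because `match()` returns `null` when nothing matches, and calling `.map` on `null` would throw.

```javascript
const yourText = [
  '<a target="_blank" href="fdsfsd">fdsfs</a>',
  '<a href="mdosfsd"></a>',
  '<link href="f89sdfsd" />',
].join('\n');

const re = /href="[^"]*"/gi;
// With the g flag, match() returns every full match, e.g. 'href="fdsfsd"',
// or null if there are none; slice(6, -1) strips 'href="' and the closing '"'.
const extracted = (yourText.match(re) || []).map((v) => v.slice(6, -1));

console.log(extracted); // → ['fdsfsd', 'mdosfsd', 'f89sdfsd']

// No href at all: the guard yields [] instead of a TypeError.
const none = ('<p>no links</p>'.match(re) || []).map((v) => v.slice(6, -1));
console.log(none); // → []
```

Note that the `<link>` value is included as well, since the pattern does not look at the tag name.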

Note that this is flawed in many ways. For instance, what if the href is defined using single quotes (') instead of double quotes (")? Or what if there is whitespace around the =? Or a false-positive attribute such as not-an-href="..."?

This solution should only be used in simple scenarios, where full robustness against odd edge cases like this is not an issue.
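The three flaws listed above can be reduced, though not eliminated, within the same regex-only approach. A sketch, assuming the attribute is preceded by whitespace and its value contains no quote of the same kind: `\s*` around `=` tolerates whitespace, the `(["'])...\1` backreference accepts either quote style, and requiring whitespace before `href` rejects names such as `not-an-href`. The test string is made up, and unquoted values (`href=foo`) are still not handled.

```javascript
const text = `<a not-an-href="x" href = 'y'>1</a> <a HREF="z">2</a>`;

// \s            -> href must follow whitespace (rejects not-an-href, data-href)
// href\s*=\s*   -> allow spaces around "="
// (["'])(.*?)\1 -> group 1: opening quote; group 2: value; \1: matching close
// flags g,i     -> all occurrences, case-insensitive attribute name
const re = /\shref\s*=\s*(["'])(.*?)\1/gi;

const values = Array.from(text.matchAll(re), (m) => m[2]);

console.log(values); // → ['y', 'z']
```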
