5

I am new to AngularJS. I have a regex that matches all anchor tags. My regex is:

/<a[^>]*>([^<]+)<\/a>/g

I am using the match function on a string like this:

var str =  '<a href="mailto:abc.jagadale@gmail.com" style="color:inherit;text-decoration:inherit">abc.jagadale@gmail.com</a>'

Now I am using code like this:

var value = str.match(/<a[^>]*>([^<]+)<\/a>/g);

Here I am expecting the output to be abc.jagadale@gmail.com, but I am getting exactly the same string as the input string. Can anyone please help me with this? Thanks in advance.

bruno.bologna
ganesh kaspate
  • 3
    Why don't you use the `$("a")` selector and loop through the result list to get its `href` attribute? – Bharadwaj Jan 02 '18 at 15:06
  • 1
    Smells of [H̸̡̪̯ͨ͊̽̅̾̎Ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍M̲̖͊̒ͪͩͬ̚̚͜Ȇ̴̟̟͙̞ͩ͌͝S̨̥̫͎̭ͯ̿̔̀ͅ](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags). Do me a favour and add `content:">"` inside your `a` tag's `style` attribute. – ctwheels Jan 02 '18 at 15:25

4 Answers

1

Why are you trying to reinvent the wheel?

You are trying to parse an HTML string with a regex, which is a very complicated task. Just use the DOM or jQuery to get the links' contents; they are made for this.

  • Put the HTML string as the HTML of a jQuery/DOM element.

  • Then query this newly created element for all the a elements inside it and collect their contents in an array.

This is how your code should look:

var str = '<a href="mailto:abc.jagadale@gmail.com" style="color:inherit;text-decoration:inherit">abc.jagadale@gmail.com</a>';

var results = [];
$("<div></div>").html(str).find("a").each(function(l) {
  results.push($(this).text());
});

Demo:

var str = '<a href="mailto:abc.jagadale@gmail.com" style="color:inherit;text-decoration:inherit">abc.jagadale@gmail.com</a>';

var results = [];
$("<div></div>").html(str).find("a").each(function(l) {
  results.push($(this).text());
});
console.log(results);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
cнŝdk
  • Hey, awesome solution. Just one thing: I am getting all the URLs, but some of the values are " ". Can you tell me what exactly these are? – ganesh kaspate Jan 02 '18 at 16:31
  • @ganeshkaspate Maybe these are empty `a` elements, check your HTML string or please post it here. – cнŝdk Jan 02 '18 at 16:40
0

You need to capture the group inside the anchor tags. The regular expression already captures the inner text with the group ([^<]+), but there are different ways to extract that text when matching.

When the match function is used without the g flag, it returns an array whose first element is the text matched by the whole regular expression and whose following elements are the texts captured by the groups in the regular expression.

Try this:

var reg = /<a[^>]*>([^<]+)<\/a>/g

reg.exec(str)[1]

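Also, the match function includes the captured groups in the returned array only if the g flag is not present. With the g flag it returns an array of the full matches only, which is why you get the whole input string back.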

Check https://javascript.info/regexp-groups for further documentation.
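
As a minimal sketch of the behaviour described above, using the string and pattern from the question, the snippet below compares match with and without the g flag and then collects group 1 from every match with an exec loop. The variable names (withG, withoutG, firstText, allTexts) are only illustrative.

```javascript
// Same input string and pattern as in the question.
const str = '<a href="mailto:abc.jagadale@gmail.com" style="color:inherit;text-decoration:inherit">abc.jagadale@gmail.com</a>';

// With the g flag, match() returns only the full matches, without groups.
const withG = str.match(/<a[^>]*>([^<]+)<\/a>/g);

// Without the g flag, index 0 is the full match and index 1 is group 1.
const withoutG = str.match(/<a[^>]*>([^<]+)<\/a>/);
const firstText = withoutG ? withoutG[1] : null;

// To collect group 1 from every match, loop with exec() on a g-flagged regex.
const re = /<a[^>]*>([^<]+)<\/a>/g;
const allTexts = [];
let m;
while ((m = re.exec(str)) !== null) {
  allTexts.push(m[1]);
}

console.log(withG);     // array holding the whole anchor tag
console.log(firstText); // inner text of the first anchor
console.log(allTexts);  // inner text of every anchor
```

The exec loop relies on the g flag so that each call continues from where the previous match ended; without it the loop would keep returning the first match.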

bruno.bologna
0

Brief

Don't use regex for this. Regex is a great tool, don't get me wrong, but it's not what you're looking for. Regex cannot properly parse HTML and should only be used to do so if it's a limited, known set of HTML.

Try, for example, adding content:">" to your style attribute. You'll see your pattern now fails or gives you an incorrect result. I don't like to use this quote all the time, but I think it's necessary to use it in this case:

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

Use built-in functions. jQuery makes this super easy to accomplish. See my Code section for a demonstration. It's way more legible than any regex variant.
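
To make the content:">" point concrete, here is a small sketch that runs the question's pattern against two hypothetical anchors (the user@example.com address and the variable names are made up for illustration). The second anchor is valid HTML, but the ">" inside its style attribute ends the [^>]* part of the pattern too early.

```javascript
// The pattern from the question, without the g flag so groups are returned.
const pattern = /<a[^>]*>([^<]+)<\/a>/;

// Hypothetical anchors: one plain, one with a ">" inside the style attribute.
const plain = `<a href="mailto:user@example.com" style="color:inherit">user@example.com</a>`;
const tricky = `<a href="mailto:user@example.com" style="content:'>'">user@example.com</a>`;

const plainText = pattern.exec(plain)[1];
const trickyText = pattern.exec(tricky)[1];

console.log(plainText);  // the address, as expected
console.log(trickyText); // leftover attribute characters followed by the address
```

The pattern still matches the second anchor, so nothing signals a failure; the captured text is simply wrong, which is harder to notice than no match at all.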


Code

DOM from page

The following snippet gets all anchors on the actual page.

$("a").each(function() {
  console.log($(this).text())
})
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<a href="mailto:abc.jagadale@gmail.com">abc.jagadale@gmail.com</a>
<a href="mailto:abc2.jagadale@gmail.com">abc2.jagadale@gmail.com</a>

DOM in string

The following snippet gets all anchors in the string (converted to DOM element)

var s = `<a href="mailto:email3@domain.com">email3@domain.com</a>
<a href="mailto:email4@domain.com">email4@domain.com</a>`

$("<div></div>").html(s).find("a").each(function() {
  console.log($(this).text())
})
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<a href="mailto:email1@domain.com">email1@domain.com</a>
<a href="mailto:email2@domain.com">email2@domain.com</a>
ctwheels
  • I have more than one link in the document. – ganesh kaspate Jan 02 '18 at 15:45
  • @ganeshkaspate it'll work. I edited my answer to include two anchor examples. – ctwheels Jan 02 '18 at 15:47
  • Okay, thanks a lot for such a great answer. I tried it this way, but it is picking up every anchor tag in the page's own HTML document as well, and I don't want that. I have an HTML file, like a document, that contains some URLs, and I want their text. The document is HTML, but I receive it as a string from the server side. That is why I was using a regex for this. – ganesh kaspate Jan 02 '18 at 15:49
  • @ganeshkaspate I added a new snippet showing how to use a normal string, convert it to DOM and then find anchors. – ctwheels Jan 02 '18 at 16:40
0

Given the use case of parsing a string, instead of having an actual DOM to work with, it does seem like regex is the way to go, unless you want to load the HTML into a document fragment and parse that.

One way to get all of your matches is to make use of split:

var htmlstr = "<p><a href='url'>asdf@bsdf.com</a></p>"

var matches = htmlstr.split(/<a.+?>([A-Za-z.@]+)<\/a>/).filter((t, i) => i % 2)

Using a regex with split returns all of the matches along with the text around them, then filtering by index % 2 will pare it down to just the regex matches.
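
A short sketch of the same idea with more than one anchor, assuming a made-up input string (the addresses and the names viaSplit and viaMatchAll are illustrative only). It also shows String.prototype.matchAll as an alternative that reads the capture group directly; note that the character class in the split pattern above has no digits, so the second variant reuses the [^<]+ group from the question instead.

```javascript
// Hypothetical HTML string with two anchors.
const html = "<p><a href='u1'>a@b.com</a> and <a href='u2'>c@d.org</a></p>";

// split() with a capturing group interleaves the captures with the
// surrounding text, so the captures sit at the odd indices.
const viaSplit = html
  .split(/<a.+?>([A-Za-z.@]+)<\/a>/)
  .filter((t, i) => i % 2);

// matchAll() needs the g flag and yields one match object per anchor;
// m[1] is the text captured by the first group.
const viaMatchAll = Array.from(
  html.matchAll(/<a[^>]*>([^<]+)<\/a>/g),
  (m) => m[1]
);

console.log(viaSplit);
console.log(viaMatchAll);
```

Both variants are still regex-based, so they share the limitations with unusual attribute values discussed in the other answers.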

jmcgriz