I have the following HTML in a string:

<h3 class="large lheight20 margintop10">
<a href="https://google.com" class="marginright5 link linkWithHash detailsLink">
<span>get the content</span>
</a>

</h3><h3 class="large lheight20 margintop10">
<a href="https://google.com" class="marginright5 link linkWithHash detailsLink">
<span>get the content</span>
</a>

</h3>

and I want to extract the following tags:

<a href="https://google.com" class="marginright5 link linkWithHash detailsLink">
<span>get the content</span>
</a>
<a href="https://google.com" class="marginright5 link linkWithHash detailsLink">
<span>get the content</span>
</a>

I have written the following regex:

<h3[^>]+?>(.*)<\/h3>

But it returns the wrong result:

<a href="https://google.com" class="marginright5 link linkWithHash detailsLink">
<span>get the content</span>
</a>

</h3><h3 class="large lheight20 margintop10">
<a href="https://google.com" class="marginright5 link linkWithHash detailsLink">
<span>get the content</span>
</a>

Please help me to extract the tags.

VLAZ
MIX DML
  • I think [***Tony the pony***](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags#answer-1732454) would dislike this! – adeneo Apr 25 '16 at 17:32
  • Use `.*?` instead of `.*`. The latter is greedy and will match from the first `<h3>` to the last `</h3>`, which is what you're seeing. – Anton Apr 25 '16 at 17:32
  • @adeneo - thx for the link :) , but I think in this case with a defined subset (H3 tag here) - it's manageable with regExp and without getting _infected_ ... – michaPau Apr 25 '16 at 17:45
  • @michaPau - the point is, it's just so easy to parse the HTML and do `document.querySelectorAll('a')` – adeneo Apr 25 '16 at 17:48
  • please upvote the answer too if it was useful :D – Nabeel Khan Apr 26 '16 at 05:50
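
The parsing approach adeneo mentions could look roughly like the sketch below. It assumes the code runs in a browser, where `DOMParser` is available; the function name `extractAnchorsWithDom` and the shortened sample string are illustrative and not part of the original question.

```javascript
// Sketch only: parse the HTML string into a detached document and read the
// <a> elements back out, instead of matching the markup with a regex.
function extractAnchorsWithDom(html) {
  // parseFromString with 'text/html' returns a full Document object.
  const doc = new DOMParser().parseFromString(html, 'text/html');
  // outerHTML is the element serialized back to markup, children included.
  return Array.from(doc.querySelectorAll('a'), (a) => a.outerHTML);
}

// Hypothetical, shortened input in the shape of the question's HTML.
const sampleHtml =
  '<h3 class="large lheight20 margintop10">' +
  '<a href="https://google.com" class="detailsLink"><span>get the content</span></a>' +
  '</h3>';

// Guarded so the snippet does not throw where DOMParser is missing (e.g. plain Node.js).
if (typeof DOMParser !== 'undefined') {
  extractAnchorsWithDom(sampleHtml).forEach((tag) => console.log(tag));
}
```

Because the browser does the parsing, this does not depend on how the tags are spread across lines or how the attributes are ordered.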

2 Answers

Use this regex:

<h3[^>]+?>([^$]+?)<\/h3>

Example here:

https://regex101.com/r/pQ5nE0/2
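
To see why the lazy quantifier matters, here is a small JavaScript sketch that compares a greedy pattern with the regex above on the question's input. The helper `captures` and the variable names are illustrative. The greedy pattern uses `[\s\S]*` as an assumed stand-in for the question's `.*`, because `.` does not match newlines in JavaScript unless the `s` flag is set.

```javascript
// Sample input copied from the question.
const html =
  '<h3 class="large lheight20 margintop10">\n' +
  '<a href="https://google.com" class="marginright5 link linkWithHash detailsLink">\n' +
  '<span>get the content</span>\n' +
  '</a>\n' +
  '\n' +
  '</h3><h3 class="large lheight20 margintop10">\n' +
  '<a href="https://google.com" class="marginright5 link linkWithHash detailsLink">\n' +
  '<span>get the content</span>\n' +
  '</a>\n' +
  '\n' +
  '</h3>';

// Greedy: [\s\S]* runs to the last </h3>, so both headings end up in one match.
const greedy = /<h3[^>]+?>([\s\S]*)<\/h3>/g;
// Lazy, as in the answer: +? stops at the nearest </h3>.
// [^$] means "any character except a literal $", which includes newlines.
const lazy = /<h3[^>]+?>([^$]+?)<\/h3>/g;

// Illustrative helper: return capture group 1 of every match, trimmed.
function captures(regex, input) {
  return Array.from(input.matchAll(regex), (m) => m[1].trim());
}

const greedyResult = captures(greedy, html);
const lazyResult = captures(lazy, html);

console.log(greedyResult.length); // 1
console.log(lazyResult.length);   // 2
lazyResult.forEach((tag) => console.log(tag));
```

One caveat with `[^$]`: it stops working if the heading content contains a `$` character. `[\s\S]+?` is a common alternative that matches any character.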

Nabeel Khan

You could try:

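// Print every <a ...>...</a> element found in the HTML string.
function getA(str) {
  // [\s\S] matches any character, including newlines; the lazy +? makes
  // each match stop at the nearest </a>.
  var regex = /<a\s+[\s\S]+?<\/a>/g;
  var found; // declared locally so it does not become an implicit global
  while ((found = regex.exec(str)) !== null) {
    // Log the markup as text; document.write would render it as a link.
    console.log(found[0]);
  }
}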

var str = '<h3 class="large lheight20 margintop10">\n' +
  '<a href="https://google.com" class="marginright5 link linkWithHash detailsLink">\n' +
  '<span>get the content</span>\n' +
  '</a>\n' +
  '\n' +
  '</h3><h3 class="large lheight20 margintop10">\n' +
  '<a href="https://google.com" class="marginright5 link linkWithHash detailsLink">\n' +
  '<span>get the content</span>\n' +
  '</a>\n' +
  '\n' +
  '</h3>';
getA(str);
Quinn