
I am trying to build a regular expression that matches a single word in the text between HTML tags, i.e. anywhere except inside the tags themselves. The problem is that the word may also appear in a tag's class or id; I only want to match it when it is between the tags.

Here is further clarification from my comment: I am looking for a regex to use in a loop that will find a string in another string that contains HTML. The large string will contain something like this:

<div class="a-class"><span class="some-class" data-content="some words containing target">some other text containing target</span>

I want the regex to match the word "target" only between the tags, not within the tag in the data-content attribute. I can use:

/(\btarget)\b/ig

to find every instance of target.
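For illustration only (this is not from the thread), a common single-regex workaround is to add a negative lookahead that rejects a match when a `>` comes before the next `<`, which means the match is still inside a tag. The sketch below assumes attribute values never contain literal `<` or `>` characters. It does not handle comments, `<script>` contents, or CDATA.

```javascript
// Sample markup from the question (with the ">" after "a-class" restored).
const html =
  '<div class="a-class"><span class="some-class" data-content="some words containing target">' +
  "some other text containing target</span>";

// The question's regex: matches "target" everywhere, including inside the attribute.
const everywhere = /\btarget\b/gi;

// Negative lookahead: reject a match if a ">" appears before any "<",
// i.e. we are still inside a tag. Assumes no "<" or ">" inside attribute values.
const textOnly = /\btarget\b(?![^<]*>)/gi;

const allMatches = [...html.matchAll(everywhere)].map((m) => m.index);
const textMatches = [...html.matchAll(textOnly)].map((m) => m.index);

console.log(allMatches.length); // 2
console.log(textMatches.length); // 1
```

The lookahead keeps the question's `\btarget\b` unchanged and only adds a condition on what follows, so it can still be used with `exec` in a loop as the question describes.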

beatsforthemind
  • You'll probably want to do it in two steps. There are plenty of examples of getting text inside HTML, and of getting a particular word. – isherwood Jun 12 '15 at 17:12
  • ["You can't parse (X)HTML with regex."](http://stackoverflow.com/a/1732454/17300) – Stephen P Jun 12 '15 at 17:15
  • A general parsing of HTML is impossible; however, a focused task is completely plausible. Please give us some examples: some wrong regex matches and a correct match. – Mehdi Jun 12 '15 at 17:20
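The two-step approach suggested in the first comment can be sketched as follows: separate the markup into tags and text, then run the word regex on the text pieces only. `markInText` is a hypothetical helper name, and the sketch assumes the word contains no regex metacharacters and that no attribute value contains a literal `>`.

```javascript
// Hypothetical helper: wrap `word` in <mark> only where it appears in text, not inside tags.
function markInText(html, word) {
  // Step 1: split into text and tags. Because the separator regex has a capturing group,
  // split() keeps the tags, so even indexes are text and odd indexes are tags.
  const parts = html.split(/(<[^>]*>)/);
  // Step 2: apply the word regex to the text parts only. Assumes `word` needs no escaping.
  const wordRe = new RegExp("\\b" + word + "\\b", "gi");
  return parts
    .map((part, i) => (i % 2 === 0 ? part.replace(wordRe, "<mark>$&</mark>") : part))
    .join("");
}

const sample =
  '<div class="a-class"><span class="some-class" data-content="some words containing target">' +
  "some other text containing target</span>";

// Only the "target" in the text content is wrapped; the data-content value is untouched.
console.log(markInText(sample, "target"));
```

Keeping the tags in the `split` result means the markup can be rejoined unchanged around the edited text pieces.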

2 Answers


If the word can appear anywhere, i.e. even as a class or id name, here is what you can do.

Take <html> as the parent element and read all of its contents with innerHTML. You can then find any word as follows:

<html id="main">
    <div>
        <p class="yourword">
        </p>
    </div>
</html>

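// Read the markup inside the element with id "main" (tags and attributes included).
var str = document.getElementById("main").innerHTML;
// Collect every case-insensitive occurrence of "yourword" (null if there is none).
var res = str.match(/yourword/gi);
alert(res);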

The code above matches every occurrence of "yourword" in the document's markup, including occurrences inside attributes such as the class name.

Here is a demo which selects the string "sub".
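To show what `res` holds in the code above, here is the same `match` call on a plain string. The string is a simplified stand-in for the `innerHTML` value (an assumption made so the sketch runs outside a browser).

```javascript
// Simplified stand-in for the innerHTML of the example above.
const str = '<div><p class="yourword"></p></div>';

// With the g flag, match() returns an array of the matched substrings.
const res = str.match(/yourword/gi);
console.log(res); // [ 'yourword' ]  (this hit is the class attribute)

// match() with g does not report positions; matchAll() does.
const positions = [...str.matchAll(/yourword/gi)].map((m) => m.index);
console.log(positions); // [ 15 ]

// When nothing matches, match() returns null, so alert(res) would show "null".
console.log("no such word".match(/yourword/gi)); // null
```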

Saumil

http://jsfiddle.net/techsin/xt1j2cj8/3/

Here is one way to do it:

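// Grab the container's markup and the word (or phrase) to highlight.
var cont = $(".cont"),
    html = cont.html(),
    word = "Lorem";

// Allow tags before and after each whitespace run, so a phrase split by tags still matches.
word = word.replace(/(\s+)/g, "(<[^>]+>)*$1(<[^>]+>)*");

var pattern = new RegExp("(" + word + ")", "gi");

// Wrap every match in <mark>.
html = html.replace(pattern, "<mark>$1</mark>");
// If a <mark> now contains a run of tags, close it before the tags and reopen it after them.
html = html.replace(/(<mark>[^<>]*)((<[^>]+>)+)([^<>]*<\/mark>)/g, "$1</mark>$2<mark>$4");

cont.html(html);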
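The answer's two replace passes can be tried on a plain string without jQuery. In this sketch, `highlightPhrase` is a hypothetical name. It adds the `g` flag to both helper regexes so that every whitespace run and every `<mark>` is handled, and it assumes the phrase contains no regex metacharacters.

```javascript
// Hypothetical wrapper around the answer's two-pass highlighting.
function highlightPhrase(html, phrase) {
  // Pass 1: allow tags around each whitespace run, then wrap every match in <mark>.
  const source = phrase.replace(/(\s+)/g, "(<[^>]+>)*$1(<[^>]+>)*");
  const pattern = new RegExp("(" + source + ")", "gi");
  html = html.replace(pattern, "<mark>$1</mark>");
  // Pass 2: if a <mark> swallowed a run of tags, close it before the tags and reopen it
  // after them (handles one run of tags per <mark>).
  return html.replace(/(<mark>[^<>]*)((<[^>]+>)+)([^<>]*<\/mark>)/g, "$1</mark>$2<mark>$4");
}

console.log(highlightPhrase("Lorem <b>ipsum</b> dolor", "Lorem ipsum"));
// <mark>Lorem </mark><b><mark>ipsum</mark></b> dolor
```

Like the original, the first pass does not distinguish text from attribute values, so a word that also appears in an attribute (such as the question's `data-content`) would be wrapped there too.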
Muhammad Umer