JS RegExp finding word that is not in tag and replace string

Question

I need to write a second RegExp that finds the variable d inside a sentence, but only where it is not inside a tag. Occurrences inside tags should be skipped.

The regex '(?:^|\\b)('+d+')(?=\\b|$)' finds the d variable, but I need to exclude the <span> tag with class="description". Each match in the new sentence is wrapped in a new tag.

sentence = "This is some word. <span class='description'>word</span> in tag should be skipped"
d = 'word'
re = new RegExp('(?:^|\\b)('+d+')(?=\\b|$)', 'gi')
sentence = sentence.replace(re, "<span>$1</span>")

The result I'm trying to achieve is:

"This is some <span>word</span>. <span class='description'>word</span> in tag should be skipped"

I'm using CoffeeScript, thanks for the help.

Your string literal is not valid. And `RegExpt` should be without `t` (unless you have a function by that name) — trincot, Oct 18 '17 at 17:53
@trincot Thanks, i'm new to RegExp, can you please help by using this example? — user7754069, Oct 18 '17 at 17:57
Please take some time to read [this masterpiece](https://stackoverflow.com/a/1732454/3136474). — Dinei, Oct 18 '17 at 17:59
@DineiRockenbach is there any other way of finding a word (followed by a comma, dot etc (some rules)) in a string and wrapping it in a new tag? Thanks — user7754069, Oct 18 '17 at 18:05
@user7754069 Is your string an XML/HTML excerpt or a random string with some tags within it? — Dinei, Oct 18 '17 at 18:14

score 0 · Answer 1 · answered Oct 18 '17 at 18:16

Try this one: (word)(?![^<>]*<\/)

Full code:

var sentence = "This is some word. <span class='description'>word</span> in tag should be skipped"
var d = 'word'
var re = new RegExp('('+d+')(?![^<>]*<\/)', 'gi')
sentence = sentence.replace(re, "<span>$1</span>")

I based this answer on this snippet: https://regex101.com/library/gN4vI6
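
To see what the negative lookahead does, here is a minimal sketch that runs the pattern against the question's sentence, with `\b` word boundaries added around the term as the question's own pattern intended. The helper name `wrapOutsideTags` is an assumption made for illustration, and the sketch assumes the search term contains no regex metacharacters.

```javascript
// Hypothetical helper (not part of the original answer): wraps whole-word
// occurrences of `word` unless the next angle bracket after the match
// opens a closing tag.
function wrapOutsideTags(sentence, word) {
    // \b ... \b      : whole words only
    // (?![^<>]*<\/)  : reject the match if only non-bracket characters
    //                  separate it from a following "</"
    // Assumption: `word` contains no regex metacharacters.
    var re = new RegExp('\\b(' + word + ')\\b(?![^<>]*<\\/)', 'gi');
    return sentence.replace(re, '<span>$1</span>');
}

var sample = "This is some word. <span class='description'>word</span> in tag should be skipped";
var wrapped = wrapOutsideTags(sample, 'word');
console.log(wrapped);
// This is some <span>word</span>. <span class='description'>word</span> in tag should be skipped
```

The lookahead is only a heuristic. A term that appears inside an attribute value, as in `<span class='word'>`, is followed by `>` rather than `</`, so it would still be matched and the markup would be corrupted.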

Dinei


Thank you Dinei! Can we include my regexp since i need to find a 'word' by some rules: - followed by symbols (dot, comma etc.) or when it's at the beginning of sentence. - right now i want to select 'word.' but not 'semi-word.' so result is: word but not semi-word – user7754069 Oct 18 '17 at 18:33
This will also replace "word" in `hello`. – trincot Oct 18 '17 at 19:29
@trincot yeah, i need to 'skip' whole span and include my previous regexp – user7754069 Oct 18 '17 at 19:30

trincot · Answer 2 · 2017-10-27T14:37:29.723

Trying to manipulate HTML with regular expressions is not a good idea: sooner or later you'll bump into some boundary condition where it fails. Maybe a < or > occurs inside an attribute value or even inside a text node, while the search term may also turn up in unexpected places such as HTML comments, attribute values, or script tags. The list of boundary cases is long.

Furthermore, your search term may contain characters that have a special meaning in regular expression syntax, so you should at least escape those.
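
As an illustration of why escaping matters, the following sketch compares an unescaped and an escaped pattern for a search term that contains a dot. The term `a.b`, the sample strings, and the helper name `escapeTerm` are made up for this example.

```javascript
// Hypothetical helper for this demonstration: backslash-escape every
// character that has a special meaning in a regular expression.
function escapeTerm(str) {
    return str.replace(/[\[\]\/{}()*+?.\\^$|-]/g, '\\$&');
}

var term = 'a.b'; // made-up search term containing a metacharacter

var loosePattern = new RegExp('\\b(' + term + ')\\b', 'i');
var strictPattern = new RegExp('\\b(' + escapeTerm(term) + ')\\b', 'i');

console.log(loosePattern.test('axb'));  // true: the dot matches any character
console.log(strictPattern.test('axb')); // false: the dot is now literal
console.log(strictPattern.test('a.b')); // true
```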

Here is a solution that interprets the string as HTML, using the DOM capabilities, and only replaces text in text nodes:

function escapeRegExp(str) {
    return str.replace(/[\[\]\/{}()*+?.\\^$|-]/g, "\\$&");
}

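function wrapText(sentence, word) {
    const re = new RegExp("\\b(" + escapeRegExp(word) + ")\\b", "gi"),
        span = document.createElement('span');
    // Let the browser parse the HTML instead of parsing it with a regex
    span.innerHTML = sentence;
    // Copy childNodes first: the live NodeList changes while nodes are replaced
    Array.from(span.childNodes).forEach(function (node) {
        if (node.nodeType !== 3) return; // 3 = text node; elements are left untouched
        // split() with a capture group keeps the matches, at the odd indices
        node.nodeValue.split(re).forEach(function (part, i) {
            let add;
            if (i % 2) {
                // Matched word: wrap it in a new span
                add = document.createElement('span');
                add.textContent = part;
                add.className = 'someClass';
            } else {
                // Text between matches: keep it as plain text
                add = document.createTextNode(part);
            }
            span.insertBefore(add, node);
        });
        span.removeChild(node);
    });
    return span.innerHTML;
}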

const html = 'This is some word. <span class="word">word</span> should stay',
    result = wrapText(html, 'word');

console.log(result);
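
The alternation between plain text and wrapped spans in the code above relies on how `String.prototype.split` treats capture groups: the captured matches are kept in the result array, so they land at the odd indices. Here is a small sketch of just that step, without any DOM, on a made-up string. The string concatenation only stands in for the DOM node creation and does no HTML escaping.

```javascript
var pattern = /\b(word)\b/gi;
var parts = 'A word, then another Word.'.split(pattern);
console.log(parts);
// [ 'A ', 'word', ', then another ', 'Word', '.' ]

// Odd indices hold the matches, even indices the text in between.
var rebuilt = parts.map(function (part, i) {
    return i % 2 ? '<span class="someClass">' + part + '</span>' : part;
}).join('');
console.log(rebuilt);
// A <span class="someClass">word</span>, then another <span class="someClass">Word</span>.
```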

Recursing into elements

In comments you mentioned that you would now also like to have the replacements happening within some tags, like p.

I'll assume that you want this to happen for all elements, except those that have a certain class, e.g. the class that you use for the wrapping span elements, but you can of course customise the condition to your needs (like only recursing into p, or ...).

The code needs only a few modifications:

function escapeRegExp(str) {
    return str.replace(/[\[\]\/{}()*+?.\\^$|-]/g, "\\$&");
}

function wrapText(sentence, word) {
    const re = new RegExp("\\b(" + escapeRegExp(word) + ")\\b", "gi"),
        doc = document.createElement('span');
    doc.innerHTML = sentence;

    (function recurse(elem) {
        // Copy childNodes first: the live NodeList changes while nodes are replaced
        Array.from(elem.childNodes).forEach(function (node) {
            // Customise this condition as needed:
            if (node.classList && !node.classList.contains('someClass')) recurse(node);
            if (node.nodeType !== 3) return; // only text nodes (type 3) are rewritten
            // split() with a capture group keeps the matches, at the odd indices
            node.nodeValue.split(re).forEach(function (part, i) {
                let add;
                if (i % 2) {
                    add = document.createElement('span');
                    add.textContent = part;
                    add.className = 'someClass';
                } else {
                    add = document.createTextNode(part);
                }
                elem.insertBefore(add, node);
            });
            elem.removeChild(node);
        });
    })(doc);
    return doc.innerHTML;
}
const html = '<p><b>Some word</b></p>. <span class="someClass">word</span> should stay',
    result = wrapText(html, 'word');
console.log(result);

Thank you i will try it later and give you feedback, although it seems it will work! — user7754069, Oct 18 '17 at 20:09
@user7754069 `new RegExp("\\b(" + escapeRegExp(word) + ")\\b", "gi")` will not work if `word` starts/ends with a non-word char. — Wiktor Stribiżew, Oct 18 '17 at 20:35
Well, it will work, in the sense that it will match if there is a break between non-alphanumericals and alphanumericals, in whichever direction that may be. For instance, if word is "-abc", then it will not be matched in "---abc---", but it will in "abc-abc---". But at the very start/end of a string it would indeed be counter-intuitive. — trincot, Oct 18 '17 at 20:53
@trincot Thanks again for help. I have one question: is there a way to write differently "filter( node => node.nodeType === 3 )" expression since i can not convert it to coffeescript, it says ArrowFunctionExpression not supported. I use js2.coffee — user7754069, Oct 19 '17 at 10:46
@trincot thanks! if i need to add some attributes for new tag in newly created sentence do I add it below 'span.innerHTML = sentence' ? Right now i am trying to add it but returned value for wrapText function is "some word" and i would like that to have attributes — user7754069, Oct 23 '17 at 16:39
I altered the code so that it is easier to add attributes. In the code example you can see that a class attribute is added to the inserted `span` tag. — trincot, Oct 23 '17 at 17:32
@trincot hi trincot once again, thank you so much for help! i was wondering is there a way to handle wrapping inside nodeType==1 , for example: word is text would wrap into ' word is text ' ? So we only skip tag with class="someClass", thanks again for help — user7754069, Oct 24 '17 at 10:37
@trincot i'm trying to allow rest of tags, i just need to skip tag — user7754069, Oct 24 '17 at 10:41
I don't know why I did not receive a notification of your latest messages, but I just did receive one for the unacceptance you did. Did I not answer the *original* question to satisfaction? Now to your follow up question. When exactly do you want to look inside other tags, because in your original question you wrote *"So variable in tags should be skipped"*. Although you really should ask such questions as new ones, I have added code for this to my answer with some assumptions. — trincot, Oct 27 '17 at 14:39
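
The word-boundary caveat raised in the comments above can be checked directly. This sketch uses the `-abc` term and the two sample strings from that comment. The helper name `escapeForPattern` is an assumption for this example and uses the same character class as `escapeRegExp` in the answer.

```javascript
// Hypothetical stand-in for the answer's escapeRegExp helper.
function escapeForPattern(str) {
    return str.replace(/[\[\]\/{}()*+?.\\^$|-]/g, '\\$&');
}

// \b only matches between a word character and a non-word character,
// so a term that starts with "-" needs a word character right before it.
var boundaryRe = new RegExp('\\b(' + escapeForPattern('-abc') + ')\\b', 'i');

console.log(boundaryRe.test('---abc---'));  // false: "-" is preceded by another "-"
console.log(boundaryRe.test('abc-abc---')); // true: "-" is preceded by "c"
```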
