Ignore HTML tags (specifically the a tag) in a given string during a JS replaceAll operation

Question

I have a case where I'm looping through an array of URLs (e.g. [www.stackoverflow.com, www.ex.com]), matching each URL against a given string, and replacing every match with an anchor tag to make it clickable.

I can do this with the JS replaceAll method, but when the same URL occurs more than once in the string, it also matches the URL inside an anchor tag that was inserted earlier.

For example, suppose the given string is "Check it out at www.stack.com/abc and bookmark the www.stack.com, www.overflow.com" and the URL array is [www.stack.com/abc, www.stack.com].

After the first replace iteration the string will be "Check it out at <a href="www.stack.com/abc">www.stack.com/abc</a> and bookmark the www.stack.com, www.overflow.com".

The problem occurs during the second iteration: it replaces the URL even inside the tag. I want replaceAll to ignore the HTML tag. Can someone help me out with this?

I've tried to ignore tags with the regex below, but it doesn't work for the content between the anchor tags.

exString.replaceAll(new RegExp(url + "(?![^<>]*>)", "gi"), replaceText);
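A minimal reproduction of that behaviour may help to show why the lookahead is not enough. The variable names and the replacement string are assumptions made for illustration; only the regex comes from the line above.

```javascript
// Hypothetical reproduction: the string is the state after the first iteration.
const afterFirstPass =
  'Check it out at <a href="www.stack.com/abc">www.stack.com/abc</a> and bookmark the www.stack.com';
const url = "www.stack.com";
const replaceText = '<a href="www.stack.com">www.stack.com</a>';

// (?![^<>]*>) rejects a match only when a ">" follows with no "<" or ">" in between,
// which is the case inside the opening tag (the href value).
// The link text is followed by "</a>", so "<" comes first and the match is accepted.
const result = afterFirstPass.replaceAll(
  new RegExp(url + "(?![^<>]*>)", "gi"),
  replaceText
);

console.log(result);
```

The href value is left alone, but the link text between the opening and closing tags is wrapped a second time, which produces a nested anchor.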

Thanks for the response @mplungjan. Yes, but even with boundaries it will match, as the regex expects to match that word. The problem here is to ignore the tag. — Aniket Pawar, Oct 04 '21 at 06:29
Why was this question closed? The requirement here is completely different from the linked question. — Aniket Pawar, Oct 04 '21 at 07:00
This is likely answered [here](https://stackoverflow.com/questions/37684/how-to-replace-plain-urls-with-links) — mplungjan, Oct 04 '21 at 07:03

score 1 · Accepted Answer · answered Oct 04 '21 at 07:25

Let's split and join then

const div = document.getElementById("text");
const arr = div.textContent.split(" "); // one entry per space-separated word
console.log(arr);

const urls = ["www.stack.com/abc", "www.stack.com"];
arr.forEach((word, i) => {
  // drop one trailing non-word character (e.g. "," or ".") before comparing
  if (/\W$/.test(word)) word = word.slice(0, -1);
  // only a whole word equal to a listed URL is wrapped; the punctuation stays outside the link
  if (urls.includes(word)) arr[i] = arr[i].replace(word, `<a href="${word}">${word}</a>`);
});
console.log(arr);
div.innerHTML = arr.join(" ");

<div id="text">Check it out at www.stack.com/abc and bookmark the www.stack.com, www.overflow.com.</div>
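The same split-and-join idea can be sketched as a pure function that runs without a DOM. This is only a sketch: linkifyWords is a hypothetical helper name, and the Set lookup, the case-insensitive comparison and the punctuation list are assumptions that go beyond the snippet above.

```javascript
// linkifyWords is a hypothetical helper, not part of the answer above.
function linkifyWords(text, urls) {
  // A lower-cased Set gives constant-time, case-insensitive lookups.
  const known = new Set(urls.map((u) => u.toLowerCase()));
  return text
    .split(" ")
    .map((word) => {
      // Separate any run of trailing punctuation from the candidate URL.
      const [, core, trailing] = word.match(/^([\s\S]*?)([.,;:!?]*)$/);
      return known.has(core.toLowerCase())
        ? `<a href="${core}">${core}</a>${trailing}`
        : word;
    })
    .join(" ");
}

const linked = linkifyWords(
  "Check it out at www.stack.com/abc and bookmark the www.stack.com, www.overflow.com.",
  ["www.stack.com/abc", "www.stack.com"]
);
console.log(linked);
```

Because each word is compared as a whole, www.stack.com/abc can never be mistaken for www.stack.com, and nothing that has already been wrapped is scanned again.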

Again, thanks for the response to my problem. I've tried and this is working really well in my cases. — Aniket Pawar, Oct 04 '21 at 07:43

jet_24 · Answer 2 · 2021-10-04T18:07:23.750

Although the solution provided by mplungjan is clever and works well, I wanted to post an alternative.

The algorithm in the accepted answer splits the input string into an array of words and then checks every word against every URL. It also has to check whether each word ends with a symbol and, if so, truncate it. This can get expensive: 50 words × 5 possible URLs = 250 comparisons, so the work grows as O(words × URLs). Now imagine 20 possible URLs and 20 input texts, each containing 15+ words. Finally, that algorithm may have issues with case sensitivity.

This solution borrows a lot from mplungjan's approach, but it first uses a RegEx to quickly narrow down which URLs actually occur in the string, and then loops again to apply only what matched. Plus, the case-insensitive RegEx avoids the possible case sensitivity issue.

let str = 'Check it out at www.stack.com/abc and bookmark the www.stack.com, www.overflow.com.';
let urls = ["www.stack.com", "www.stack.com/abc", "www.not-here.com"];
const arReplace = [];

// escape RegEx metacharacters so that "." in a URL only matches a literal dot
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// sort longest first so "www.stack.com/abc" is tokenized before its prefix "www.stack.com"
urls = urls.sort((a, b) => b.length - a.length);

// find URLs and apply replacement tokens
urls.forEach((url) => {
  const token = '%ZZ' + arReplace.length + 'ZZ%';
  const replaced = str.replace(new RegExp('\\b' + escapeRegExp(url) + '\\b', 'gi'), token);
  if (replaced !== str) { // the URL occurred at least once
    arReplace.push(url);
    str = replaced;
  }
});

// replace tokens
arReplace.forEach((url, n) => {
  str = str.replace(new RegExp('%ZZ' + n + 'ZZ%', 'g'), '<a href="' + url + '">' + url + '</a>');
});
document.body.innerHTML = str;

Fiddle link: https://jsfiddle.net/e05o9cra/
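As a possible variation on the token approach above, the URLs could be combined into one alternation and replaced in a single pass with a callback, so inserted markup is never scanned again. This is a sketch under assumptions: escapeRegExp and linkifyOnce are hypothetical helper names, and the word-boundary handling simply mirrors the RegEx used above.

```javascript
// escapeRegExp and linkifyOnce are hypothetical helpers used for illustration.
function escapeRegExp(s) {
  // Escape RegEx metacharacters so "." in a URL matches only a literal dot.
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function linkifyOnce(text, urls) {
  // Longest first, so "www.stack.com/abc" is tried before "www.stack.com".
  const pattern = [...urls]
    .sort((a, b) => b.length - a.length)
    .map(escapeRegExp)
    .join("|");
  if (pattern === "") return text; // nothing to look for
  // One pass over the original text; the callback wraps each match as written.
  return text.replace(
    new RegExp(`\\b(?:${pattern})\\b`, "gi"),
    (match) => `<a href="${match}">${match}</a>`
  );
}

const once = linkifyOnce(
  "Check it out at www.stack.com/abc and bookmark the www.stack.com, www.overflow.com.",
  ["www.stack.com", "www.stack.com/abc", "www.not-here.com"]
);
console.log(once);
```

Note that the trailing \b requires a word character at the end of each URL, so a URL in the list that ends with "/" would not match before a space; that case would need a different boundary check.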

Thanks Joseph for the response. Unfortunately, that didn't work out. See [here](https://regexr.com/66q42) — Aniket Pawar, Oct 04 '21 at 07:45
