Replace all instances of anchor tag in a large string

Question

If I have the following:

content = "<a href=\"1\">I</a> was going here and then <a href=\"that\">that</a> happened."

How would I completely remove the tag altogether so the big string no longer has any anchor tags?

I reached only so far:

var href = content.indexOf("href=\"");
var href1 = content.substring(href).indexOf("\"");

What's the desired output, and what tags should be removed (I'm assuming `<a>` tags with only an `href` attribute)? — Fabrício Matté, Mar 27 '14 at 03:22
Any instances of `<a>` need to be removed, but the text inside them should remain as it is. For example, in the string above, `"<a href=\"1\">I</a> was going"` should just be `"I was going"` — DemCodeLines, Mar 27 '14 at 03:26
Answer in jQuery: http://jsfiddle.net/Q3k7L/ (probably not too hard to rewrite in vanilla JS) — Fabrício Matté, Mar 27 '14 at 03:38
While I really appreciate that you created an example for me, I am really looking for pure JS solutions, since I would be able to better understand them. — DemCodeLines, Mar 27 '14 at 03:40
Yeah, I was expecting that you wanted a vanilla solution, I only wrote the jQuery one because it was faster -- and IMO, more understandable/faster to scan (once you get a hang of jQuery) than the nested loops and long DOM API names which jQuery abstracts there — Fabrício Matté, Mar 27 '14 at 03:45
Oh wait, just had an idea. Do you have other tags that need to be kept? If you only need the text (without any tags), there is a much easier way. — Fabrício Matté, Mar 27 '14 at 03:47
I've submitted an alternative solution, feel free to comment on it if you have doubts or anything. — Fabrício Matté, Mar 27 '14 at 04:04

score 15 · Answer 1 · answered Mar 27 '14 at 03:47

This is why God invented regular expressions, which the string.replace method accepts as the pattern to search for.

var contentSansAnchors = content.replace(/<\/?a[^>]*>/g, "");

If you're new to regex, some explanation:

/.../: Instead of wrapping the search string in quotes, you wrap it in forward slashes to reflect a regular expression.

<...>: These are the literal angle brackets of the HTML tag.

\/?: The tag may or may not (?) start with a forward slash (\/). The forward slash must be escaped using the backslash or the regex will end prematurely here.

a: Literal anchor tag name.

[^>]*: After the a, the tag may contain zero or more (*) characters that are not (^) a closing angle bracket (>). The "anything but a closing angle bracket" expression is wrapped in square brackets ([...]) because it represents a single character.

g: This modifies the regular expression to be global, so that all matches are replaced. Otherwise, only the first match would be replaced.

Depending on what strings you are expecting to parse, you may also want to add the i modifier for case insensitivity.
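As a minimal sketch of that last point (the function name `stripAnchorTags` is made up for illustration and is not part of the answer above), here is a variant that uses the `i` modifier and also requires either whitespace or `>` right after the `a`, so tags such as `<abbr>` or `<aside>` are left alone. It still assumes that no attribute value contains a `>`.

```javascript
// Hypothetical helper: removes <a ...> and </a> tags but keeps the text between them.
function stripAnchorTags(html) {
    // <\/?          a "<", optionally followed by "/"
    // a             the tag name (case-insensitive because of the i flag)
    // (?:\s[^>]*)?  optionally: one whitespace character, then anything up to ">"
    // >             the closing ">"
    return html.replace(/<\/?a(?:\s[^>]*)?>/gi, "");
}

var content = "<a href=\"1\">I</a> was going here and then <a href=\"that\">that</a> happened.";

console.log(stripAnchorTags(content));
// "I was going here and then that happened."

console.log(stripAnchorTags("<A HREF=\"x\">link</A> and <abbr>abbr</abbr>"));
// "link and <abbr>abbr</abbr>"
```

Whether the extra strictness is worth it depends on what other tags can show up in your input.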

score 2 · Answer 2 · answered Mar 27 '14 at 03:30 · edited Mar 27 '14 at 03:59 · legendJSLC

You can use a RegExp to replace all anchor tags.

var result = subject.replace(/<a[^>]*>|<\/a>/g, "");

It's [`RegExp`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp), not `Regex`, and the [`replace`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace) method is on the `String` prototype, not `RegExp`'s. This will also fail for any attribute values on the `<a>` containing `>` (like `class="foo>bar"`, which is a valid `class` value). – ajp15243 Mar 27 '14 at 03:35
Seems like you've written the answer in the wrong language. `=]` – Fabrício Matté Mar 27 '14 at 03:36
Was just about to write that `Regex` didn't make sense – DemCodeLines Mar 27 '14 at 03:37
@Fabrício Matté I had written it in C# and have now changed it to JavaScript. Thanks. – legendJSLC Mar 27 '14 at 04:01
This will remove tags like aside, abbr and acronym. Try adding a space after the a: var result = subject.replace(/<a [^>]*>|<\/a>/g, ""); – Ken Oct 02 '21 at 17:29
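To make the two caveats from these comments concrete, here is a rough sketch that runs the answer's pattern against a made-up input containing both a `>` inside an attribute value and an `<abbr>` tag. The variable names and the `quoteAware` pattern are my own assumptions for illustration, not something from the answer; the pattern skips over quoted attribute values, but it is still not a real HTML parser and will not cope with malformed markup.

```javascript
// Made-up sample: one anchor with ">" inside a class value, plus an <abbr>.
var sample = "<a class=\"foo>bar\" href=\"x\">text</a> <abbr>HTML</abbr>";

// Pattern from the answer: stops at the first ">" and also matches "<abbr>".
var naive = /<a[^>]*>|<\/a>/g;

// Hypothetical stricter pattern: after "a" it requires whitespace or ">",
// and treats "..." and '...' attribute values as units, so a ">" inside
// quotes does not end the tag.
var quoteAware = /<\/?a(?:\s(?:"[^"]*"|'[^']*'|[^>"'])*)?>/gi;

console.log(sample.replace(naive, ""));
// bar" href="x">text HTML</abbr>

console.log(sample.replace(quoteAware, ""));
// text <abbr>HTML</abbr>
```

If the input can be arbitrary markup, a DOM-based approach is likely the safer route.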

score 2 · Answer 3 · edited May 23 '17 at 10:26

Strip all tags keeping their text content:

var content = "<a href=\"1\">I</a> was going here and then <a href=\"that\">that</a> happened.";

// parse the HTML string into DOM
var container = document.createElement('div');
container.innerHTML = content;

// retrieve the textContent, or innerText when textContent is not available
var clean = container.textContent || container.innerText;
console.log(clean); //"I was going here and then that happened."

As per OP's comment, the text only contains anchor tags, so this method should work fine.

You may drop the || container.innerText if you don't need IE <= 8 support.

Reference

textContent - Gets or sets the text content of a node and its descendants.
innerText - Sets or retrieves the text between the start and end tags of the object.

Just to answer the question in the title, here is a way to remove only the anchor elements:

var content = "<a href=\"1\">I</a> was going here and then <a href=\"that\">that</a> happened.";

var container = document.createElement('div');
container.innerHTML = content;

var anchors = container.getElementsByTagName('a'),
    anchor;

while (anchor = anchors[0]) {
    var anchorParent = anchor.parentNode;

    while (anchor.firstChild) {
        anchorParent.insertBefore(anchor.firstChild, anchor);
    }
    anchorParent.removeChild(anchor);
}

var clean = container.innerHTML;
console.log(clean); //"I was going here and then that happened."

Reference

Node.insertBefore - Inserts the specified node before a reference element as a child of the current node.
Node.removeChild - Removes a child node from the DOM.
Element.getElementsByTagName - Returns a list of elements with the given tag name. The subtree underneath the specified element is searched, excluding the element itself.

Even though OP is not using jQuery, here is a practically equivalent jQuery version of the above for anyone interested:

var content = "<a href=\"1\">I</a> was going here and then <a href=\"that\">that</a> happened.";

var clean = $('<div>').append(content).find('a').contents().unwrap().end().end().html();
console.log(clean); //"I was going here and then that happened."

NOTE

All of the solutions in this answer assume that the content is valid HTML -- they won't handle malformed markup, unclosed tags, etc. They also assume that the markup is safe (XSS-sanitized).

If the criteria above are not met, you're better off using a regex solution. Regex should usually be your last resort when the use case involves parsing HTML, as it is very easy to break when tested against arbitrary markup (related: virgin-devouring ponies), but your use case seems very simple and a regex solution may be just what you need.

This answer provides non-regex solutions so that you may use these once (if ever) a regex solution breaks.

Thanks for the reference. Is creating a div element necessary? Is it because once the div is created and its text is set, then the .innerText will only reveal whatever text there is, without any tags? — DemCodeLines, Mar 27 '14 at 04:07
@Ale according to [OP's comment](http://stackoverflow.com/questions/22677593/replace-all-instances-of-anchor-tag-in-a-large-string/22677919?noredirect=1#comment34547694_22677593) there are only anchor tags, but for general use case I'd sanitize the string using [DOM Purify](https://github.com/cure53/DOMPurify) or similar first. — Fabrício Matté, Mar 27 '14 at 04:08
@Ale Also, if I remember correctly, setting `innerHTML` does **not** run script tags. (unlike jQuery's `.html()`) But there may be XSS issues with some HTML attributes, so I'd use DOM Purify in a general use case still. — Fabrício Matté, Mar 27 '14 at 04:08
@DemCodeLines Oh, the div is there so I can set its `innerHTML` thus creating text nodes and anchor elements inside of it. Yes, `.textContent` retrieves all text nodes recursively without any element tags. — Fabrício Matté, Mar 27 '14 at 04:10
@Ale it asks to remove the `<a>`s and I'm removing them. `=]` Sure thing it removes a couple more things, but provided the input only contains `<a>` tags as OP's use case there is no difference in the output. Well, I may post another alternative as an exercise. — Fabrício Matté, Mar 27 '14 at 04:12
@FabrícioMatté didn't notice OPs comment at first, sorry. I will always refresh page before posting. I will always refresh page before posting. — Ale, Mar 27 '14 at 04:14
@Ale No problem. `=]` I've also added a vanilla JS that only removes anchor tags to answer the question in the title. — Fabrício Matté, Mar 27 '14 at 04:28

score 0 · Answer 4 · answered Mar 27 '14 at 03:44

If you can get the replacement string into JavaScript (say you hold it in a variable named "replacedString"), you can enclose your entire HTML content in a div as shown below:

<div id="stringContent">
  <a href="1">I</a> was going here and then <a href="that">that</a> happened.
</div>

and then you can run this through jQuery:

$("#stringContent").empty();
$("#stringContent").html(replacedString);
