When I copy text from a Word file into a text editor, I get HTML like this:

<p><br></p>
<p> <br></p>
<p>  <br></p>
<p>    <br></p>

I want to replace the markup above with an empty string, like this:

var updated = copyieddata.replace('<p><br></p>', '');
updated = updated.replace('<p> <br></p>', '');
updated = updated.replace('<p>  <br></p>', '');
updated = updated.replace('<p>   <br></p>', '');

How can I implement this with a single regex to avoid the repetition?

Srinivas

2 Answers

Pedram's answer is probably the easiest way to achieve what you want.

However, if you want to only remove the <p> <br></p> tags and keep all other tags intact, then you need a regular expression that gets all parts of your string that:

  • Start with <p> and end with </p>
  • Have only <br> or whitespace in between

The regular expression you need would look like this: /<p>(\s|<br>)*<\/p>/g

This expression looks for any substring that starts with <p>, has zero or more occurrences of either whitespace (\s) or the <br> tag, and ends with </p>.

The g flag at the end ensures that every occurrence of the pattern in the string is matched. Omitting it would match only the first occurrence of the pattern in your string.

So, your code would look something like this:

var pattern = /<p>(\s|<br>)*<\/p>/g;
var updated = copyieddata.replace(pattern, '');
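
To see the effect of the g flag, here is a minimal sketch run against a made-up string modelled on the markup in the question. The names copiedData and stripEmptyParagraphs are hypothetical, and the group is written as non-capturing, (?:...), which matches the same text as the capturing version above.

```javascript
// Hypothetical sample input modelled on the markup in the question.
var copiedData = [
  'some text',
  '<p><br></p>',
  '<p> <br></p>',
  '<p>  <br></p>',
  '<p>    <br></p>',
  '<p>keep me</p>',
  'some text'
].join('');

// Hypothetical helper: removes every <p>...</p> that contains
// only whitespace and/or <br> tags, and leaves other tags alone.
function stripEmptyParagraphs(html) {
  return html.replace(/<p>(?:\s|<br>)*<\/p>/g, '');
}

// With the g flag, all four empty paragraphs are removed.
var cleaned = stripEmptyParagraphs(copiedData);

// Without the g flag, only the first match is removed.
var firstOnly = copiedData.replace(/<p>(?:\s|<br>)*<\/p>/, '');

console.log(cleaned);
console.log(firstOnly);
```

Note that this pattern assumes bare <p> and <br> tags exactly as shown in the question. Variants such as <br /> or a <p> with attributes would not match and would need a wider pattern.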
Ravi Mashru

The simplest way is to convert the HTML to text, which removes all the extra HTML tags and leaves you with clean text. You can also read these topics to learn how to clean up text pasted from MS Word:

Jquery Remove MS word format from text area

Clean Microsoft Word Pasted Text using JavaScript

var text = $('#stack');
text.html(text.text());
console.log(text.html());
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div id="stack">
some text
<p><br></p>
<p> <br></p>
<p>  <br></p>
<p>    <br></p>
some text
</div>

Alternatively, you can use this to replace all <br> and <p> tags:

$("#stack").html(
  $("#stack").html()
  .replace(/\<br\>/g, "\n")
  .replace(/\<br \/\>/g, "\n")
  .replace(/\<p>/g, "\n")
  .replace(/\<\/p>/g, "\n")
);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div id="stack">
some text
<p><br></p>
<p> <br></p>
<p>  <br></p>
<p>    <br></p>
some text
</div>

Instead of "\n" you can pass an empty string ("") to remove the tags without inserting anything.
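
The four chained replace calls can probably be collapsed into one pattern. This is only a sketch that works on a plain string rather than on the jQuery element; replaceBreakAndParagraphTags and its replacement parameter are made-up names. It is slightly wider than the chain above because it also matches <br/> with no space.

```javascript
// Hypothetical helper: replaces <br>, <br/>, <br />, <p> and </p>
// with the given replacement string in a single pass.
function replaceBreakAndParagraphTags(html, replacement) {
  // <br\s*\/?> covers the <br> variants; <\/?p> covers <p> and </p>.
  return html.replace(/<br\s*\/?>|<\/?p>/g, replacement);
}

// Made-up sample input.
var sample = 'some text<p> <br></p><p><br /></p>some text';

var withNewlines = replaceBreakAndParagraphTags(sample, '\n');
var withNothing = replaceBreakAndParagraphTags(sample, '');

console.log(JSON.stringify(withNewlines));
console.log(JSON.stringify(withNothing));
```

Whitespace that was inside the paragraphs is left in place, so you may still want to trim or collapse it afterwards. A <p> tag that carries attributes is not matched by this pattern.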

Pedram