Problem
I have a piece of text. It can contain every character from ASCII 32 (space) to ASCII 126 (tilde), as well as ASCII 9 (horizontal tab).
The text may contain sentences. Every sentence ends with a dot, question mark or exclamation mark, directly followed by a space.
The text may contain basic Markdown styling, that is: bold text (`**`, also `__`), italic text (`*`, also `_`) and strikethrough (`~~`). Markdown may occur inside sentences (e.g. `**this** is a sentence.`) or outside them (e.g. `**this is a sentence!**`). Markdown may not span a sentence boundary, that is, there may not be a situation like this: `**sentence. sente** nce.`. Markdown may enclose more than one sentence, that is, there may be a situation like this: `**sentence. sentence.**`.
It can also contain two sequences of characters: `<!--` and `-->`. Everything between these sequences is treated as a comment (like in HTML). Comments can occur at any position in the text, but cannot contain newline characters (I hope that on Linux that is just ASCII 10).
I want to detect all sentences in JavaScript and, after each of them, insert its length in a comment, like this: `sentence.<!-- 9 -->`. I do not much care whether the length includes the Markdown tags or not, but it would be nice if it did not.
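To make the desired annotation concrete, here is a minimal sketch of the "length without Markdown tags" idea. The helper names `markdownFreeLength` and `annotate` are made up for illustration, and the sketch assumes that every `*`, `_` and `~~` in a sentence is a Markdown marker (so a literal underscore would be dropped from the count as well).

```javascript
// Hypothetical helper: length of a sentence with the Markdown markers removed.
// Assumption: every **, __, ~~, * and _ is a marker, never literal text.
function markdownFreeLength(sentence) {
  return sentence.replace(/\*\*|__|~~|\*|_/g, '').length;
}

// Hypothetical helper: append the length as a comment right after the sentence.
function annotate(sentence) {
  return `${sentence}<!-- ${markdownFreeLength(sentence)} -->`;
}

console.log(annotate('sentence.'));               // sentence.<!-- 9 -->
console.log(annotate('**this** is a sentence.')); // **this** is a sentence.<!-- 19 -->
```

The sketch only covers a single, already isolated sentence; finding the sentences in the full text is the part I am stuck on.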
What have I done so far?
So far, with the help of this answer, I have prepared the following regex for detecting sentences. It mostly fits my needs – except that it includes comments.
const basicSentence = /(?:^|\n| )(?:[^.!?]|[.!?][^ *_~\n])+[.!?]/gi;
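To show what "includes comments" means, here is a small sketch that runs this regex over the example text from further down in the question. The variable names `sampleText` and `sentenceMatches` are only for illustration; the trailing `\n` reflects the newline I mention at the end of the input.

```javascript
// Regex copied unchanged from the question.
const basicSentence = /(?:^|\n| )(?:[^.!?]|[.!?][^ *_~\n])+[.!?]/gi;

// Example text from the question, including the trailing newline.
const sampleText =
  'foo0\nb<!-- comment -->ar.\nfoo1 bar?\n<!-- comment -->\nfoo2bar!\n';

// With the g flag, match() returns every match (or null if there is none).
const sentenceMatches = sampleText.match(basicSentence) || [];

// JSON.stringify makes the newlines inside each match visible.
for (const m of sentenceMatches) {
  console.log(JSON.stringify(m));
}
```

The comment text ends up inside the matches, and the leading separator (`\n` or space) is part of each match after the first, so both would inflate a naive `.length`.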
I have also prepared the following regex for detecting comments. It also works as expected, at least in my own tests.
const comment = /<!--.*?-->/gi;
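For reference, this is roughly how I use the comment regex on its own. The function name `stripComments` is just for illustration. Because `.` does not match a newline without the `s` flag, the pattern relies on the rule that a comment never spans lines.

```javascript
// Comment regex from the question; the i flag is dropped because the
// pattern contains no letters, so it has no effect.
const commentPattern = /<!--.*?-->/g;

// Hypothetical helper: remove every single-line comment from the text.
function stripComments(text) {
  return text.replace(commentPattern, '');
}

console.log(stripComments('b<!-- comment -->ar.')); // bar.
```

The lazy `.*?` stops at the first `-->`, so two comments on the same line are removed separately instead of swallowing the text between them.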
Example
To better see what I want to achieve, let us have an example. Say, I have the following piece of text:
foo0
b<!-- comment -->ar.
foo1 bar?
<!-- comment -->
foo2bar!
(There is also a newline at the end of it, but I do not know how to add an empty line in Stack Overflow markdown.)
And the expected result is:
foo0
b<!-- comment -->ar.<!-- 10 -->
foo1 bar?<!-- 9 -->
<!-- comment -->
foo2bar!<!-- 12 -->
(This time, there is no newline at the end.)
UPDATE: Sorry, I have corrected the expected result in the example.