
For certain reasons I add a zero-width joiner (U+200D) around a keyword. I want to avoid adding it when the keyword is at the start of a new line, but I cannot remove it afterwards.

I can think of two possible causes, but I don't know how to solve the problem in either case:

1- Wrong usage of \n and \r to detect the start of a line?

2- When I inspect the generated HTML after adding the joiner, I see that the added ‍ is wrapped in double quotation marks, like "‍". Do I need to account for these quotation marks when trying to remove the joiner?

var tail="\u200D";
var keyword="است";

var htm=$("#test").html();

//Adding joiner to keywords
htm=htm.split(keyword).join(tail+'<span class="red">'+tail+keyword+tail+'</span>'+tail);

//Removing all possible combinations of joiner with new lines
htm=htm.split('\r\n'+tail).join('\r\n');
htm=htm.split('\n'+tail).join('\n');
htm=htm.split('\r'+tail).join('\r');
htm=htm.split('\r\n'+'<span class="red">'+tail).join('\r\n'+'<span class="red">');
htm=htm.split('\n'+'<span class="red">'+tail).join('\n'+'<span class="red">');
htm=htm.split('\r'+'<span class="red">'+tail).join('\r'+'<span class="red">');

 $("#test").html(htm);
div{font-size:36pt;}
.red{color:red}
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<div id="test">استکان</div>
Ali Sheikhpour
  • The characters typically used under Windows to start a new line are the Carriage Return (`\r`), _followed_ by the LineFeed (`\n`) - So, one should look for CRLF sequences, rather than LFCR as you are. TLDR: `\r\n` is correct, `\n\r` on the other hand is LFCR. ;) – enhzflep Mar 21 '19 at 21:56
  • Thank you but replacing all `\n` with `\r\n` made no change. Do you mean something else? @enhzflep – Ali Sheikhpour Mar 21 '19 at 22:15
  • I'm not sure. I think I mean something different. In any case, I've since tested my suggestion and found it to be entirely useless. I can't help but wonder why you are looking for an LF char, then something else, and finally the CR. The characters are supposed to be contiguous (a single block with no gaps) and as such, I would expect to have to find them together. Having almost never worked in any language other than English (and never RTL languages), I suspect that I'm vastly under-qualified to be of any help. Sorry! – enhzflep Mar 21 '19 at 23:23
  • Your variable `htm` literally holds the string `"استکان"`. There is no line-end character in there. And even if there were, line-end characters are ignored when parsed to DOM. https://jsfiddle.net/xvmo5dLg/ What is it you **really** want to achieve? Do you want to find the nodes that are at the beginning of the container, or the nodes that are **rendered** after a line break? These are two completely different requests. – Kaiido Mar 22 '19 at 06:34
  • Exactly, I want to find the nodes that are rendered after a line break and remove "\u200D" if it is at the beginning of those lines. @Kaiido – Ali Sheikhpour Mar 22 '19 at 06:55
  • ... Then, that won't be easy, and I'm afraid you are not on the correct path at all. First, you have to consider that it will be dynamic: when the user resizes the window, the line break position will probably change, when the font-size differs it may also change, if you add content dynamically, or change the display dynamically (e.g from CSS) then it may change where the line-break is. And you would have to recalculate if your node is after a line-break. – Kaiido Mar 22 '19 at 07:01
  • Sorry, I mean real line breaks, e.g. `<br>`, vbcrlf, \r\n, or the start of a `div` or `paragraph`, or the start of a `TD`, etc. Not those that only appear as breaks when rendered graphically. I used "rendered" with the wrong meaning. @Kaiido – Ali Sheikhpour Mar 22 '19 at 07:07
  • Once again, `\r\n` is ignored when parsed (well, converted to white space U+0020). And if your `<br>` is styled to have its display set to `inline[-...]` then it won't cause a line-break, same for any other element. So what is the real root problem you are trying to fix? – Kaiido Mar 22 '19 at 07:13
  • In the Farsi language some characters are connected to each other. When I highlight a keyword within a longer word, it gets disconnected from the preceding characters, so I have to add a joiner to reconnect it. However, a joiner at the beginning of a sentence is not correct. I can replace all occurrences except those at the beginnings. What is the general rule to detect all **starts**? @Kaiido – Ali Sheikhpour Mar 22 '19 at 07:43
  • I don't know Persian scripts at all, but each word is separated by a space character right? So all you want to find is joiner as first character of the text content, and joiners following a space, and joiners following joiners right? For instance with an input of `"•است••کان"` like you generated (where `\u200D` has been replaced by `•`) you'd want `"است•کان"` as output right? Oh and now think of it, isn't it r-t-l? So you'd actually want `"•است•کان"` no? – Kaiido Mar 22 '19 at 08:11
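
The cleanup described in the last comment (drop a joiner that is the first character, drop joiners that follow white space, collapse joiners that follow joiners) could be sketched roughly as below. This is only an illustration under assumptions: `cleanJoiners` is a hypothetical helper name, it works on plain text rather than on markup, and it treats any white space (including `\r` and `\n`) as a "start".

```javascript
// Hypothetical helper, not from the question or the answer.
// Assumes plain text input (no HTML tags between the joiners).
const ZWJ = "\u200D";

function cleanJoiners(text) {
  return text
    // Collapse runs of two or more joiners into a single joiner.
    .replace(/\u200D{2,}/g, ZWJ)
    // Drop a joiner at the very start of the string or right after white space.
    .replace(/(^|\s)\u200D/g, "$1");
}

// Same shape as the input in the comment above, with • standing for \u200D.
console.log(cleanJoiners(ZWJ + "است" + ZWJ + ZWJ + "کان") === "است" + ZWJ + "کان");
```

Joiners separated by tags (as in the generated `<span>` markup) are not adjacent in the HTML string, so this sketch would have to be applied to text content, not to the result of `.html()`.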

1 Answer


Use .text() instead of .html() when searching:

// Strip the joiner only if the element's text content starts with it
if ($("#test").text().startsWith(tail)) {
    $("#test").html($("#test").html().replace(tail, ''));
}
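
A different approach from the check above would be to not add the leading joiner in the first place when the keyword sits at the start of the text or right after white space. The following is only a sketch under assumptions: `highlight` is a hypothetical function, the input is plain text without markup, and the trailing joiners are kept unconditionally as in the question's code.

```javascript
// Hypothetical helper; assumes `text` is plain text containing no HTML.
const ZWJ = "\u200D";

function highlight(text, keyword) {
  if (!keyword) return text; // avoid looping forever on an empty keyword
  let out = "";
  let pos = 0;
  let idx;
  while ((idx = text.indexOf(keyword, pos)) !== -1) {
    const prev = idx > 0 ? text[idx - 1] : "";
    // Add the leading joiners only when a non-white-space character precedes the keyword.
    const lead = prev !== "" && !/\s/.test(prev) ? ZWJ : "";
    out += text.slice(pos, idx)
      + lead + '<span class="red">' + lead + keyword + ZWJ + "</span>" + ZWJ;
    pos = idx + keyword.length;
  }
  return out + text.slice(pos);
}
```

With this shape, `highlight("استکان", "است")` starts directly with the `<span>`, so there is nothing to remove afterwards.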
Ashkan Mobayen Khiabani