
Basically, what I am trying to build is a highlighter for misused Arabic characters.

To make it easier to understand, I will explain similar functionality for English.

Imagine a string with wrong capitalization that has to be rewritten correctly. The user rewrites the string in an input box and submits it; the JS checks whether any character was left uncorrected, then displays the whole string with those letters corrected and highlighted in red.

e.g. [test ] becomes [Test ]

To do so, I check those characters, and if a faulty character is detected it gets wrapped in a span so that it is colored red.
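
A minimal sketch of that check, assuming the submitted and expected strings have the same number of characters and contain no HTML markup. `highlightCorrections` is a hypothetical helper used for illustration, not the code from the demo:

```javascript
// Hypothetical helper: returns HTML for the expected string, with every
// character the user got wrong wrapped in a red span.
function highlightCorrections(submitted, expected) {
  const given = Array.from(submitted); // split by code point
  const wanted = Array.from(expected);
  return wanted
    .map((ch, i) => (ch === given[i] ? ch : '<span style="color:red">' + ch + '</span>'))
    .join('');
}
```

The result would be assigned to `innerHTML` of the answer element, as in the demo.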

So far so good. But when I try to replicate this for Arabic, the faulty character gets separated from the rest of the word, making it unreadable.


Demo: jsfiddle

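// Replace the first "t" with a red, capitalized "T".
function check1() {
  englishanswer.innerHTML = englishWord.value.replace(/t/, '<span style="color:red">T</span>');
}

// Replace the first heh (U+0647) with teh marbuta (U+0629):
// first preview wrapped in a red span, second preview without markup.
function check2() {
  var tehMarbuta = '\u0629'; // same character the deprecated unescape("%u0629") produced
  arabicanswer.innerHTML =
    arabicWord.value.replace(/\u0647/, '<span style="color:red">' + tehMarbuta + '</span>') +
    '<br>' + arabicWord.value.replace(/\u0647/, tehMarbuta);
}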
fieldset {
  border: 2px groove threedface;
  border-image: initial;
  width: 75%;
}
input {
  padding: 5px;
  margin: 5px;
  font-size: 1.25em;
}
p {
  padding: 5px;
  font-size: 2em;
}
<fieldset>
  <legend>English:</legend>
  <input id='englishWord' value='test' />
  <input type='submit' value='Check' onclick='check1()' />
  <p id='englishanswer'></p>
</fieldset>

<fieldset style="direction:rtl">
  <legend>عربي</legend>
  <input id='arabicWord' value='بطله' />
  <input type='submit' value='Check' onclick='check2()' />
  <p id='arabicanswer'></p>
</fieldset>

Notice that when testing the Arabic word, the character wrapped in a span (first preview) is separated from the rest of the word, while the unwrapped character (second preview) appears normally.


Edit: Preview for the problem [Chrome UA]


Sнаđошƒаӽ
Mohammed Ibrahim
  • I am surely missing something. The first & second preview are exactly the same other than ة appearing in red. – Jawad Oct 14 '12 at 21:34
  • Right. It happens in Chrome only; in IE, FF, OP and AS it does not happen. – Jawad Oct 14 '12 at 21:38
  • @Jawad, it does happen in Safari 6. – katspaugh Oct 14 '12 at 21:43
  • I know that Gecko goes to great lengths to make things like this work the way the user expects; for instance, colouring in one letter of a digraph does not cause it to separate into individual letters. I can only assume that WebKit isn't as clever. – Neil Oct 14 '12 at 21:45
  • It must be specific to WebKit. I can only assume there's an open bug for it. – nneonneo Oct 14 '12 at 21:50
  • Well fixing
    to
    does not help. Also lang="AR" has no effect. i.e., in CSS only. Somebody could test it for JS.
    – Jawad Oct 14 '12 at 21:54
  • Found the bug report: https://bugs.webkit.org/show_bug.cgi?id=6148. Looks like there's someone actively working on it, so that's good news. The `&zwj;` trick mentioned in comment #16 doesn't work in my Safari, unfortunately. – nneonneo Oct 14 '12 at 21:59
  • @Jawad Are you sure it works in OP? It seems to have the same issue for me! – Mohammed Ibrahim Oct 14 '12 at 23:56
  • @nneonneo It looks like the issue took 7 years to get onto the "to be fixed" list; I wonder how long it will take to be fixed :) – Mohammed Ibrahim Oct 14 '12 at 23:58
  • Yeah, well it's less common for someone to actually say "it's on the top of my queue". Usually, if it's a core developer, that means the bug is really getting some attention. Some of these bugs really do take years to fix; 7 years doesn't actually seem terribly uncommon especially if the problem is complex. – nneonneo Oct 15 '12 at 00:19

6 Answers


This is a longstanding bug in WebKit browsers (Chrome, Safari): HTML markup breaks joining behavior. Explicit use of ZWJ (zero-width joiner) used to help (see question Partially colored Arabic word in HTML), but it seems that the bug has become worse.

As a clumsy (but probably the only) workaround, you could use contextual forms for Arabic letters. This can be tested first using just static HTML markup and CSS, e.g.

بطﻠ<span style="color:red">ﺔ</span>

Here I am using, inside the span element, ﺔ U+FE94 ARABIC LETTER TEH MARBUTA FINAL FORM instead of the normal U+0629 ARABIC LETTER TEH MARBUTA and ﻠ U+FEE0 ARABIC LETTER LAM MEDIAL FORM instead of U+0644 ARABIC LETTER LAM.

To implement this in JavaScript, you would need, when inserting markup into a word of Arabic letters, to change the characters before and after the break (caused by the markup) to their initial, medial, or final presentation forms according to their position in the word.
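
As a rough sketch of that idea (not a complete implementation): the table below covers only the five letters of the example word, and the names `FORMS`, `joinsForward`, `shape` and `highlightWithForms` are hypothetical. Letters missing from the table are assumed to be non-joining, which is wrong for real text; a full table would cover the whole Arabic block and its right-joining letters.

```javascript
// Partial table of Arabic Presentation Forms-B, in the order
// [isolated, final, initial, medial]. Only the letters of the example are
// listed; null marks a form the letter does not have.
const FORMS = {
  '\u0628': ['\uFE8F', '\uFE90', '\uFE91', '\uFE92'], // BEH
  '\u0629': ['\uFE93', '\uFE94', null, null],         // TEH MARBUTA
  '\u0637': ['\uFEC1', '\uFEC2', '\uFEC3', '\uFEC4'], // TAH
  '\u0644': ['\uFEDD', '\uFEDE', '\uFEDF', '\uFEE0'], // LAM
  '\u0647': ['\uFEE9', '\uFEEA', '\uFEEB', '\uFEEC'], // HEH
};

// Assumption: letters missing from the table are treated as non-joining.
function joinsForward(ch) {
  return ch in FORMS && FORMS[ch][2] !== null;
}

// Pick the presentation form of chars[i] from its neighbours.
function shape(chars, i) {
  const forms = FORMS[chars[i]];
  if (!forms) return chars[i];
  const joinsPrev = i > 0 && joinsForward(chars[i - 1]);
  const joinsNext = i < chars.length - 1 && chars[i + 1] in FORMS && joinsForward(chars[i]);
  if (joinsPrev && joinsNext) return forms[3]; // medial
  if (joinsPrev) return forms[1];              // final
  if (joinsNext) return forms[2];              // initial
  return forms[0];                             // isolated
}

// Replace the letter at `index` and wrap it in a red span, shaping the
// letters on both sides of the markup boundaries explicitly.
function highlightWithForms(word, index, replacement) {
  const chars = Array.from(word);
  chars[index] = replacement;
  return chars
    .map((ch, i) => {
      // Only the letters next to a markup boundary need explicit shaping.
      const out = Math.abs(i - index) <= 1 ? shape(chars, i) : ch;
      return i === index ? '<span style="color:red">' + out + '</span>' : out;
    })
    .join('');
}
```

For the word from the question, `highlightWithForms('\u0628\u0637\u0644\u0647', 3, '\u0629')` yields the same markup as the static example above: lam in its medial form followed by a red teh marbuta in its final form.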

Jukka K. Korpela

I know that this solution is not very elegant, but it kind of works, so tell me what you think:

<script>
    function check1(){
    englishanswer.innerHTML = englishWord.value.replace(/t/,'<span style="color:red">T</span>');
}
function check2(){
arabicanswer.innerHTML = 
    arabicWord.value.replace(/\u0647/,'<span style="color:red">'+
    unescape("%u0640%u0629")+'</span>')+
    '<br>'+arabicWord.value.replace(/\u0647/,unescape('%u0629'));
}
</script>

<fieldset>
<legend>English:</legend>
<input id='englishWord' value='test'/>
<input type='submit' value='Check' onclick='check1()'/>
<p id='englishanswer'></p>
</fieldset>

<fieldset style="direction:rtl">
<legend>عربي</legend>
<input id='arabicWord' value='بطلـه'/>
<input type='submit' value='Check' onclick='check2()'/>
<p id='arabicanswer'></p>
</fieldset>
Rachid O
  • It changes the appearance of the word...this may be undesirable. – nneonneo Oct 14 '12 at 22:13
  • I know, but I didn't find a good solution for his problem; I just used the letter U+0640 as a link between the two separated letters. – Rachid O Oct 14 '12 at 22:21
  • Yeah, I know. I don't think this problem can be easily solved without the browser being fixed. Yours is the best solution so far. I would +1, but I have no more votes today :'( – nneonneo Oct 14 '12 at 22:34
  • Best solution so far; however, we will need to account for the character's position, just as Mohsen Afshin mentioned here [http://stackoverflow.com/a/12887003/910730] – Mohammed Ibrahim Oct 14 '12 at 23:54

You should take care of the initial, medial, final and isolated forms of each character. The complete list is available here

Use U+FE94 instead of U+0629:

arabicWord.value.replace(/\u0647/,'<span style="color:red">'+ unescape("%ufe94")+'</span>')+
Mohsen Afshin
  • It is the standard character unicode, maybe Safari doesn't interpret it correctly. I've tested that in Chrome and worked. – Mohsen Afshin Oct 15 '12 at 10:23

As Jukka K. Korpela indicated, this is a bug in most WebKit-based browsers (Chrome, Safari, etc.).

A simple hack, other than the tatweel (TAMDEED) character or contextual forms for Arabic letters, is to put a zero-width joiner (&zwj; or &#x200d;) before and after the markup boundary next to the letter you want to keep joined, so that the letters on both sides are still shaped as one connected run, e.g.

<p>عرب&#x200d;<span style="color: Red;">&#x200d;ي</span></p>  

demo: jsfiddle
see also the webkit bug report.
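
A minimal sketch of how that markup could be generated in JavaScript. `highlightWithZwj` is a hypothetical helper; it inserts the joiner unconditionally wherever a neighbouring letter exists, so it assumes the neighbours are letters that actually join (after a non-joining letter such as alef, the extra joiner would force a wrong form). As the comments on the question note, the trick did not work in every WebKit version.

```javascript
const ZWJ = '\u200D'; // ZERO WIDTH JOINER

// Replace the letter at `index` and wrap it in a red span, placing a ZWJ
// on both sides of every markup boundary that has a neighbouring letter.
function highlightWithZwj(word, index, replacement) {
  const chars = Array.from(word);
  const before = chars.slice(0, index).join('');
  const after = chars.slice(index + 1).join('');
  const left = before ? ZWJ : '';
  const right = after ? ZWJ : '';
  return before + left +
    '<span style="color:red">' + left + replacement + right + '</span>' +
    right + after;
}
```

Called with the word from the example and its last letter, this produces the same structure as the static markup above: a joiner just before the opening tag and another just inside the span.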

Nasser Al-Wohaibi
  • no need for workarounds, as it is fixed in Chrome 76 with new Layout engine https://developers.google.com/web/updates/2019/06/layoutNG – husayt Jul 25 '19 at 10:22

Instead of using span, use the HTML5 ruby element and add the Arabic tatweel character "ـ" (U+0640), i.e. the character that extends letters (Shift+J).

so your code becomes:

arabicanswer.innerHTML =
    arabicWord.value.replace(/\u0647/, 'ـ<ruby style="color:red"> ـ' +
    unescape("%u0629") + '</ruby>') +
    '<br>' + arabicWord.value.replace(/\u0647/, unescape('%u0629'));

and here is an updated fiddle: http://jsfiddle.net/fjz5C/28/

kabaros

I would try adding a tatweel to the character before and after. It won't actually fix the problem, but it will make it difficult to notice, since it will force the lam into its medial form and the taa marbuta into its final form. If it works, that would be a lot less brittle than actually converting the letters to their medial or final forms.
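
A small sketch of what that could look like, assuming the highlighted letter's neighbours are joining letters. `highlightWithTatweel` is a hypothetical name, and the tatweel does visibly lengthen the connection between the letters:

```javascript
const TATWEEL = '\u0640'; // ARABIC TATWEEL

// Replace the letter at `index` and wrap it in a red span, putting a
// tatweel on both sides of each markup boundary that has a neighbour.
function highlightWithTatweel(word, index, replacement) {
  const chars = Array.from(word);
  const before = chars.slice(0, index).join('');
  const after = chars.slice(index + 1).join('');
  const left = before ? TATWEEL : '';
  const right = after ? TATWEEL : '';
  return before + left +
    '<span style="color:red">' + left + replacement + right + '</span>' +
    right + after;
}
```

Because the tatweel is only added where a neighbouring letter exists, a highlighted letter at the start or end of the word gets it on one side only.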

You seem to have other problems, though. I went to your website and put in a misspelling of hadha , just to see what it would do with it, and it caused the ha to disconnect in both words, which doesn't make sense if the only problem is the formatting tags. (I'm using Firefox on a Mac.)


Good luck!

larapsodia
  • Although using the tatweel character "ـ" would solve the given case, it won't solve the general issue of separated characters, which is a UA-related problem. I only used the given example to illustrate the issue; in the actual application I won't be able to tell whether I should use the tatweel unless I determine the position of the character (using a regex or something), and once the position is known it makes no difference whether we use the tatweel or the appropriate contextual character. At least that's what I think. – Mohammed Ibrahim Oct 16 '12 at 19:33
  • As for your second point, the code did replace the character correctly. However, in Arabic no word starts with tāʾ marbūṭa ("ة"), so it has no initial form; hāʾ ("ه"), on the other hand, does have an initial form, which is what you used in the input field. BTW, you spelled the word "هاذا" correctly :) – Mohammed Ibrahim Oct 16 '12 at 19:38