
I am using an RTF editor to show content to the user. The content is composed from database values, which sometimes contain Greek letters. Initially they were shown as question marks (? ? ? ?) instead of Γ γ Ψ ψ. After some research online, I changed CONTENT.getBytes(); to CONTENT.getBytes("UTF8"); when writing the content to the response through response.getOutputStream() (using a byte array stream built from CONTENT). I then display it using JavaScript:

document.myobj.HttpOpenFileFromStream(contents passed through earlier in response)

Normal text displays fine in the editor, but Greek letters such as Γ γ Ψ ψ are displayed as Γ γ Ψ ψ. To double-check, I printed the content before putting it in the HTML page, and it shows the desired characters Γ γ Ψ ψ, but when they are displayed in the RTF editor in the UI they become Γ γ Ψ ψ. Can someone help me with this?

Thanks in advance.

KhAn SaAb
HappyRahul
  • It seems the RTF editor does not support Greek letters; please check some other editors. – KhAn SaAb Jan 07 '16 at 22:07
  • When I get Γ γ Ψ ψ, replace them manually with Γ γ Ψ ψ, save them in an .rtf file, and fetch them later, I can see the correct Γ γ Ψ ψ. It is only when they are written programmatically for the first time that they are displayed as the wrong Γ γ Ψ ψ. – HappyRahul Jan 07 '16 at 22:11

2 Answers


RTF does not work that way. RTF files can contain only 7-bit ASCII characters (that's part of what the T[ext] part of the name means), but they can represent other characters by one of two text-based encodings. The Wikipedia article on RTF provides details:

The character escapes are of two types: code page escapes and, starting with RTF 1.5, Unicode escapes. [...] For a Unicode escape the control word \u is used, followed by a 16-bit signed decimal integer giving the Unicode UTF-16 code unit number. For the benefit of programs without Unicode support, this must be followed by the nearest representation of this character in the specified code page. For example, \u1576? would give the Arabic letter bāʼ ب, specifying that older programs which do not have Unicode support should render it as a question mark instead.

Thus one proper RTF encoding of the characters Γ γ Ψ ψ would be:

\u915? \u947? \u936? \u968?

It is, of course, an entirely different question whether any particular RTF software handles such escape sequences correctly.
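As a minimal sketch of where those numbers come from (in Java, since the question uses it; the class and method names here are made up for illustration): each Java char is a UTF-16 code unit, and its decimal value is what follows the RTF control word.

```java
final class RtfGreekDemo {
    // Hypothetical helper: formats one UTF-16 code unit as an RTF Unicode
    // escape in decimal, followed by "?" as the fallback for older readers.
    static String escape(char c) {
        return "\\u" + (int) c + "?";
    }

    // Joins the escapes of every char in the input with single spaces.
    static String escapeAll(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(escape(c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Capital gamma, small gamma, capital psi, small psi
        String greek = "\u0393\u03B3\u03A8\u03C8";
        System.out.println(escapeAll(greek));
    }
}
```

Run on the four Greek letters, this should print the same sequence as the line above. Whether a given editor renders it correctly is still up to that editor.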

John Bollinger
  • Note that the escape sequences use *decimal* integers, not hexadecimal integers as in Java Unicode escapes. – John Bollinger Jan 07 '16 at 22:25
  • For `\u915? \u947? \u915? \u947?` it prints `Γ? γ? Γ? γ?`. It should ignore the question mark if it understands Unicode, right? And if I remove the question mark at the end of each character, it omits the spaces between the characters. – HappyRahul Mar 15 '16 at 21:22
  • @HappyRahul, per the specifications, an RTF processor should present *either* the character corresponding to the given code point, *or* the alternative given. If yours is presenting both, then it is buggy. If you need to accommodate buggy software (rather than, say, fixing it) then you'll need to experiment to determine how best to do so. You might try replacing the question marks with spaces instead of deleting them, or even swapping the question marks with the subsequent spaces. – John Bollinger Mar 15 '16 at 23:00

Thank you, @John. It worked. Here is my code:

if (ascii < 128) {
    copyBuffer.append(ch);                   // 7-bit ASCII is written unchanged
} else {
    copyBuffer.append("\\u" + ascii + "?");  // decimal RTF escape with "?" fallback
}
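For context, a slightly fuller sketch of the same idea as a standalone method might look like the following. The class and method names are hypothetical and not part of any RTF library. It assumes two things beyond the snippet above: that backslash and braces in the text should be escaped because they are RTF control characters, and that code units above 32767 should be written as negative numbers, since the escape takes a 16-bit signed integer. It does not handle line breaks or code-page escapes.

```java
final class RtfEscaper {
    // Hypothetical helper: converts plain text into RTF-safe 7-bit ASCII.
    static String toRtf(String text) {
        StringBuilder out = new StringBuilder(text.length() * 2);
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (ch == '\\' || ch == '{' || ch == '}') {
                // RTF control characters get a leading backslash
                out.append('\\').append(ch);
            } else if (ch < 128) {
                // Plain 7-bit ASCII is copied unchanged
                out.append(ch);
            } else {
                // The cast to short gives the signed 16-bit value,
                // so code units above 32767 come out negative
                out.append("\\u").append((int) (short) ch).append('?');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Capital gamma followed by ASCII text and a brace
        System.out.println(toRtf("\u0393 = gamma {x}"));
    }
}
```

Characters outside the Basic Multilingual Plane are stored in Java as two surrogate chars, so this sketch emits two escapes for them. How a particular editor treats such pairs would need to be tested.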

HappyRahul