.NET Core version: 3.1.405. Windows version: Windows 10.

The RichTextBox in my WPF application does not correctly convert non-ASCII characters from my RTF string.

string rtf = "{\\rtf1\\ansi\\ansicpg1252\\uc1\\htmautsp\\deff2{\\fonttbl{\\f0\\fcharset0 Times New Roman;}{\\f2\\fcharset0 Arial;}}                                                     {\\colortbl\\red0\\green0\\blue0;\\red255\\green255\\blue255;}\\loch\\hich\\dbch\\pard\\plain\\ltrpar\\itap0{\\lang32\\fs30\\f2\\cf0 \\cf0\\qj\\sl15\\slmult0{\\f2 {\\ltrch entête}\\li0\\ri0\\sa0\\sb0\\fi0\\qj\\sl15\\slmult0\\par}}}";

if (rtf.Length > 2)
{
    FlowDocument flowDocument = new FlowDocument
    {
        LineHeight = 1,
        Language = XmlLanguage.GetLanguage(Thread.CurrentThread.CurrentUICulture.Name),                            
    };

    using (MemoryStream stream = new MemoryStream(Encoding.UTF8.GetBytes(rtf)))
    {
      TextRange text = new TextRange(flowDocument.ContentStart, flowDocument.ContentEnd);

      if (stream.Length != 0)
      {
         text.Load(stream, DataFormats.Rtf);
      }

      text.ClearAllProperties();
    }

    return flowDocument;
}

Actual behavior: my RichTextBox does not display "Entête" correctly. The "ê" (a non-ASCII character) is garbled by the conversion.

Expected behavior: my RichTextBox displays "Entête", with the "ê" intact.

Jackdaw
  • It seems your RTF string is not standard-compliant RTF. If you replace the `{\ltrch entête}` fragment with `{\ltrch Ent\'eate}`, the non-ASCII character will be displayed correctly. Non-ASCII characters must be escaped! – Jackdaw Jan 15 '21 at 14:00
  • It worked on .NET Framework 4.6.1 – Mickael Billet Jan 15 '21 at 14:04
  • Out of interest, I ran your code on .NET Framework 4.6.1 and it displays **entête**. So you should check how your RTF string is created from the original source. – Jackdaw Jan 15 '21 at 14:36

1 Answer

The RTF string in the code above is not standard-compliant RTF. Non-ASCII characters must be escaped! If you replace the {\ltrch entête} fragment with {\ltrch Ent\'eate}, the non-ASCII character will be displayed correctly.

In some cases, RTF documents created by different programs, or by different versions of the same program, can differ from each other, which leads to version incompatibilities.

/// <summary>
/// Loads the `rtf` string into the `rtb` RichTextBox control. Before loading, any non-ASCII characters are converted to escaped form.
/// </summary>
/// <param name="rtb">The RichTextBox control into which the RTF string will be loaded.</param>
/// <param name="rtf">The RTF string to load into the RichTextBox control. The string can contain non-ASCII characters.</param>
public void LoadRtfString(RichTextBox rtb, string rtf)
{
    var flowDocument = new FlowDocument
    {
        LineHeight = 1,
        Language = XmlLanguage.GetLanguage(Thread.CurrentThread.CurrentUICulture.Name),
    };

    using (MemoryStream stream = new MemoryStream(Encoding.UTF8.GetBytes(ConvertNonAsciiToEscaped(rtf))))
    {
        var text = new TextRange(flowDocument.ContentStart, flowDocument.ContentEnd);
        text.ClearAllProperties();
        if (stream.Length != 0)
        {
            text.Load(stream, DataFormats.Rtf);
        }
    }
    rtb.Document = flowDocument;   
}

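/// <summary>
/// Replaces every non-ASCII character in the `rtf` string with an RTF Unicode escape.
/// </summary>
/// <param name="rtf">An RTF string that can contain non-ASCII characters and should be converted to the correct format before it is loaded into the RichTextBox control.</param>
/// <returns>The source RTF string with non-ASCII characters converted to escaped characters.</returns>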
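public string ConvertNonAsciiToEscaped(string rtf)
{
    var sb = new StringBuilder(rtf.Length);
    foreach (var c in rtf)
    {
        if (c <= 0x7f)
        {
            // ASCII characters are copied unchanged.
            sb.Append(c);
        }
        else
        {
            // \uN takes a signed 16-bit number, so code units above 32767 are written as negative values.
            // The trailing "?" is the one-character fallback announced by \uc1 in the RTF header.
            sb.Append("\\u").Append(unchecked((short)c)).Append('?');
        }
    }
    return sb.ToString();
}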
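
To make the effect of the conversion concrete, here is a minimal console sketch that runs the same escaping on the fragment from the question. The class name `RtfEscapeDemo` is hypothetical, and this copy writes each code unit as a signed 16-bit number, which is an assumption about what the RTF `\uN` keyword expects rather than something taken from the question. The `?` fallback relies on the `\uc1` keyword that is already present in the RTF header.

```csharp
using System;
using System.Text;

// Hypothetical demo class: not part of the original question or answer.
public static class RtfEscapeDemo
{
    // Replaces every non-ASCII UTF-16 code unit with an RTF \uN? escape.
    public static string ConvertNonAsciiToEscaped(string rtf)
    {
        var sb = new StringBuilder(rtf.Length);
        foreach (char c in rtf)
        {
            if (c <= 0x7f)
            {
                // ASCII is copied unchanged.
                sb.Append(c);
            }
            else
            {
                // Assumption: \uN is a signed 16-bit value, so code units
                // above 32767 wrap to negative numbers.
                sb.Append("\\u").Append(unchecked((short)c)).Append('?');
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // 'ê' is U+00EA, which is 234 in decimal.
        Console.WriteLine(ConvertNonAsciiToEscaped(@"{\ltrch entête}"));
        // prints {\ltrch ent\u234?te}
    }
}
```

After this step the string contains only ASCII characters, so the bytes produced by `Encoding.UTF8.GetBytes` are the same ones a reader expecting the `\ansi` code page would see.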

For additional information see:

Jackdaw
  • It's not possible to replace the {\ltrch entête} fragment with {\ltrch Ent\'eate} because we have a translation process – Mickael Billet Jan 15 '21 at 15:03
  • Oh, I see. That is a problem. In other posts about version incompatibility problems, Microsoft support staff advise loading the document into the `RichTextBox` control, correcting it if necessary, and then saving it back. The next time the document is loaded, it will be displayed properly. – Jackdaw Jan 15 '21 at 15:16
  • Or you can try to resolve this problem by replacing characters such as `ê` with `\'ea` in your application before loading the document into the `RichTextBox` control. The following post describes how to use the `StringBuilder` for this purpose: [https://stackoverflow.com/a/1321343/6630084](https://stackoverflow.com/a/1321343/6630084). – Jackdaw Jan 15 '21 at 15:35
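
The hex-escape replacement suggested in the last comment could be sketched roughly as follows. This is not the code from the linked post. It is a minimal illustration that assumes the RTF header declares `\ansicpg1252` and `\uc1`, as the string in the question does. The names `RtfHexEscapeDemo` and `EscapeForCp1252` are hypothetical. Characters in the range U+00A0 to U+00FF have the same numeric value in Windows-1252 as in Unicode, so they can be written directly as `\'xx`. Everything else outside ASCII falls back to a `\uN?` escape here.

```csharp
using System;
using System.Text;

// Hypothetical demo class: not part of the original thread.
public static class RtfHexEscapeDemo
{
    // Escapes non-ASCII characters for an RTF document that declares \ansicpg1252.
    public static string EscapeForCp1252(string rtf)
    {
        var sb = new StringBuilder(rtf.Length);
        foreach (char c in rtf)
        {
            if (c <= 0x7f)
            {
                // ASCII is copied unchanged.
                sb.Append(c);
            }
            else if (c >= 0xa0 && c <= 0xff)
            {
                // Same byte value in Windows-1252 and Unicode: emit \'xx.
                sb.Append("\\'").Append(((int)c).ToString("x2"));
            }
            else
            {
                // Fallback: Unicode escape with a one-character '?' substitute,
                // which assumes \uc1 is in effect.
                sb.Append("\\u").Append(unchecked((short)c)).Append('?');
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // 'ê' is 0xEA in both Unicode and Windows-1252.
        Console.WriteLine(EscapeForCp1252(@"{\ltrch entête}"));
        // prints {\ltrch ent\'eate}
    }
}
```

Because the escaping runs in the application just before the document is loaded, the translation process that produces the RTF string does not need to change.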