I'm developing an application using wxPython 4.0.4 in Python 3.7.3, and I've run into a problem when trying to color UTF-8 text in a wx.TextCtrl. Basically, it seems that certain characters are counted incorrectly within wxPython despite them being counted correctly in Python.

I initially thought that all multi-byte characters were being miscounted. However, my example code below shows this is not the case: only some of them are. It appears to be a problem specifically in the wx.TextCtrl.SetStyle function.

import wx
import wx.richtext as rt
app = wx.App()

test_str1 = '''There are no multibyte characters '''
test_str2 = '''blah ble blah\n'''
test_str3 = '''“these are multibyte quotes” '''
test_str4 = '''more single byte chars!\n'''
test_str5 = '''this comma’s represented by multiple bytes\n'''
test_str6 = '''why do emojis    seem to break TextCtrl.SetStyle   \n'''
test_str7 = '''more single byte characters\n'''
test_str8 = '''to demonstrate the issue.'''

def main():
    main = TestFrame()
    main.Show()
    app.MainLoop()

class TestFrame(wx.Frame):
    def __init__(self):
        wx.Frame.__init__(self, None, title="TestFrame")
        sizer = wx.BoxSizer(wx.VERTICAL)
        self.panel = TestPanel(self)
        sizer.Add(self.panel, proportion=1, flag=wx.EXPAND)

class TestPanel(wx.Panel):
    def __init__(self, parent):
        wx.Panel.__init__(self, parent)
        self.text = wx.TextCtrl(self, wx.ID_ANY, style=(wx.TE_MULTILINE|wx.TE_RICH|wx.TE_READONLY))
        self.raw_text = ""
        self.styles = []
        self.AddColorText(test_str1, wx.BLUE)
        self.AddColorText(test_str2, wx.RED)
        self.AddColorText(test_str3, wx.BLUE)
        self.AddColorText(test_str4, wx.RED)
        self.AddColorText(test_str5, wx.BLUE)
        self.AddColorText(test_str6, wx.RED)
        self.AddColorText(test_str7, wx.BLUE)
        self.AddColorText(test_str8, wx.RED)
        self.text.SetValue(self.raw_text)
        for s in self.styles:
            self.text.SetStyle(s[0], s[1], s[2])
        sizer = wx.BoxSizer(wx.VERTICAL)
        sizer.Add(self.text, proportion=1, flag=wx.EXPAND)
        self.SetSizer(sizer)

    def AddColorText(self, text, wx_color):
        start = len(self.raw_text)
        self.raw_text += text
        end = len(self.raw_text)
        self.styles.append([start, end, wx.TextAttr(wx_color)])

if __name__ == "__main__":
    main()

KatzJP
  • Add a screenshot to your question to demonstrate your issue, because I see no problem with wxPython 4.1.0 on Linux – Rolf of Saxony Jun 15 '20 at 08:21
  • @RolfofSaxony Updated with a screenshot, additionally I am running this in Windows 10. – KatzJP Jun 16 '20 at 16:02
  • Seems to be a windows issue. All of my emojis are red. The penultimate line is all blue and the last line is all red. Sorry can't help, I'm Linux only. Have you tried changing the font, either in the program or the desktop? – Rolf of Saxony Jun 16 '20 at 16:13

2 Answers

MS Windows uses UTF-16 internally, and before PEP 393, CPython's Unicode strings were also 16-bit on Windows for that reason. Since PEP 393 (Python 3.3), CPython represents every Unicode code point directly, so each code point always contributes exactly 1 to a string's length.

Windows, on the other hand, cannot. So wxPython must translate strings into UTF-16 before sending them to the operating system. For everything in the Basic Multilingual Plane (BMP) of Unicode, which is most of what you'll encounter, that works out fine, because one code point becomes one UTF-16 code unit (two bytes).

But those newer emoji are not in the BMP, so each of them takes two UTF-16 code units (four bytes). wxPython apparently fails to account for that: if it passes the start and end values straight through to the underlying Windows function, they will be off after an emoji, because the values you supply count Unicode code points, while the values Windows expects count UTF-16 code units.
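
The difference between the two counts can be seen in plain Python, without wx. This is a minimal sketch; `utf16_units` is a hypothetical helper name, not part of wxPython or the standard library, and the two sample strings are chosen only for illustration.

```python
# Count UTF-16 code units: encode without a BOM, then two bytes per unit.
# utf16_units is a hypothetical helper, not a wx or stdlib function.
def utf16_units(s: str) -> int:
    return len(s.encode("utf-16-le")) // 2

curly = "\u201cquotes\u201d"   # curly quotes: inside the BMP
emoji = "\U0001F600"           # grinning face: outside the BMP

print(len(curly), utf16_units(curly))  # 8 8
print(len(emoji), utf16_units(emoji))  # 1 2
```

The curly quotes take several bytes in UTF-8 but still one code unit each in UTF-16, which would explain why they do not shift the coloring while the emoji do.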

You can work around it by computing the UTF-16 offsets yourself and passing those to SetStyle:

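# 'utf-16-le' avoids the BOM that plain 'utf-16' prepends;
# // 2 converts the byte count into UTF-16 code units.
utf16start = len(self.raw_text[:start].encode('utf-16-le')) // 2
utf16end = utf16start + len(self.raw_text[start:end].encode('utf-16-le')) // 2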
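
As a standalone sketch of that offset conversion, the helpers below map code point indices to UTF-16 code unit indices. The names `to_utf16_offset` and `to_utf16_span` are hypothetical. The `utf-16-le` codec is used because plain `utf-16` prepends a byte-order mark, and the division by two turns a byte count into a code unit count. Whether SetStyle on Windows really expects offsets in this form is an assumption based on the explanation above, not something confirmed against the wxPython source.

```python
def to_utf16_offset(text: str, index: int) -> int:
    """UTF-16 code units that precede code point position `index`."""
    return len(text[:index].encode("utf-16-le")) // 2

def to_utf16_span(text: str, start: int, end: int) -> tuple:
    """Convert a (start, end) code point span into UTF-16 code unit offsets."""
    return to_utf16_offset(text, start), to_utf16_offset(text, end)

text = "ab\U0001F600cd"            # one non-BMP character in the middle
print(to_utf16_span(text, 0, 2))   # (0, 2): nothing shifted before the emoji
print(to_utf16_span(text, 3, 5))   # (4, 6): shifted by one after the emoji
```

In the question's code, the converted pair would replace `s[0]` and `s[1]` in the SetStyle call. Line endings may still be counted differently by the native control, so the result should be checked on the target platform.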

Arguably this is a bug in wxPython, and you should report it to the wxPython issue tracker.

Anders Munch
  • I tried to use this fix in my test case, and it resulted in text coloring being in different locations, but still not correct. I agree that this is a bug and I have reported it to wxPython at https://github.com/wxWidgets/Phoenix/issues/1691 – KatzJP Jun 29 '20 at 19:16

It appears that my issue was using Python's built-in len() to calculate positions instead of the wx library functions. Appending the text with AppendText() and reading the positions back with GetLastPosition(), as in the code below, resolves my problem.

import wx
import wx.richtext as rt
app = wx.App()

test_str1 = '''There are no multibyte characters '''
test_str2 = '''blah ble blah\n'''
test_str3 = '''“these are multibyte quotes” '''
test_str4 = '''more single byte chars!\n'''
test_str5 = '''this comma’s represented by multiple bytes\n'''
test_str6 = '''why do emojis    seem to break TextCtrl.SetStyle   \n'''
test_str7 = '''more single byte characters\n'''
test_str8 = '''to demonstrate the issue.'''

def main():
    main = TestFrame()
    main.Show()
    app.MainLoop()

class TestFrame(wx.Frame):
    def __init__(self):
        wx.Frame.__init__(self, None, title="TestFrame")
        sizer = wx.BoxSizer(wx.VERTICAL)
        self.panel = TestPanel(self)
        sizer.Add(self.panel, proportion=1, flag=wx.EXPAND)

class TestPanel(wx.Panel):
    def __init__(self, parent):
        wx.Panel.__init__(self, parent)
        self.text = wx.TextCtrl(self, wx.ID_ANY, style=(wx.TE_MULTILINE|wx.TE_RICH|wx.TE_READONLY))
        self.raw_text = ""
        self.styles = []
        self.AddColorText(test_str1, wx.BLUE)
        self.AddColorText(test_str2, wx.RED)
        self.AddColorText(test_str3, wx.BLUE)
        self.AddColorText(test_str4, wx.RED)
        self.AddColorText(test_str5, wx.BLUE)
        self.AddColorText(test_str6, wx.RED)
        self.AddColorText(test_str7, wx.BLUE)
        self.AddColorText(test_str8, wx.RED)
        for s in self.styles:
            self.text.SetStyle(s[0], s[1], s[2])
        sizer = wx.BoxSizer(wx.VERTICAL)
        sizer.Add(self.text, proportion=1, flag=wx.EXPAND)
        self.SetSizer(sizer)

    def AddColorText(self, text, wx_color):
        start = self.text.GetLastPosition()
        self.text.AppendText(text)
        end = self.text.GetLastPosition()
        self.styles.append([start, end, wx.TextAttr(wx_color)])

if __name__ == "__main__":
    main()
KatzJP