python curses addstr y-offset: strange behavior with unicode

Question

I am having trouble with Python 3 curses and Unicode:

#!/usr/bin/env python3
import curses
import locale

locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')

def doStuff(stdscr):
  offset = 3
  stdscr.addstr(0, 0, "わたし")
  stdscr.addstr(0, offset, 'hello', curses.A_BOLD)
  stdscr.getch() # pauses until a key's hit

curses.wrapper(doStuff)

I can display Unicode characters just fine, but the column (x) argument to addstr ("offset" in my code) does not behave as I expected: my screen displays "わたhello" instead of "わたしhello".

In fact, the output for each offset value is quite strange:

- 0:hello
- 1:わhello
- 2:わhello
- 3:わたhello
- 4:わたhello
- 5:わたしhello
- 6:わたしhello
- 7:わたし hello
- 8:わたし  hello
- 9:わたし   hello

Note that the offset is not measured in bytes either: each of these characters is 3 bytes in UTF-8 but counts as a single character:

>>> len("わ".encode('utf-8'))
3
>>> len("わ")
1

I'm running Python 3.4.3 and curses.version is "b'2.2'".

Does anyone know what's going on or how to debug this? Thanks in advance.

score 0 · Accepted Answer · answered Dec 15 '16 at 02:38

You're printing 3 double-width characters. That is, each of those takes up two cells.

The length of a string in characters (or bytes) is not necessarily the same as the number of screen cells the string occupies.

Python curses is just a thin layer over ncurses.

I'd expect the characters at offsets 1, 3, and 5 to be erased when a character is written onto the second cell of a double-width character (ncurses is supposed to do this), but that detail could be a bug in the terminal emulator.
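
To make the difference between characters, bytes, and cells concrete, here is a rough sketch using only the standard library. The helper name cell_width is hypothetical, and treating the East Asian Width classes "W" and "F" as two cells and everything else as one cell is a simplifying assumption (it ignores combining marks and control characters, for example).

```python
import unicodedata

def cell_width(text):
    """Approximate the number of terminal cells `text` occupies.

    Assumption: East Asian Wide ("W") and Fullwidth ("F") characters take
    two cells and every other character takes one. Combining marks and
    control characters are not handled.
    """
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )

text = "わたし"
print(len(text))                  # characters: 3
print(len(text.encode("utf-8")))  # bytes: 9
print(cell_width(text))           # cells: 6
```

With a width of 6 cells, column 6 is the first column past the Japanese text, so that is the smallest offset at which "hello" does not overlap it.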

score 0 · Answer 2 · answered Dec 15 '16 at 17:04

Based on the response from Thomas, I found the wcwidth package (https://pypi.python.org/pypi/wcwidth), which has a function that returns the width of a Unicode string in terminal cells.

Here's a full working example:

#!/usr/bin/env python3
import curses
import locale
from wcwidth import wcswidth

locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')

def doStuff(stdscr):
  foo = "わたし"
  offset = wcswidth(foo)
  stdscr.addstr(0, 0, foo)
  stdscr.addstr(0, offset, 'hello', curses.A_BOLD)
  stdscr.getch() # pauses until a key's hit

curses.wrapper(doStuff)
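
If you need to place text after the n-th character rather than after the whole string, the same idea extends to per-character start columns. The following is only a sketch: char_width and start_columns are hypothetical helpers, and the width rule (East Asian Width "W" or "F" means two cells, anything else one) is a standard-library approximation standing in for the wcwidth package, not a replacement for it.

```python
import unicodedata

def char_width(ch):
    # Assumption: wide/fullwidth characters take 2 cells, all others 1.
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1

def start_columns(text):
    """Return the column at which each character of `text` begins."""
    columns = []
    col = 0
    for ch in text:
        columns.append(col)    # this character starts at the current column
        col += char_width(ch)  # advance by the cells it occupies
    return columns

print(start_columns("わたし"))  # [0, 2, 4]
```

Columns 1, 3, and 5 fall on the second cell of a double-width character, which are exactly the offsets where the output in the question looks odd.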
