SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-5: truncated \UXXXXXXXX escape

Question

I'm using AutoKey 95.8 (the Python 3 version) on Linux Mint 19.3, and I have a series of keyboard macros that generate Unicode characters. This example works:

# alt+shift+a = á

import sys

char = "\u00E1"
keyboard.send_keys(char)

sys.exit()

But the attempt to print an mdash [—] generates the following error:

SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-5: truncated \UXXXXXXXX escape

# alt+shift+- = —

import sys

char = "\u2014"
keyboard.send_keys(char)

sys.exit()

Any ideas on how to overcome this problem in AutoKey would be greatly appreciated.

Have you tried `\U00002014`? Not sure why it would work, since that's the less-used UTF-32 representation, but worth a try. — HenryTK, Jan 26 '20 at 21:57
My guess: There is no key which produces an mdash. I'm looking at the sources. I'll post an answer if I find something definitive. — mkiever, Jan 26 '20 at 22:30
Thanks to all. \u00002014 does nothing and \U00002014 generates the same error as mentioned above. Also noticed that "\u2014" is UTF-16. — ineuw, Jan 28 '20 at 00:36
`\u2014` is not UTF-16, it is the Unicode code point for `EM DASH`. To be UTF-16, you would need, for example, `'\u2014'.encode('utf-16le')`. — Mark Tolonen, Jan 28 '20 at 11:58
Mark Tolonen, you are correct that it's the code point, but neither variation works. Then what would be the UTF-8 code? — ineuw, Jan 29 '20 at 14:45
Try putting in another backslash. It works for me; I'm using UTF-8. — OnceACurious, Jan 20 '21 at 15:17
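
To make the code point versus encoding distinction from the comments above concrete, here is a small sketch in plain Python (no AutoKey needed). It uses only built-in str methods, and the variable names are illustrative. It also shows what the UTF-8 bytes for the mdash look like, although those bytes are not something you would put in a \u escape.

```python
# U+2014 EM DASH is a single code point; the escape only names it.
em_dash = "\u2014"

print(len(em_dash))       # number of characters in the string
print(hex(ord(em_dash)))  # the code point as a number

# Encodings are byte representations of that one code point.
utf8 = em_dash.encode("utf-8")
utf16 = em_dash.encode("utf-16le")   # little-endian, no byte order mark
utf32 = em_dash.encode("utf-32le")

print(utf8)
print(utf16)
print(utf32)
```

The string holds one character regardless of which encoding is later used to write it out, which is why the encoding question does not affect how the escape has to be typed in the source.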

score 1 · Answer 1 · answered Jan 27 '20 at 13:33


The code you posted above would not generate the error you are getting: "truncated \UXXXXXXXX" requires an uppercase \U, which must be followed by 8 hex digits. If you put char = "\U2014" in the Python source, you will get that error message (and you probably got it when experimenting with the file in this way).
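
The escape rules described above can be checked outside AutoKey with a short sketch. It relies only on Python built-ins; the file name "<example>" and the variable names are placeholders chosen for illustration.

```python
# Three spellings of the same character, U+2014 EM DASH.
short_form = "\u2014"      # lowercase \u takes exactly 4 hex digits
long_form = "\U00002014"   # uppercase \U takes exactly 8 hex digits
by_number = chr(0x2014)

same = short_form == long_form == by_number
print(same)

# An uppercase \U followed by only 4 digits is rejected by the parser.
# The bad line is kept in a raw string so that this script still parses,
# then compiled separately to capture the error text.
bad_source = r'char = "\U2014"'
try:
    compile(bad_source, "<example>", "exec")
    error_text = None
except SyntaxError as exc:
    error_text = str(exc)

print(error_text)
```

The captured message should contain the same "truncated \UXXXXXXXX escape" wording as in the question, which supports the idea that an uppercase \U was present in the script at the time the error appeared.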

The sequence char = "\u2014" will create an mdash Unicode character on the Python side, but that does not mean it is possible to send it as a keyboard symbol via AutoKey to the target window. That is the point where your program is likely failing. Since there is no programming error, you won't get a Python error message; it just won't work (although AutoKey might be nice and print an appropriate error message in this case).

You'd have to look up how to type an arbitrary Unicode character in your OS configuration (on Linux Mint it should be in the docs for "wayland", I guess) and send that character-composing sequence through AutoKey instead. If there is no such sequence, find a way to copy the desired character to the window environment's clipboard and then have AutoKey send the "paste" sequence (usually ctrl + v, but it can vary by app; terminal emulators use ctrl + shift + v, for example).
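
As a rough sketch of the composing-sequence idea, assuming the desktop supports the Ctrl+Shift+U hexadecimal entry method (common in GTK applications, but not universal) and assuming AutoKey's <ctrl>, <shift> and <enter> key tokens: the helper names below are hypothetical, and keyboard stands for the object AutoKey makes available inside a script.

```python
import string


def unicode_entry_keys(hex_code):
    """Build a key string: Ctrl+Shift+U, the hex digits, then Enter."""
    # Reject anything that is not plain hex so no stray keys are sent.
    if not hex_code or any(c not in string.hexdigits for c in hex_code):
        raise ValueError("expected hex digits, got %r" % (hex_code,))
    return "<ctrl>+<shift>+u" + hex_code + "<enter>"


def send_code_point(keyboard, hex_code):
    """Type one character by code point through the entry sequence."""
    # `keyboard` is passed in so the function can be tried outside AutoKey.
    keyboard.send_keys(unicode_entry_keys(hex_code))


# Inside an AutoKey script this would be called as:
#     send_code_point(keyboard, "2014")
```

Passing keyboard in as a parameter keeps the key string easy to inspect before anything is sent to a real window.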

answered Jan 27 '20 at 13:33

jsbueno


(Checking the docs for AutoKey, it looks like it provides the clipboard functionality you will need. However, AutoKey seems to be broken in its setup; I can't get it to run.) – jsbueno Jan 27 '20 at 13:49
Thanks to all. \u00002014 does nothing and \U00002014 generates same error as mentioned above. Also noticed that "\u2014" is UTF-16. Currently, I must emulate the Unicode keyboard keystrokes to generate it, which is not an elegant method. I am also concerned by unexpected issues with ligatures, etc. # alt+shift+- insert mdash surrounded by space
import sys char = "2014" csu = "++u" ret = "" spc = " " keyboard.send_keys(spc + csu + char + ret + spc) sys.exit() – ineuw Jan 28 '20 at 00:29
Please note that "\u2014" is NOT UTF-16. Inside the Python code it is a Unicode character. Like every other Unicode character, it can be encoded in UTF-16, but that is irrelevant to the question. If the suggestion worked, please consider marking the answer as correct. – jsbueno Jan 28 '20 at 13:57
An otherwise working example is the first one. "\u00E1" and a variety of other accented characters are generated by AutoKey. But there is a range of "code points" which don't work and I can't fathom why. All these characters are in the extended ANSI table between 128 and 256. ANSI code 151 is the mdash, and all of this is part of the Unicode table in numerical order. – ineuw Jan 29 '20 at 14:51
The difference is that some codes represent characters that might have keys directly associated with them, without relying on compositing, like `ç` or `ñ`. It is quite possible that any of these characters can be directly recognized regardless of the current keyboard layout. – jsbueno Jan 29 '20 at 14:55
Agreed and understood, but neither is an mdash. I am familiar with compositing characters and use it on Wikisource for foreign-language and Old English documents. – ineuw Jan 30 '20 at 08:30
This issue was resolved with two different solutions in the Gitter AutoKey forum: https://gitter.im/autokey/autokey – ineuw Oct 12 '20 at 23:26
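
On the "ANSI code 151" point raised in the comments above, a quick check in plain Python suggests that the two examples in the question do not sit in the same numeric range. This sketch assumes "extended ANSI" means Windows code page 1252, which is the usual meaning but is not stated in the thread.

```python
# Byte 151 (0x97) is an mdash only in the Windows-1252 code page.
ansi_dash = bytes([151]).decode("cp1252")
print(ansi_dash == "\u2014")

# The Unicode code point numbered 151 is a C1 control character instead.
control_char = chr(151)
print(repr(control_char))

# For the accented a, the numbers happen to coincide:
# byte 225 in Windows-1252 is also code point U+00E1.
a_acute = bytes([225]).decode("cp1252")
print(a_acute == "\u00e1", ord(a_acute))
```

So the first 256 Unicode code points follow Latin-1 rather than Windows-1252, and the mdash lives at U+2014, well outside that range. Whether this is what makes one character work in AutoKey and the other fail is not confirmed anywhere in the thread.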

score 0 · Answer 2 · answered Oct 11 '20 at 21:15

When you need to emit characters outside the US English set in AutoKey, you have two choices. The simplest is to put them into the clipboard with clipboard.fill_clipboard(your characters) and paste them into the window using keyboard.send_keys("<ctrl>+v"). This almost always works.
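
A minimal sketch of the clipboard approach just described, using the two calls named above. The helper name paste_text and the short pause are my own assumptions, not part of AutoKey; clipboard and keyboard are the objects AutoKey provides inside a script, passed in here so the function can be tried without AutoKey.

```python
import time


def paste_text(text, clipboard, keyboard, settle_seconds=0.1):
    """Put text on the clipboard, then send Ctrl+V to the active window."""
    clipboard.fill_clipboard(text)
    # Short pause (an assumption) so the clipboard has time to update
    # before the paste keystroke arrives.
    time.sleep(settle_seconds)
    keyboard.send_keys("<ctrl>+v")


# Inside an AutoKey script, where both objects already exist:
#     paste_text(" \u2014 ", clipboard, keyboard)
```

Note that this overwrites whatever was on the clipboard, and that some applications (terminal emulators, for example) expect a different paste shortcut.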

If you need to define a phrase with multibyte characters in it, select the Paste using Clipboard (Ctrl+V) option. (I'm trying to get that to be the default option in a future release.)

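The other choice, which I'm still not quite sure of, is to send the Unicode escape sequence itself to the window and let it convert that into the actual Unicode character. Something like keyboard.send_keys("\\U2014"), with the backslash doubled so that Python passes the escape text through unchanged. With a single backslash and a valid escape, as in the question, Python decodes it into the actual Unicode character before the API call sees it (whether or not it is assigned to a variable first), and that API call can't handle the character correctly. Written as "\U2014" with a single backslash, it does not even parse, because \U requires eight hex digits; that is the error in the question.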

The problem is that the underlying code for keyboard.send_keys() wants to send keycodes that actually exist on your keyboard or that it can add to an unused key in your layout. Most of the time that doesn't work for anything multibyte.
