
I'm writing a simple Python script that should detect my keystrokes, but for some reason it detects a space after every single keystroke.

The code:

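import msvcrt

print("press 'escape' to quit...")
while True:
    char = msvcrt.getch()  # returns a bytes object of length 1
    print(ord(char))
    if char == b"\x1b":  # Escape key: leave the loop as the prompt promises
        break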

Sample run:

Input: aaaaa

Output:
97
0
97
0
97
0
97
0
97
0
martineau

1 Answer


It's not detecting space. Space is 32, not 0.

What's happening is that you're using a wide-character terminal, but reading it as bytes, so you're seeing the UTF-16-LE bytes. In UTF-16-LE, an a is two bytes, 97 and 0. If you read those as if they were two ASCII characters instead of one UTF-16-LE character, you'll get a followed by \0.
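A quick check of that claim, runnable on any platform since it only uses string encoding and does not touch the console. The variable name `encoded` is just for illustration:

```python
# Encode a single "a" as UTF-16-LE and look at the individual byte values.
encoded = "a".encode("utf-16-le")
print(list(encoded))   # the two byte values that make up one "a"

# A real space has code point 32, so the 0 in the output is not a space.
print(ord(" "))
```

The first line prints the pair 97, 0, which matches the sample run in the question.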

Notice that what you get back isn't actually 'a\0a\0a\0', but b'a\0a\0a\0'. So you could buffer these up into a bytes or bytearray and use decode('utf-16-le') on it. But that defeats the purpose of reading one character at a time.
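A minimal sketch of that buffering idea, assuming the bytes really are UTF-16-LE. The helper name `decode_keystroke_bytes` and the sample list `raw` are made up for illustration; `raw` stands in for successive `getch()` results. The second variant uses an incremental decoder from the standard `codecs` module, which hands back a character as soon as both of its bytes have arrived:

```python
import codecs


def decode_keystroke_bytes(byte_values):
    """Feed single bytes to an incremental UTF-16-LE decoder.

    Returns the list of characters produced, one entry per completed character.
    """
    decoder = codecs.getincrementaldecoder("utf-16-le")()
    chars = []
    for b in byte_values:
        text = decoder.decode(b)  # "" until a full code unit has been seen
        if text:
            chars.append(text)
    return chars


# Stand-in for what getch() returned in the question: one byte at a time.
raw = [b"a", b"\x00", b"a", b"\x00", b"a", b"\x00"]

# Variant 1: buffer everything, then decode once.
print(b"".join(raw).decode("utf-16-le"))

# Variant 2: decode incrementally, still one character per keystroke.
print(decode_keystroke_bytes(raw))
```

Either way, this only works around the symptom by hand.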

The simplest fix is to use getwch instead of getch. This will mostly just do what you want—return a single-character str value like 'a' rather than two separate single-byte bytes values.

There may still be some problems with astral characters (everything above U+FFFF) showing up as two separate surrogates instead of one single character, and "special keys" will still show up as a Unicode U+0000 or U+00E0 followed by a keycode (or, if you have an older Python, possibly as a broken U+E0xx with the keycode embedded in the character). But otherwise, it'll work the way you expected.
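A hedged sketch of a `getwch`-based loop that also deals with the two caveats above. The function name `read_chars` and the constant `SPECIAL_PREFIXES` are hypothetical; the reader function is passed in as a parameter so the logic can be exercised without a Windows console. It assumes a special key always arrives as a prefix followed by exactly one keycode, and that a high surrogate is always followed by its low half:

```python
import sys

# Prefix values that announce a "special key" (arrows, function keys, ...).
# Caveat: "\xe0" is also the letter "à", so this sketch would swallow a typed "à".
SPECIAL_PREFIXES = ("\x00", "\xe0")


def read_chars(getwch, stop="\x1b"):
    """Yield characters from getwch() until `stop` (Escape by default)."""
    while True:
        ch = getwch()
        if ch in SPECIAL_PREFIXES:
            getwch()  # discard the keycode that follows the prefix
            continue
        if "\ud800" <= ch <= "\udbff":
            # High surrogate: read the next value and join the pair
            # into a single astral character.
            pair = ch + getwch()
            ch = pair.encode("utf-16-le", "surrogatepass").decode(
                "utf-16-le", "surrogatepass"
            )
        if ch == stop:
            return
        yield ch


if __name__ == "__main__" and sys.platform == "win32":
    import msvcrt

    print("press 'escape' to quit...")
    for ch in read_chars(msvcrt.getwch):
        print(ord(ch))
```

On Windows, pressing `a` five times should now print 97 five times with no interleaved 0.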

Taku
abarnert
  • `getch` calls `ReadConsoleInputA`, which returns keyboard input encoded according to the console's current input codepage, which defaults to OEM and cannot be set to UTF-16LE (codepage 1200). Either there is a bug in the OP's version of the C runtime, or there's a bug in the console itself (more frequent in Windows 10 since MS is actively updating the console code), or some misbehaving library or alternate terminal (e.g. ConEmu) has hooked `ReadConsoleInputA` in the current process. – Eryk Sun Apr 28 '18 at 07:29
  • @eryksun ConEmu might be a possibility—but I honestly have no idea how to walk him through diagnosing how he's screwed up his console. And I think that's probably a question for Super User, not SO. Whatever he's done, he's getting UTF-16-LE from his console, and using `getwch` will solve his problem with Python, and he hasn't asked about other problems… But if you think you can diagnose the underlying issue, that would be even better. – abarnert Apr 28 '18 at 08:06
  • I don't think the Stack Exchange Q & A format is best suited for working through bugs that could be due to a broad range of problems. I've been down that road before, and it basically turns into long chats that go on for pages, which also basically doubles as a tutorial on using a debugger and other tools. – Eryk Sun Apr 28 '18 at 09:18
  • Many characters (e.g. "∫") will be ignored when pasted in the console. If a pasted character isn't in the keyboard mapping (`VkKeyScan`), isn't `C3_ALPHA` linguistic (`GetStringType`), and isn't in a range that's East-Asian full width, then the console pastes it as an OEM Alt+Numpad sequence (or two sequences for non-BMP). That is, it encodes the character as OEM (typically as the default "?"), converts the result to a decimal string, and inserts a sequence of key events. The final key-up for the Alt key has the actual Unicode character value. The CRT `getwch` function does not support this pasted key event sequence. – Eryk Sun Apr 28 '18 at 09:22
  • Note that this Alt+Numpad key-event sequence is only observed with a low-level `ReadConsoleInput` call, as used by `getwch`. For a high-level `ReadConsole` and `ReadFile` call, the console only includes the actual Unicode character (or surrogate pair for non-BMP). – Eryk Sun Apr 28 '18 at 09:32