How to treat multibyte characters simply as a sequence of bytes?

Question

I would like to use vim with binary files. I run vim with -b and have set isprint= and display+=uhex. I am using the following statusline:

%<%f\ %h%m%r%=%o\ (0x%06O)\ \ %3.b\ <%02B>\ %7P

so I get output containing some useful information such as the byte offset in the file and the current character in hex. But I'm having trouble with random pieces of data being interpreted as multibyte characters, which prevents me from accessing the inner bytes; they also combine with their surroundings (including vim's decoration) or display as �.

Of course I have tried opening the files with ++enc=latin1. However, my system's encoding is UTF-8, so what vim supposedly does is convert the file from Latin-1 to UTF-8 internally and display that. This has two problems:

1. The sequence <c3><ac> displays as Ã¬ rather than ì, and each of those characters counts as two bytes, which breaks my %o and gives wrong offsets. The sequence is 2 bytes in the file but apparently 4 bytes in vim's buffer.
2. I don't know why my isprint is ignored. Neither of these characters is between 32 and 126, so they should display in hex.
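
To make the first problem concrete, here is a minimal Python sketch of the byte doubling. It is only an illustration of the Latin-1 to UTF-8 conversion described above, not of vim's internals; the variable names are made up for the example.

```python
# The two bytes as they are stored in the file.
raw = bytes([0xC3, 0xAC])

# Read as UTF-8, the pair is a single character (U+00EC).
as_utf8 = raw.decode("utf-8")

# Read as Latin-1, each byte becomes its own character (U+00C3, U+00AC).
as_latin1 = raw.decode("latin-1")

# Re-encoding those two characters as UTF-8 takes two bytes each,
# so the 2 bytes from the file occupy 4 bytes after conversion.
reencoded = as_latin1.encode("utf-8")

print(len(raw), len(reencoded))
print(reencoded.hex(" "))
```

Every byte from 0x80 to 0xFF grows to two bytes in this conversion, so any offset counted after the conversion drifts from the offset in the file.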

I found the following workaround: I set encoding to latin1 but termencoding to utf-8. This achieves what I want, but it breaks other things, such as status messages ("new file", "changed" etc.) in my language, because vim wants to use encoding for them too and they don't fit. I guess I could run vim with LC_ALL=C, but it feels like I'm resorting to too many dirty tricks already. Is there a better way, i.e., one that doesn't involve messing around with encoding?

`I don't know why my isprint is ignored` Read `:h 'isprint'` carefully. "The characters from space (ASCII 32) to '~' (ASCII 126) are always displayed directly, even when they are not included in 'isprint' or excluded." — Matt, Oct 11 '19 at 05:48
@Matt I must have expressed myself wrong. I'm fine with printable characters displaying as themselves. I was speaking of the "Ã¬" which is outside the range. If it was "A¬" I would like to see "A". — The Vee, Oct 11 '19 at 07:17
Both `%o` and `isprint` work on the Vim buffer (not the file on disk!), which is encoded according to the `encoding` setting, i.e. UTF-8. And in UTF-8 `<c3><ac>` becomes `<c3><83><c2><ac>`, that is, two multibyte chars. Now `:h 'isprint'` says: "Multi-byte characters 256 and above are always included". — Matt, Oct 11 '19 at 08:39
@Matt Thanks, I missed that bit! So the only way is what I ended up with, forcing `encoding` to be 8 bit and reencode for the terminal? — The Vee, Oct 11 '19 at 08:43
Ah, that sucks so much, as `encoding` is global only, so it touches every other window, menu, message etc., while `binary` is local to the buffer, so in fact there's not even a need to start a dedicated `vim -b` instance. But I don't see anything better. — Matt, Oct 11 '19 at 09:03
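
As a rough illustration of the offset drift discussed in the comments above, the following Python sketch maps between 0-based offsets in the file and offsets in a Latin-1 to UTF-8 converted copy of it. The function names are hypothetical, and the sketch assumes the conversion is the only thing that changes the byte count; it does not model anything else vim may do to the buffer.

```python
def to_buffer_offset(data: bytes, file_offset: int) -> int:
    """Offset of the byte at file_offset after Latin-1 -> UTF-8 conversion."""
    # Each byte >= 0x80 before the position takes one extra byte in UTF-8.
    high = sum(1 for b in data[:file_offset] if b >= 0x80)
    return file_offset + high


def to_file_offset(data: bytes, buffer_offset: int) -> int:
    """Offset in the file of the byte that covers buffer_offset."""
    pos = 0
    for i, b in enumerate(data):
        width = 2 if b >= 0x80 else 1
        if buffer_offset < pos + width:
            return i
        pos += width
    raise ValueError("offset past end of buffer")


data = b"A\xc3\xacB"
print(to_buffer_offset(data, 3))  # position of "B" in the converted copy
print(to_file_offset(data, 5))    # and back to its position in the file
```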
