
Consider this typical Linux function (it returns the current process's username):

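#include <pwd.h>
#include <unistd.h>

/* The result points to static storage that later getpw* calls may overwrite. */
char* currentUserName(void) {
  struct passwd *p = getpwuid(getuid());
  return (p ? p->pw_name : NULL);
}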

How can I get it in Unicode (let's say as wchar_t)? To be honest, I don't even know what the encoding of pw_name is (the system's? Which one, the file system's? Always UTF-8?).

Is there a way to get the username as a wchar_t string? Maybe some function similar to Windows's GetUserNameW() (where W stands for wide chars), so that I don't have to link against the iconv library.

Maybe I can use mbstowcs(), but which locale will be used? I plan to call this function from a systemd service, so I have no idea what LC_CTYPE/LANG is set to there.

RandomB
  • OS identifiers in Linux (filenames, ESSIDs, usernames, etc.) are typically zero-terminated strings of bytes, not limited to a specific encoding. There may be some other limitations on the allowed bytes (filenames, for example, may not contain `'/'`). The identifiers are meant to be checked for equality only, by simple byte comparison. For practical use, it is common to define the strings so that they are meaningful when interpreted as UTF-8. Converting them to wchar_t may lose information if they contain non-UTF-8 bytes. – HAL9000 Nov 08 '22 at 14:22
  • @HAL9000 [POSIX-compliant user names are limited to ASCII characters](https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_437): "To be portable across systems conforming to POSIX.1-2017, the value is composed of characters from the portable filename character set. The `<hyphen-minus>` character should not be used as the first character of a portable user name." The [portable filename character set](https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_282) is upper- and lower-case letters, digits, `.`, `_`, and `-`. – Andrew Henle Nov 08 '22 at 15:18
  • @AndrewHenle, that only holds if the system administrator wanted to be portable when creating users. There is no guarantee that usernames on a heterogeneous network originated on a POSIX-compliant system. Even `useradd` on my GNU/Linux system has the `--badname` option. When handling strings from outside your program, treat them as binary blobs as far as possible. Only concern yourself with their encoding when rendering. – HAL9000 Nov 08 '22 at 16:34
  • `wchar_t` is not *Unicode*, and please do not use it on Unix or Linux systems: it has a really bad interface (*the cure was worse than the illness*). Unix people invented UTF-8 to be compatible with `char`, and so with all Unix tools and APIs (they invented it for *Plan 9*, but either way we got UTF-8). So just use `char`. The system knows nothing about encodings (which is different from Windows). These days we simply interpret the strings as UTF-8 (strictly speaking, one should check the locale; be warned that root's locale may not be `C.UTF-8`). – Giacomo Catenazzi Nov 09 '22 at 09:58
  • @GiacomoCatenazzi And what should I do if it is not UTF-8? Spanish, Japanese, Cyrillic-specific encodings, etc.? Previously I used this workflow: setlocale -> mbstowcs -> ..., and it worked fine. What can break in it? Any hints, please? – RandomB Nov 09 '22 at 10:41
  • Unix (and so Linux) knows only strings. One of its *features* is "no policy at the low level". You can give every username a different encoding, and for the system that is fine (it knows nothing about encodings). *Only* the programs that write or read usernames will have problems. So you can use your function as long as you are always consistent. Linux (and macOS) simply chose UTF-8 as the default, and people switched to it. You will seldom find the old encodings on modern systems. It is a very different approach from Microsoft's (the sysadmin should convert the files, rather than programs implementing a new API). – Giacomo Catenazzi Nov 09 '22 at 11:00
  • So you must know the initial encoding and stick to it consistently, or there is not much you can do (apart from heuristic guesses). The system has no idea. UTF-8 is just the modern default (and has been for many years), but it is not guaranteed. – Giacomo Catenazzi Nov 09 '22 at 11:02
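
The setlocale -> mbstowcs workflow discussed in the comments could be sketched roughly as follows. This is only a minimal sketch, not an established API: the names `mbsToWide` and `currentUserNameW` are hypothetical, and it assumes the bytes in `pw_name` are encoded in whatever the process's LC_CTYPE locale says, which, as noted above, the system does not guarantee.

```c
#include <assert.h>
#include <pwd.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <wchar.h>

/* Hypothetical helper: converts a multibyte string to a newly allocated
   wide string according to the current LC_CTYPE locale.
   Returns NULL if the bytes are not valid in that locale or if the
   allocation fails. The caller must free() the result. */
wchar_t *mbsToWide(const char *s) {
  assert(s != NULL);
  /* Every wide character consumes at least one byte, so strlen(s) + 1
     elements always leave room for the terminating L'\0'. */
  size_t cap = strlen(s) + 1;
  wchar_t *w = malloc(cap * sizeof *w);
  if (w == NULL)
    return NULL;
  if (mbstowcs(w, s, cap) == (size_t)-1) {
    /* Invalid byte sequence for the current locale. */
    free(w);
    return NULL;
  }
  return w;
}

/* Hypothetical wide-character counterpart of currentUserName().
   Returns NULL if the user is unknown or the name cannot be converted. */
wchar_t *currentUserNameW(void) {
  struct passwd *p = getpwuid(getuid());
  return p ? mbsToWide(p->pw_name) : NULL;
}
```

For the conversion to honour LANG/LC_CTYPE, the program has to call `setlocale(LC_CTYPE, "")` (from `<locale.h>`) once at startup; without that call the process stays in the "C" locale. A NULL result from `currentUserNameW()` then covers the case where the name's bytes are not valid in the selected locale, so the caller can fall back to treating the name as an opaque byte string.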
