
I am using the UNIX function symlink() to create a link to a path that contains Unicode characters. However, when I read the link back, it shows ? in place of the Unicode characters.

This is what my code looks like:

if (symlink("symlink.jpg", "/real/path/光芒.jpg") != 0)
    warn("symlink() error\n");
else //symlink creation successful

The symlink is created successfully at this point; however, the link looks something like this:

 symlink.jpg -> /real/path/??.jpg

I was expecting the link to look like this:

 symlink.jpg -> /real/path/光芒.jpg

Can anyone tell me why this is happening? Any fix or alternative library/function recommendation would be appreciated.

Additional information:

  1. I suspect that unistd.h or fcntl.h might not support Unicode, because when I use creat() or open() to create a new file called 光芒.jpg, it actually creates a file called ??.jpg.
  2. My development environment and compiler have Unicode support. For example, fprintf() can print Unicode characters.
blackbeard
  • As far as I know, it should just work. As far as the `symlink` function call is concerned, those filenames are just sequences of bytes. I suspect the `?`'s you're seeing are coming from the `ls` command, not because those are the actual characters in the link name. So you might try changing your locale settings when you run `ls`. – Steve Summit May 05 '18 at 23:00
  • Does the symlink actually work (i.e. can you open symlink.jpg in an image editor or web browser?) If not, you might have an encoding-related error. `ls -l symlink.jpg | od -tx1 -Ax` would get you a hex dump (you can alternatively pipe to `xxd` or `hexdump -C` in place of `od -tx1 -Ax` if they're installed on your system). The required byte sequences depend on your locale, but encoding info for the characters can be found [here for 光](http://www.fileformat.info/info/unicode/char/5149/charset_support.htm) and [here for 芒](http://www.fileformat.info/info/unicode/char/8292/charset_support.htm). –  May 06 '18 at 00:41
  • Minor correction to my previous comment: you'd want to open `/real/path/光芒.jpg` since that's the new symlink to the file `symlink.jpg`, though I just realized this wouldn't necessarily validate the proper encoding of the symlink name. I also believed `symlink()` used `path_to_symlink, path_to_link_target` argument order (i.e. `dest, src` like `strcpy`, `strcat`, and many other C functions), but it's the other way around. Either way, my point remains: try to determine the bytes used for those characters. –  May 06 '18 at 01:10

0 Answers