Because a string in C can contain multi-byte Unicode characters, and one of those bytes might be a terminating \0, I don't think strlen reliably counts how many bytes there are in such a string.

How do I properly count the length in bytes of such a string? I'm not the one allocating the memory for it; I'm using the char d_name[256] member of struct dirent from dirent.h. Is there any way to find out how long the names are, other than just copying the entire 256 bytes? And what if I couldn't simply copy all 256 bytes?

Horse SMith
  • As I said in [your previous question](http://stackoverflow.com/a/27087022/1009479), this is not a problem with UTF-8, so what encoding are you using? – Yu Hao Nov 23 '14 at 09:19
  • @YuHao I think I made it somewhat clearer here, when I said where I get the string from. – Horse SMith Nov 23 '14 at 09:34
  • You're misunderstanding Unicode and Unicode encodings like UTF-8, UTF-16 and UTF-32. Read [Joel on Software's The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)](http://www.joelonsoftware.com/articles/Unicode.html) and [Unicode, UTF-8 and character encodings: What every developer should know](http://www.teknically-speaking.com/2014/02/unicode-utf-8-and-character-encodings_23.html). There are no "Unicode strings", only strings encoded in some Unicode encoding. – phuclv Nov 23 '14 at 09:48

1 Answer


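What do you mean by Unicode? If it's UTF-8 (dirent.h is part of the POSIX API, where file names are usually UTF-8 and d_name is always a null-terminated string), it can't contain '\0' in the middle: UTF-8 encodes every code point other than U+0000 using only nonzero bytes. strlen will do exactly what you need and return the length in bytes, not counting the terminator. If you are using some non-standard version of dirent (maybe some strange port for Windows) with UTF-16, you can use the appropriate wide-character string functions, such as wcslen.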
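
A minimal sketch of the idea, assuming a POSIX system and UTF-8 file names. The helper names `name_length_bytes`, `utf8_code_points` and `print_name_lengths` are made up for illustration and are not part of dirent.h. It shows that strlen already gives the byte length of d_name, and, for comparison, how one might count code points instead:

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Byte length of a null-terminated name, excluding the terminator.
   For UTF-8 this counts bytes, not characters. */
static size_t name_length_bytes(const char *name)
{
    return strlen(name);
}

/* Number of code points, assuming the input is valid UTF-8:
   count every byte that is not a continuation byte (10xxxxxx). */
static size_t utf8_code_points(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++count;
    }
    return count;
}

/* Print every entry of dir_path with its length in bytes and code points.
   Returns 0 on success, -1 if the directory cannot be opened. */
static int print_name_lengths(const char *dir_path)
{
    DIR *dir = opendir(dir_path);
    if (dir == NULL)
        return -1;

    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        printf("%s: %zu bytes, %zu code points\n",
               entry->d_name,
               name_length_bytes(entry->d_name),
               utf8_code_points(entry->d_name));
    }

    closedir(dir);
    return 0;
}
```

Calling `print_name_lengths(".")` lists the current directory. If a copy of the name is needed, `strlen(entry->d_name) + 1` bytes are enough (the extra byte holds the terminator), so there is no need to copy the whole array.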

Mikhail Maltsev