I'm writing a small application that must read files in any Unicode format (UTF-8, UTF-16BE, UTF-16LE, UTF-32BE, UTF-32LE) into UTF-32 strings, and then manipulate them.

Is there an open-source library that offers functionality similar to "string" and "fstream" (or "cstdio" and "cstring") but with full Unicode support? Or is there an easy way to do this with the standard library?

I'd like the solution to be portable.

Sorry for my bad English. Thanks in advance.

Lodovico

lodo
  • Possible duplicate question: http://stackoverflow.com/questions/901473/read-unicode-files-c – Tim Apr 21 '14 at 09:43
  • @Tim Well, the wchar_t/wstring functions are not portable, and they (usually) deal only with UTF-16LE. I updated my question. Thanks anyway for the quick reply. – lodo Apr 21 '14 at 09:55
  • http://site.icu-project.org/ but the question is off-topic because you are asking for library recommendations – David Heffernan Apr 21 '14 at 10:30
  • C++ can do that with no extra libraries: http://en.cppreference.com/w/cpp/header/codecvt (although if you need more than just conversions between the UTFs, then you're going to need icu and boost.locale) – Cubbi Apr 21 '14 at 11:25
  • @Cubbi and David Heffernan Thanks for your comments. What I need exactly are hashes, substrings and lengths in terms of Unicode code points. That's why IBM ICU is not exactly what I'm looking for. – lodo Apr 21 '14 at 11:34
  • see utf8everywhere.org – Pavel Radzivilovsky May 04 '14 at 17:46
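
The standard-library route mentioned in the comments could look roughly like the sketch below. It assumes a C++11 compiler and uses `std::wstring_convert` with `std::codecvt_utf8<char32_t>`, both of which have been deprecated since C++17, so treat it as an illustration rather than a recommendation. The helper names `utf8_to_utf32` and `hash_code_points` are made up for this example. Once the text is in a `std::u32string`, `size()`, `substr()` and `std::hash` all work per code point, which covers the length, substring and hash requirements.

```cpp
#include <codecvt>
#include <cstddef>
#include <functional>
#include <locale>
#include <string>

// Hypothetical helper: convert UTF-8 bytes to UTF-32 code points with the
// standard facet. from_bytes throws std::range_error on malformed input.
std::u32string utf8_to_utf32(const std::string& bytes) {
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    return conv.from_bytes(bytes);
}

// Hypothetical helper: hash the code-point sequence using the standard
// std::hash specialization for std::u32string.
std::size_t hash_code_points(const std::u32string& text) {
    return std::hash<std::u32string>{}(text);
}
```

For example, `utf8_to_utf32("h\xC3\xA9llo")` yields five code points from six bytes, and `substr(1, 3)` on the result selects three code points rather than three bytes. Note that a code point is not always what a user perceives as one character (combining marks are separate code points).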

1 Answer

You will find this class useful: CTextFileDocument - http://www.codeproject.com/Articles/7958/CTextFileDocument

I have been using this for many years. The "About the code" section says that it has been made platform independent.

It is mainly meant for files, but you should be able to copy the relevant code to handle strings as well.
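
To give an idea of what handling such files involves, here is a minimal sketch that is not taken from CTextFileDocument. It guesses the encoding from a byte order mark and decodes UTF-16 into UTF-32 code points. All names (`Encoding`, `Detected`, `detect_bom`, `decode_utf16`) are hypothetical, and it assumes that data without a BOM is UTF-8, which real files do not guarantee.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

enum class Encoding { Utf8, Utf16BE, Utf16LE, Utf32BE, Utf32LE };

struct Detected {
    Encoding encoding;
    std::size_t bom_length;  // bytes to skip before the text starts
};

// Guess the encoding from a leading byte order mark.
// Assumption: input without a BOM is treated as UTF-8.
Detected detect_bom(const std::string& b) {
    auto at = [&](std::size_t i) { return static_cast<unsigned char>(b[i]); };
    const std::size_t n = b.size();
    // UTF-32LE is tested before UTF-16LE because both start with FF FE.
    if (n >= 4 && at(0) == 0xFF && at(1) == 0xFE && at(2) == 0x00 && at(3) == 0x00)
        return {Encoding::Utf32LE, 4};
    if (n >= 4 && at(0) == 0x00 && at(1) == 0x00 && at(2) == 0xFE && at(3) == 0xFF)
        return {Encoding::Utf32BE, 4};
    if (n >= 3 && at(0) == 0xEF && at(1) == 0xBB && at(2) == 0xBF)
        return {Encoding::Utf8, 3};
    if (n >= 2 && at(0) == 0xFF && at(1) == 0xFE) return {Encoding::Utf16LE, 2};
    if (n >= 2 && at(0) == 0xFE && at(1) == 0xFF) return {Encoding::Utf16BE, 2};
    return {Encoding::Utf8, 0};
}

// Decode UTF-16 bytes starting at `start` into code points,
// combining surrogate pairs and rejecting unpaired surrogates.
std::u32string decode_utf16(const std::string& b, std::size_t start, bool big_endian) {
    if (start > b.size() || (b.size() - start) % 2 != 0)
        throw std::runtime_error("UTF-16 input must be a whole number of 16-bit units");
    auto unit = [&](std::size_t i) -> char32_t {
        const char32_t first = static_cast<unsigned char>(b[i]);
        const char32_t second = static_cast<unsigned char>(b[i + 1]);
        return big_endian ? (first << 8) | second : (second << 8) | first;
    };
    std::u32string out;
    for (std::size_t i = start; i < b.size(); i += 2) {
        char32_t u = unit(i);
        if (u >= 0xD800 && u <= 0xDBFF) {  // high surrogate: needs a low one next
            if (i + 2 >= b.size()) throw std::runtime_error("truncated surrogate pair");
            const char32_t low = unit(i + 2);
            if (low < 0xDC00 || low > 0xDFFF)
                throw std::runtime_error("unpaired high surrogate");
            u = 0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00);
            i += 2;  // the low surrogate has been consumed as well
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            throw std::runtime_error("unpaired low surrogate");
        }
        out.push_back(u);
    }
    return out;
}
```

A caller would read the file in binary mode, pass the bytes to `detect_bom`, and then decode from `bom_length` onward. One known ambiguity: a UTF-16LE file whose first character after the BOM is U+0000 looks identical to a UTF-32LE BOM, so this sketch would misdetect it.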

Gautam Jain