I'm handling a lot of Unicode file paths in my C++ project. I perform a check in my code: if a path fits in a multibyte string, I keep it as a normal string (std::string); otherwise, I use it as a wide-character string (std::wstring).

My question is whether I can use wstring for all paths. Would it affect performance? I have to do some string manipulation and open, create, rename, and delete files with the wstring. Rather than checking whether each path is multibyte or wide, I would like to use wstring directly, which would save me a lot of if/else.

#include <cstdlib>
#include <string>

// Returns true if the wide string cannot be represented in the current
// locale's multibyte encoding, i.e. it has to stay a wide string.
bool IsUnicodeWString(const std::wstring &wstr)
{
  // With a NULL destination, wcstombs only computes the required length.
  // It returns (size_t)-1 if any character cannot be converted, so no
  // temporary buffer or second conversion pass is needed.
  const size_t multiByteLen = std::wcstombs(NULL, wstr.c_str(), 0);
  return multiByteLen == static_cast<size_t>(-1);
}

if (IsUnicodeWString(filePath)) { // filePath: the std::wstring path being checked
        // Use wstring [Operations - string manipulation; file path used for open, read, write, create, delete, rename, etc.]
} else {
        // Use string [Operations - string manipulation; file path used for open, read, write, create, delete, rename, etc.]
}

Please share your thoughts ...

Manikandaraj Srinivasan
  • 3
    You should either make everything use `std::wstring` unconditionally and forget storing MBCS in `std::string` altogether, or else switch to UTF-8 instead of MBCS so there is no possibility of data loss (MBCS is not lossless) and then convert between UTF-8 and UTF-16 when calling API functions that require UTF-16. – Remy Lebeau Jun 04 '13 at 19:37
  • 2
    Worry about making it correct before you worry about the speed. Sticking to one format will simplify the code, and that will make it much simpler to make it correct. – Adrian McCarthy Jun 04 '13 at 19:37
  • 1
    My opinion is that your checking and conversion function is going to be far more expensive than just using wide strings everywhere. – Dark Falcon Jun 04 '13 at 19:38
  • 1
    Nearly all of Windows (that uses strings at all) uses wide strings internally, so in most cases using a wide-string version is cheaper than using a narrow-string version. Most narrow-string functions just create an equivalent wide string, then call the wide-string function. – Jerry Coffin Jun 04 '13 at 19:46
  • Identifiers starting with an underscore followed by an uppercase letter are reserved for the implementation. Do not use them; instead use `_wstr` or just get rid of the underscore. The cast is also unnecessary. Use `&_wstr[0]`. – Captain Obvlious Jun 04 '13 at 19:52
  • http://utf8everywhere.org/ –  Jun 04 '13 at 20:27
  • Thanks for the comments, but does wstring (wchar_t) have any effect on performance in Windows C++? – Manikandaraj Srinivasan Jun 05 '13 at 12:04
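
The UTF-8 route suggested in the comments above needs a conversion step at the API boundary. On Windows this is normally done with `WideCharToMultiByte` and `CP_UTF8`; the sketch below instead hand-rolls the UTF-16 to UTF-8 direction on `std::u16string` so that it stays portable. The helper name `Utf16ToUtf8` and the choice to replace unpaired surrogates with U+FFFD are assumptions made for illustration, not something taken from the thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: encode a UTF-16 string as UTF-8.
// Unpaired surrogates are replaced with U+FFFD (an assumption of this sketch).
std::string Utf16ToUtf8(const std::u16string &in)
{
  std::string out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    std::uint32_t cp = in[i];
    if (cp >= 0xD800 && cp <= 0xDBFF) {          // high surrogate
      if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
        // Combine the surrogate pair into one code point.
        cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
        ++i;
      } else {
        cp = 0xFFFD;
      }
    } else if (cp >= 0xDC00 && cp <= 0xDFFF) {   // stray low surrogate
      cp = 0xFFFD;
    }

    if (cp < 0x80) {                             // 1 byte
      out += static_cast<char>(cp);
    } else if (cp < 0x800) {                     // 2 bytes
      out += static_cast<char>(0xC0 | (cp >> 6));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {                   // 3 bytes
      out += static_cast<char>(0xE0 | (cp >> 12));
      out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                                     // 4 bytes
      out += static_cast<char>(0xF0 | (cp >> 18));
      out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
      out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    }
  }
  return out;
}
```

Unlike a conversion to the locale's multibyte code page, this one cannot fail for valid input, which is why no "does it fit?" check is needed with UTF-8.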

1 Answer

On Windows, try to use wchar_t as much as possible. It is the default character representation on Windows, and the kernel uses wchar_t as well. The ANSI APIs are wrappers around the Unicode APIs; if you disassemble an ANSI API, you can see this for yourself.

Also, use ATL::CString instead of std::(w)string if possible. It uses reference counting, and the size of the class equals the size of a pointer (4 bytes on 32-bit and 8 bytes on 64-bit). That means you can return an ATL::CString directly from a function without a performance penalty.
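
As a rough illustration of using wide strings for every path with no narrow/wide branching, here is a minimal sketch. It assumes C++17 `std::filesystem`, which was not available when this thread was written; on Windows `std::filesystem::path` stores `wchar_t` natively, so a `std::wstring` is taken without conversion. The helper name `CreateRenameDelete` and its parameters are hypothetical.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: create a file, rename it, then delete it,
// using std::wstring for every path component.
bool CreateRenameDelete(const std::wstring &dir,
                        const std::wstring &oldName,
                        const std::wstring &newName)
{
  const fs::path from = fs::path(dir) / oldName;
  const fs::path to = fs::path(dir) / newName;

  {
    // C++17 file streams accept a std::filesystem::path directly.
    std::ofstream out(from, std::ios::binary);
    if (!out) return false;
    out << "data";
  } // the stream closes here, before the rename

  std::error_code ec;
  fs::rename(from, to, ec);
  if (ec) return false;

  // remove() returns true only if a file was actually deleted.
  return fs::remove(to, ec) && !ec;
}
```

A call such as `CreateRenameDelete(L"C:\\temp", L"old.txt", L"new.txt")` would exercise all three operations; the directory and file names here are placeholders.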

UltimaWeapon