How can I open a file whose path or name contains Unicode characters and read or write its contents without any special API? Can it be done with the standard library alone, or failing that with only the Windows API? I tried opening the file with std::wifstream, as in the code sample below, but it doesn't compile: the constructor seems to take a 'const char*' argument rather than 'const wchar_t*'. I'm using the TDM-GCC 4.7.1 compiler that ships with the Dev-C++ IDE.

#ifndef UNICODE
#define UNICODE
#endif
...
#include <clocale>
#include <windows.h>
#include <fstream>
...
int main(int argc, char **argv)
{
    setlocale(LC_ALL, "Polish_Poland.852") ;
    ...
    fileCompare(first, second) ;
    ...
}
...
bool fileCompare(wstring first, wstring second)  // This function doesn't compile !
{
    using namespace std ;
    wifstream fin0(first.c_str(), ios::binary) ;
    wifstream fin1(second.c_str(), ios::binary) ;
    ...
}

A complete example:

#ifndef UNICODE
#define UNICODE
#endif

#include <clocale>
#include <conio.h>
#include <windows.h>
#include <fstream>
#include <string>
#include <iostream>

using namespace std ;

bool fileCompare(wstring first, wstring second) ;

int main(int argc, char **argv)
{
    setlocale(LC_ALL, "Polish_Poland.852") ;

    wstring first, second ;
    first = L"C:\\A.dat" ;
    second = L"C:\\E.dat" ;

    fileCompare(first, second) ;

    getch() ;
    return 0 ;
}

bool fileCompare(wstring first, wstring second)  // This function doesn't compile !
{
    wifstream fin0(first.c_str(), ios::binary) ;
    wifstream fin1(second.c_str(), ios::binary) ;

}

Also, when I replace L"C:\\A.dat" and L"C:\\E.dat" with strings containing Polish characters, I get an error about an illegal byte sequence.

user1978386

1 Answer

wifstream does not solve the problem of file name encoding. In standard C++ the constructors of both wifstream and ifstream take the file name as char-based text (const char*, or std::string since C++11), not as wchar_t. You have to supply the file name in the char encoding your OS uses, e.g. Latin-1, UTF-8, etc.
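If a C++17 compiler is available (TDM-GCC 4.7.1 from the question is not one), a standard-only route is to wrap the wide name in std::filesystem::path, which the file stream constructors accept. The sketch below is only an assumption about how fileCompare from the question might be completed, not the asker's actual code. How a wide name is mapped to a native file name outside Windows is left to the implementation, and GCC releases before 9 also need -lstdc++fs at link time.

```cpp
#include <algorithm>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical completion of fileCompare: returns true only if both files
// can be opened and contain exactly the same bytes.
bool fileCompare(const std::wstring& first, const std::wstring& second)
{
    // std::filesystem::path can be built from a wide string (C++17).
    const std::filesystem::path p0(first);
    const std::filesystem::path p1(second);

    // Since C++17 the stream constructors accept a path directly.
    std::ifstream fin0(p0, std::ios::binary);
    std::ifstream fin1(p1, std::ios::binary);
    if (!fin0 || !fin1)
        return false;  // at least one file could not be opened

    // Compare byte by byte; the four-iterator std::equal also checks length.
    std::istreambuf_iterator<char> it0(fin0), it1(fin1), end;
    return std::equal(it0, end, it1, end);
}
```

With the paths from the question, a call would look like fileCompare(L"C:\\A.dat", L"C:\\E.dat").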

What wifstream does give you is the ability to read a stream of wchar_t. You can tell the stream which input encoding to expect by imbuing it with a locale, e.g.:

 // We expect the file to be UTF8 encoded
 std::locale locale("en_US.utf8");
 fin0.imbue(locale);
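Constructing a std::locale from a name throws std::runtime_error when the C++ runtime does not know that name, so it can help to probe before imbuing. The helpers below (localeAvailable and pickLocale are hypothetical names, not library functions) are a minimal sketch of that idea; which names succeed depends on the platform and the standard library build.

```cpp
#include <initializer_list>
#include <locale>
#include <stdexcept>

// Returns true if std::locale can be constructed from the given name.
bool localeAvailable(const char* name)
{
    try {
        std::locale loc(name);  // throws std::runtime_error for unknown names
        (void)loc;
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}

// Returns the first locale from the candidate list that can be constructed,
// or the classic "C" locale if none of them is available.
std::locale pickLocale(std::initializer_list<const char*> candidates)
{
    for (const char* name : candidates) {
        if (localeAvailable(name))
            return std::locale(name);
    }
    return std::locale::classic();
}
```

A stream could then be set up with something like fin0.imbue(pickLocale({"en_US.utf8", "Polish_Poland.852"})), keeping in mind that the "C" fallback does no UTF-8 decoding.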

EDIT: If you need to convert your file names (or any string) from wchar_t to the appropriate char encoding, look into the codecvt facets of std::locale.

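#include <cwchar>   // std::mbstate_t
#include <locale>
#include <string>

// Converts wchar_t => "pl_PL.iso88592" encoding.
// Assumes every wide character is representable in that encoding.
std::string to_string(const std::wstring & wstr)
{
    typedef std::codecvt< wchar_t, char, std::mbstate_t > ccvt_t;

    // Throws std::runtime_error if the runtime does not know this locale name.
    std::locale loc("pl_PL.iso88592");
    const ccvt_t & facet = std::use_facet<ccvt_t>( loc );

    std::string s;
    std::mbstate_t st = std::mbstate_t();

    const wchar_t *wac = wstr.c_str();          // start of the unconverted input
    const wchar_t *wou = wac + wstr.length();   // end of the input
    const wchar_t *wnx = wac;                   // first character not yet converted

    ccvt_t::result r = ccvt_t::ok;

    // Convert in chunks of at most l chars until the input is used up
    // or the facet reports an error.
    while(wou!=wnx && (r==ccvt_t::ok || r==ccvt_t::partial))
    {
        const int l = 100;
        char cou[l];
        char *cnx = cou;
        r = facet.out(st,wac,wou,wnx,cou,cou+l,cnx);
        s += std::string(cou,cnx-cou);
        if (wnx == wac) break;   // no progress: stop instead of looping forever
        wac = wnx;
    }

    return s;
}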

Which std::locale names are supported, and how they have to be spelled, may depend on the OS.
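Since the question also asks about the Windows API: there the narrow conversion can be skipped entirely, because CreateFileW takes the file name as wchar_t. The function below is a rough sketch rather than a tested Windows program; readFileWide is a hypothetical helper name, and the non-Windows branch is only a fallback so the sketch builds elsewhere (it converts the name with wcstombs under the current C locale).

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>
#include <vector>

#ifdef _WIN32
#include <windows.h>
#endif

// Reads the whole file named by a wide string into 'out'.
// Returns false if the file cannot be opened or a read fails.
bool readFileWide(const std::wstring& path, std::vector<char>& out)
{
    out.clear();
#ifdef _WIN32
    // The W variant takes the name as UTF-16, so no code page is involved.
    HANDLE h = CreateFileW(path.c_str(), GENERIC_READ, FILE_SHARE_READ, NULL,
                           OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return false;

    char buf[4096];
    DWORD got = 0;
    bool ok = true;
    for (;;) {
        if (!ReadFile(h, buf, sizeof buf, &got, NULL)) { ok = false; break; }
        if (got == 0) break;  // end of file
        out.insert(out.end(), buf, buf + got);
    }
    CloseHandle(h);
    return ok;
#else
    // Fallback for other systems: convert the name with the current C locale.
    std::vector<char> narrow(path.size() * MB_CUR_MAX + 1);
    if (std::wcstombs(&narrow[0], path.c_str(), narrow.size())
            == static_cast<std::size_t>(-1))
        return false;  // name not representable in the current locale

    std::FILE* f = std::fopen(&narrow[0], "rb");
    if (!f)
        return false;

    char buf[4096];
    std::size_t got;
    while ((got = std::fread(buf, 1, sizeof buf, f)) > 0)
        out.insert(out.end(), buf, buf + got);
    std::fclose(f);
    return true;
#endif
}
```

Two files could then be compared by reading both into vectors and testing them with operator==, at the cost of holding both files in memory.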

Oncaphillis
  • And what about the Windows API? – user1978386 Nov 02 '14 at 14:50
  • @user1978386 Try std::locale locale("Polish_Poland.852"); – Oncaphillis Nov 02 '14 at 14:54
  • OK. I think I'm starting to understand. I need to call wifstream with a char* after calling 'imbue' with the proper locale. But what if I have all paths and names stored in wstring objects? – user1978386 Nov 02 '14 at 14:59
  • @user1978386 Added code for wchar_t => char translation, blindly assuming that all wchars used are supported by the char encoding. – Oncaphillis Nov 02 '14 at 15:18
  • terminate called after throwing an instance of 'std::runtime_error' what(): locale::facet::_S_create_c_locale name not valid – user1978386 Nov 02 '14 at 15:41
  • How do I check which locale is set in WinXP? – user1978386 Nov 02 '14 at 15:41
  • Try std::locale("Polish_Poland.852") ... but that is not my domain – Oncaphillis Nov 02 '14 at 15:48
  • "Polish_Poland.852" also doesn't work. In "Regional and Language Options" I have set Polish. I have installed code page 852 (OEM Latin II). How can I display the code page name, e.g. "pl_PL.iso88592" or "Polish_Poland.852" (or maybe it has a different name), from the command line? – user1978386 Nov 02 '14 at 15:51
  • May be this http://stackoverflow.com/questions/4406895/what-stdlocale-names-are-available-on-common-windows-compilers helps – Oncaphillis Nov 02 '14 at 15:59
  • After changing "pl_PL.iso88592" to "" in your function, the following works: cout << to_string(first) << endl ; cout << to_string(second) << endl ; It prints the Polish characters. However, this does not: wifstream fin0(to_string(first).c_str(), ios::binary) ; wifstream fin1(to_string(second).c_str(), ios::binary) ; cout << fin0.is_open() ; 'is_open()' returns 0 if 'first' contains Polish characters. – user1978386 Nov 02 '14 at 16:35
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/64126/discussion-between-oncaphillis-and-user1978386). – Oncaphillis Nov 02 '14 at 16:38