I need help with the non-copyable nature of [io](f)streams.

I need to provide a hackish wrapper around fstreams in order to handle files with Unicode characters in their filenames on Windows. For this, I devised a wrapper function:

bool open_ifstream( istream &stream, const string &filename )
{
#ifdef __GLIBCXX__
    FILE* result = _wfopen( convert_to_utf16(filename).c_str(), L"r" );
    if( result == 0 )
        return false;

    __gnu_cxx::stdio_filebuf<char>* buffer = new __gnu_cxx::stdio_filebuf<char>( result, std::ios_base::in, 1 );
    istream stream2(buffer);
    std::swap(stream, stream2);

#elif defined(_MSC_VER)
    stream.open( convert_to_utf16(filename) );
#endif
    return !!stream;
}

The std::swap line is, of course, the culprit. I also tried returning the stream from the function, but that leads to the same problem: the copy constructor of std::istream is deleted. I also tried std::move, but that didn't help. How do I work around this problem?

EDIT: I finally found a good way to Keep It Simple (TM) and yet functional, thanks to @tibur's idea. It's still hackish in the sense that it depends on which Standard C++ library is used on Windows, but as there are only two real ones in use, it's not really a problem for me.

#include <fstream>
#include <memory>
#if _WIN32
# if __GLIBCXX__
#  include<ext/stdio_filebuf.h>
unique_ptr<istream> open_ifstream( const string &filename )
{
    FILE* c_file = _wfopen( convert_to_utf16(filename).c_str(), L"r" );
    __gnu_cxx::stdio_filebuf<char>* buffer = new __gnu_cxx::stdio_filebuf<char>( c_file, std::ios_base::in, 1 );

    return std::unique_ptr<istream>( new istream(buffer) );
}
# elif _MSC_VER
unique_ptr<ifstream> open_ifstream( const string &filename )
{
    return unique_ptr<ifstream>(new ifstream( convert_to_utf16(filename)) );
}
# else
# error unknown fstream implementation
# endif
#else
unique_ptr<ifstream> open_ifstream( const string &filename )
{
    return unique_ptr<ifstream>(new ifstream(filename) );
}
#endif

And in user code:

auto stream_ptr( open_ifstream(filename) );
auto &stream = *stream_ptr;
if( !stream )
    return emit_error( "Unable to open nectar file: " + filename );

This depends on C++0x <memory> and the auto keyword. Of course you can't just close the resulting stream variable, but the GNU libstdc++ std::istream destructor does take care of closing the file, so no extra memory management is required anywhere.

rubenvb
  • Why are you trying to shove a UTF-16 string through iostreams? First off, I don't think _wfopen takes a UTF-16 string; I'm fairly sure that, on GCC-based compilers, wchar_t strings are expected to be UTF-32, since wchar_t is 32 bits long there, unlike under Visual Studio, where it is 16 bits in size. Second, are you sure you can't just pass them a UTF-8 string? Admittedly, I don't know how GCC's standard C++ library is implemented on Windows, but on UNIX, they take UTF-8 strings. So I would expect them to do the conversion for you, behind the scenes. – Nicol Bolas Jun 29 '11 at 17:40
  • Why not decouple the filenames from the rest of the program logic and provide a wrapper for Windows that uses `GetShortPathName` -- that way you can treat all filenames uniformly as `char*`. – Kerrek SB Jun 29 '11 at 17:52
  • @Nicol: The UTF-16 thing is how the Win32 API works, GCC on Windows follows this, to be compatible with other native Windows stuff. – rubenvb Jun 29 '11 at 17:55
  • Don't you need `fclose()`? According to docs "The `FILE*` will not be automatically closed when the stdio_filebuf is closed/destroyed." – marcin Mar 03 '15 at 18:09

4 Answers

Couldn't you just use the rdbuf member function to set stream's buffer directly?

Josh
  • You'd still have to dynamically allocate the `streambuf`, in order for it to persist after the return from the function, and you'd have to figure out some way to ensure that it was deleted when the stream was through with it. – James Kanze Jun 29 '11 at 17:59
  • Right, but I didn't get the impression that part was a problem. I mean, he'd have to do that anyway if his `std::swap` line worked. – Josh Jun 29 '11 at 18:05
  • The various sources on the internet tell me that the libstdc++ internals will take care of the `__gnu_cxx::stdio_filebuf` objects when you call `close()` on the stream. But the problem with your suggestion is that I need to assign to the pointer returned by `rdbuf`, or at least "reconstruct" the object pointed to, because the `filebuf_type` returned does not have a suitable `open` method :( I need to "hotswap" the buffer object any way you put it... – rubenvb Jun 29 '11 at 19:25
  • "Hotswap the buffer" is exactly what `rdbuf` does: `stream.rdbuf(buffer)` will discard (and return) the old stream buffer and replace it with whatever you pass in. I've seen people use it to make `cout` print to a file, for example. – Josh Jun 29 '11 at 22:36
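
The `rdbuf` hot swap described in the last comment can be sketched as follows. This is only an illustration under stated assumptions: `swap_buffer` and `read_word_from_each` are hypothetical names that do not appear in the thread, and in-memory `std::stringbuf` objects stand in for the platform-specific file buffer so the sketch stays portable.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (not from the thread): make `stream` read from
// `replacement` and return the buffer it was using before.
// The stream owns neither buffer; the caller must keep both alive.
std::streambuf* swap_buffer(std::istream& stream, std::streambuf* replacement)
{
    assert(replacement != nullptr);   // a null buffer would set badbit
    return stream.rdbuf(replacement); // also resets the stream's error state
}

// Small demonstration: in-memory buffers stand in for file buffers.
std::string read_word_from_each()
{
    std::stringbuf first("alpha"), second("beta");
    std::istream in(&first);

    std::string a, b;
    in >> a;                  // reads "alpha" and reaches the end of `first`
    swap_buffer(in, &second); // same stream object, new source
    in >> b;                  // reads "beta"
    return a + " " + b;
}
```

One detail to keep in mind: `std::ifstream` declares its own zero-argument `rdbuf()`, which hides the one-argument overload, so the swap has to go through a `std::istream&` (as in the question's function signature) rather than through an `ifstream` object directly. The stream also does not delete the buffer it is given, which is the lifetime concern James Kanze raises above.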
What about:

ifstream * open_ifstream(const string &filename);
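
Returning the stream by pointer sidesteps the non-copyability, since only the pointer is copied or moved. A minimal sketch of that idea follows, with two substitutions that are not part of this answer: `std::unique_ptr` replaces the raw pointer, and a hypothetical `owning_istream` class keeps the stream buffer alive for as long as the stream exists. The factory name `open_stream_from_text` is also hypothetical, and a `std::stringbuf` stands in for the real file buffer.

```cpp
#include <cassert>
#include <istream>
#include <memory>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical class: an istream that owns the buffer it reads from,
// so destroying the stream also destroys the buffer.
class owning_istream : public std::istream
{
public:
    explicit owning_istream(std::unique_ptr<std::streambuf> buffer)
        : std::istream(buffer.get()), buffer_(std::move(buffer))
    {
        assert(rdbuf() != nullptr); // the stream needs a real buffer
    }

private:
    std::unique_ptr<std::streambuf> buffer_;
};

// Hypothetical factory: the in-memory buffer is a stand-in for whatever
// platform-specific buffer the real function would create.
std::unique_ptr<std::istream> open_stream_from_text(const std::string& text)
{
    std::unique_ptr<std::streambuf> buffer(new std::stringbuf(text));
    return std::unique_ptr<std::istream>(new owning_istream(std::move(buffer)));
}
```

Because `std::istream` has a virtual destructor, deleting the object through a `std::unique_ptr<std::istream>` runs the `owning_istream` destructor and releases the buffer with it.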
tibur
Here's a moderately unintrusive idea:

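#include <iconv.h>
#include <algorithm>
#include <cassert>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

void windowify(std::string & filename)
{
#ifdef _WIN32
  assert(filename.length() < 1000);

  wchar_t wbuf[1000] = {};   // zero-filled, so the converted name stays terminated
  char    cbuf[1000] = {};
  char * ip = &cbuf[0];
  char * op = reinterpret_cast<char*>(&wbuf[0]);

  // iconv counts bytes, not characters; leave room for the terminator
  size_t ib = filename.length(), ob = sizeof(wbuf) - sizeof(wchar_t);

  std::copy(filename.begin(), filename.end(), cbuf);

  iconv_t cd = iconv_open("WCHAR_T", "UTF-8");
  iconv(cd, &ip, &ib, &op, &ob);
  iconv_close(cd);

  wchar_t sfnbuf[1000];
  // GetShortPathNameW returns the length in wchar_t units, or 0 on failure
  DWORD len = GetShortPathNameW(wbuf, sfnbuf, 1000);
  if (len == 0 || len >= 1000)
    return;   // leave the filename untouched

  std::fill(cbuf, cbuf + 1000, 0);
  ib = len * sizeof(wchar_t);
  ob = 999;   // keep the final zero byte as terminator
  ip = reinterpret_cast<char*>(&sfnbuf[0]);   // convert the short name, not the original
  op = &cbuf[0];

  cd = iconv_open("UTF-8", "WCHAR_T");
  iconv(cd, &ip, &ib, &op, &ob);
  iconv_close(cd);

  filename = std::string(cbuf);
#endif
}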

Usage:

std::string filename = getFilename();
windowify(filename);
std::ifstream infile(filename.c_str());
Kerrek SB
  • One caveat IMO: `GetShortPathName` needs an absolute path, which will take another two calls to `GetFullPathName` and some C string concatenation mess. I'd rather fix my `__gnu_cxx` stuff. – rubenvb Jun 29 '11 at 19:15
  • @Rubenvb: From the [documentation](http://msdn.microsoft.com/en-us/library/aa364989(v=vs.85).aspx): "The path that the lpszLongPath parameter specifies does not have to be a full or long path." I think I've used this successfully on naked filenames and relative paths before... Remember that NTFS filenames don't have to be legal UTF16, but merely zero-terminated 16-bit strings, so you may want to allow for a bit more generality. – Kerrek SB Jun 29 '11 at 20:01
I would suggest a small improvement: use _wopen (or _wsopen_s) instead of _wfopen. You will get a file descriptor (int) that you can pass to the stdio_filebuf in place of the FILE*. This way you should avoid leaking any resource (as pointed out by marcin).
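
A rough sketch of the descriptor-based variant, assuming libstdc++ (the `__gnu_cxx::stdio_filebuf` extension is not available in other standard libraries). The names `native_path`, `open_descriptor` and `read_first_line` are hypothetical. On Windows the descriptor comes from `_wopen`; elsewhere POSIX `open` is used so the sketch can be tried outside Windows as well.

```cpp
#include <fcntl.h>
#include <istream>
#include <string>
#include <ext/stdio_filebuf.h> // libstdc++ extension
#ifdef _WIN32
#  include <io.h>
#endif

// Hypothetical helpers: obtain a read-only descriptor for a native path.
#ifdef _WIN32
typedef std::wstring native_path;
inline int open_descriptor(const native_path& path)
{
    return _wopen(path.c_str(), _O_RDONLY);
}
#else
typedef std::string native_path;
inline int open_descriptor(const native_path& path)
{
    return ::open(path.c_str(), O_RDONLY);
}
#endif

// Reads the first line of the file into `line`.
// Returns false if the file cannot be opened or holds no data.
bool read_first_line(const native_path& path, std::string& line)
{
    int fd = open_descriptor(path);
    if (fd == -1)
        return false;

    // With the descriptor constructor, the buffer takes over `fd` and
    // closes it when the buffer is closed or destroyed.
    __gnu_cxx::stdio_filebuf<char> buffer(fd, std::ios_base::in);
    std::istream stream(&buffer); // destroyed before `buffer`
    return static_cast<bool>(std::getline(stream, line));
}
```

The difference from the `FILE*` constructor is the ownership rule: according to the libstdc++ documentation, a descriptor handed to `stdio_filebuf` is closed automatically, whereas a `FILE*` is not.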

Alberto M