There are many questions on getting the size of the file behind a std::fstream, but they all return the size in bytes and are error-prone if the file is open in another stream.
I want to know the file size in codepoints, not bytes.
Now, std::fstream::seekg(0, std::ios::end) followed by std::fstream::tellg() only returns the length in bytes. This doesn't tell me how many UTF-16/32 characters are in the file. "Divide the result by sizeof(wchar_t)", I hear you say. That doesn't work for UTF-8 files and is not portable.
Now, for the more technically minded: I have imbued the stream with my own std::codecvt class. std::codecvt has a member length() which, given two pointers into an external character sequence and a maximum count, works out how much of that sequence converts to at most max internal characters. I would have thought that seeking on the file would seek by codecvt::intern_type rather than by the base char type.
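For reference, a small sketch of what codecvt::length() reports. It uses the standard UTF-8/UTF-32 facet std::codecvt<char32_t, char, std::mbstate_t> (deprecated since C++20) instead of a custom facet, and the wrapper name bytes_for_code_points is hypothetical. Note that the return value counts external chars (bytes) consumed, not internal characters produced.

```cpp
#include <cstddef>
#include <cwchar>   // std::mbstate_t
#include <locale>
#include <string>

// Hypothetical wrapper: how many bytes of `utf8` would be consumed to
// produce at most `max_code_points` char32_t values?
inline int bytes_for_code_points(const std::string& utf8,
                                 std::size_t max_code_points) {
    // Standard UTF-8 <-> UTF-32 facet; deprecated since C++20.
    using Facet = std::codecvt<char32_t, char, std::mbstate_t>;
    const Facet& cvt = std::use_facet<Facet>(std::locale::classic());

    std::mbstate_t state{};  // initial conversion state
    const char* first = utf8.data();
    return cvt.length(state, first, first + utf8.size(), max_code_points);
}
```

So length() answers "how far can I read?", which is the opposite direction from "how many characters does this file hold?".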
I've looked into the fstream header and found that seek in fact doesn't use the codecvt. And, in my version from VS2010, the codecvt::length() member is not even mentioned. In fact, on each call to codecvt::in(), a new string object is created and grown by 1 char each time in() returns partial. It doesn't instead call the codecvt::max_length() member and supply the call with an adequate buffer.
Is this just my implementation or can I expect others to do the same? Has std::fstream been rewritten for VS2012 to make full use of locales?
Basically, I'm fed up with having to write my own file handlers every time I use text files. I'm hoping to create an fstream-derived class that will first read a file's BOM, if present, and imbue the correct codecvt, then convert those characters to char, wchar_t or whatever the code calls for. I'm also hoping to code it in such a way that, if the encoding is known in advance, a locale can be specified on construction.
Would I be better off working directly on the internal buffer, in effect rewriting the fstream class, or are there some tricks I'm unaware of?