So, I need to read a Unicode file, transform it using Huffman's algorithm (effectively compress it), and write the result to a new file.
The reason for Unicode is special characters such as the longer dash (as opposed to the plain hyphen) and others: without Unicode, reading and writing with ifstream/ofstream and unsigned char splits the dash into 3 individual chars, and when I decompress the file, it adds chars that weren't there.
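To make the "3 individual chars" concrete, here is a minimal sketch that assumes the file is encoded as UTF-8 (the post does not state the encoding). The helper utf8SequenceLength and the constant kEnDash are hypothetical names used only for illustration; the dash is taken to be U+2013 EN DASH, though an em dash would also occupy three bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: number of bytes in a UTF-8 sequence, judged from its lead byte.
std::size_t utf8SequenceLength(unsigned char lead) {
    // Continuation bytes (10xxxxxx) never start a sequence.
    assert((lead & 0xC0) != 0x80 && "continuation byte is not a lead byte");
    if (lead < 0x80) return 1;            // 0xxxxxxx: plain ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    return 4;                             // 11110xxx
}

// Assumption: the dash is U+2013 EN DASH, which UTF-8 encodes as E2 80 93.
const std::string kEnDash = "\xE2\x80\x93";
```

Read through a narrow stream, kEnDash.size() is 3 while utf8SequenceLength(0xE2) reports that those three bytes form a single character, which is where the three separate chars come from.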
Now, I use std::wifstream and std::wofstream to do this, like so:
    // Must be a compile-time constant to be usable as a std::bitset size.
    constexpr size_t bitsNum = 65536;
    std::wifstream in("a", std::ios::binary);
    std::wofstream out("b", std::ios::binary);

    void compress(std::wifstream &in, std::wofstream &out) {
        // Rewind the input; it was already read once to build the frequency table.
        in.clear();
        in.seekg(0);

        // Expected size of the compressed output in bits.
        uint64_t size = 0;
        for (wchar_t i = 0; i < nodes.size(); ++i) {
            size += nodes.at(i).probability * codes.at(nodes.at(i).value).length;
        }
        std::cout << "Final size: " << size << '\n';

        wchar_t c, w = 0, length, lengthW = 0;
        std::bitset<bitsNum> bits;
        // Loop on the read itself so the end-of-file marker is never treated as a character.
        while (in.get(c)) {
            bits = codes.at(c).bits;
            length = codes.at(c).length;
            for (wchar_t i = 0; i < length; ++i) {
                // Emit the accumulator once 16 bits have been collected.
                if (lengthW == 16) {
                    lengthW = 0;
                    out << w;
                    w = 0;
                }
                w <<= 1;
                w |= bits.test(length - i - 1) & 1;
                ++lengthW;
            }
        }

        // Write the last word, padding it with zero bits if it is incomplete.
        if (lengthW == 16) {
            lengthW = 0;
            out << w;
            w = 0;
        }
        else if (lengthW) {
            w <<= 16 - lengthW;
            out << w;
            w = 0;
        }
        out.flush();
        if (DECOMPRESS) decompress();
    }
The nodes object consists of the frequency distribution for each character that was read from the file, and the codes object consists of bit codes for each of the characters that have to be transformed.
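For readers who want to follow the code, the two objects could look roughly like the sketch below. The member names value, probability, bits and length come from the code above; the struct names Node and Code, the container types, and the helper compressedSizeInBits are assumptions made for illustration, not the actual definitions.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

constexpr std::size_t bitsNum = 65536;

// Assumed shape of one entry of `nodes`: a character and how often it occurs.
struct Node {
    wchar_t value;
    std::uint64_t probability;
};

// Assumed shape of one entry of `codes`: the Huffman code and its length in bits.
struct Code {
    std::bitset<bitsNum> bits;
    std::size_t length;
};

// Hypothetical helper mirroring the "Final size" loop: total output size in bits.
std::uint64_t compressedSizeInBits(const std::vector<Node> &nodes,
                                   const std::map<wchar_t, Code> &codes) {
    std::uint64_t size = 0;
    for (const Node &node : nodes) {
        const Code &code = codes.at(node.value);  // throws if a character has no code
        assert(code.length <= bitsNum);
        size += node.probability * code.length;
    }
    return size;
}
```

With these assumed types, a character that occurs 3 times with a 1-bit code plus one that occurs once with a 2-bit code gives 3 * 1 + 1 * 2 = 5 bits.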
The result is that I can read the file without problems, but when I write the new bits, nothing gets written to the output file.
I tried imbuing the streams with a locale and also setting a global locale; neither helped.
Besides streaming the wchar_t into the wofstream with operator<<, I also tried the .put() and .write() functions, with no luck.
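One thing that may be relevant here: a std::wofstream passes every wchar_t through the codecvt facet of its imbued locale, even when opened with std::ios::binary, so arbitrary 16-bit patterns can fail to convert and leave the stream in a failed state. As a point of comparison only, here is a minimal sketch of packing the bits into bytes and writing them through a narrow std::ostream instead. BitWriter is a hypothetical class, and passing each code as a std::uint64_t (so at most 64 bits) is a simplifying assumption that differs from the std::bitset used above.

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>  // std::ostringstream is handy for trying the writer in memory
#include <string>

// Hypothetical helper: collects single bits and writes them out as whole bytes.
class BitWriter {
public:
    explicit BitWriter(std::ostream &out) : out_(out) {}

    // Appends the lowest `length` bits of `code`, most significant bit first.
    void write(std::uint64_t code, unsigned length) {
        assert(length <= 64);
        for (unsigned i = 0; i < length; ++i) {
            const unsigned bit = (code >> (length - i - 1)) & 1u;
            current_ = static_cast<unsigned char>((current_ << 1) | bit);
            if (++filled_ == 8) emit();
        }
    }

    // Pads a trailing partial byte with zero bits, writes it, and flushes.
    void finish() {
        if (filled_ != 0) {
            current_ = static_cast<unsigned char>(current_ << (8 - filled_));
            emit();
        }
        out_.flush();
    }

private:
    // Writes the accumulated byte unformatted and resets the accumulator.
    void emit() {
        out_.put(static_cast<char>(current_));
        current_ = 0;
        filled_ = 0;
    }

    std::ostream &out_;
    unsigned char current_ = 0;
    unsigned filled_ = 0;
};
```

Used with a std::ofstream opened with std::ios::binary, this writes raw bytes with no character conversion; checking the stream state after writing (for example with out.fail()) is a cheap way to see whether a write was rejected.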
Any ideas on what may be wrong?
PS: I am only allowed to use standard C++17 with no extensions.
Thanks!