0

I am trying to write a program that can take a list of a specific type for example a double or float and convert it into bytes and write it to a file that can be converted back into that original list. I came up with the following structs and functions. It works for a float list, however, it does not if it is a list of doubles and I am trying to understand why.

template<typename T>
struct writer{
    static constexpr size_t Size = sizeof(T); 
    //the size to determing if it needs to be converted to uint16_t, uint32_t, etc...
    static constexpr bool U = std::conditional<std::is_unsigned_v<T>, std::true_type, 
          typename std::conditional<std::is_same_v<T, float>, std::true_type,
          typename std::conditional<std::is_same_v<T, double>, std::true_type, std::false_type>::type >::type >::type::value;
    //this is used to determine if the storing_num variable needs to be stored as unsigned or not
    using value_t = std::conditional_t<sizeof(T) == 1, 
          std::conditional_t<U, uint8_t, int8_t>,
          std::conditional_t<sizeof(T) == 2,
          std::conditional_t<U, uint16_t, int16_t>,
          std::conditional_t<sizeof(T) == 4,
          std::conditional_t<U, uint32_t, int32_t>, //by default the only options are 1, 2, 4, and 8
          std::conditional_t<U, uint64_t, int64_t> > > >;
    //the value that will either be entered or bit_casted to (shown in convert_num function)
    value_t storing_num;
    using my_byte = std::conditional_t<U == true, uint8_t, int8_t>;
    std::array<my_byte, Size> _arr;
    bool convert_num(T inp){
        static_assert(sizeof(T) == sizeof(value_t), "T and value_t need to be the same size");
        if constexpr (!std::is_same_v<T, value_t>){
            storing_num = std::bit_cast<value_t>(inp);
        }else{
            storing_num = inp;
        }

        auto begin = _arr.begin();
        for(int32_t i = _arr.size() - 1; i >= 0; --i, ++begin){
            *begin = ((storing_num >> (i << 3)) & 0xFF);
        }
        return true;
    }
    bool write(std::ostream& outfile){
        auto begin = _arr.cbegin();
        auto end = _arr.cend();
        for(;begin != end; ++begin)
            outfile << (char)(*begin);
        return true;
    }

};

The following can be used to write a float or uint32_t to a text file successfully. The following can be used to read one of the numbers back in:

template<typename T>
struct reader{
    static constexpr size_t Size = sizeof(T);
    static constexpr bool U = std::conditional<std::is_unsigned_v<T>, std::true_type, 
          typename std::conditional<std::is_same_v<T, float>, std::true_type,
          typename std::conditional<std::is_same_v<T, double>, std::true_type, std::false_type>::type >::type >::type::value;
    using value_t = std::conditional_t<sizeof(T) == 1, 
          std::conditional_t<U, uint8_t, int8_t>,
          std::conditional_t<sizeof(T) == 2,
          std::conditional_t<U, uint16_t, int16_t>,
          std::conditional_t<sizeof(T) == 4,
          std::conditional_t<U, uint32_t, int32_t>, //by default the only options are 1, 2, 4, and 8
          std::conditional_t<U, uint64_t, int64_t> > > >;
    value_t outp;
    std::array<int8_t, Size> _arr;
    bool add_nums(std::ifstream& in){
        static_assert(sizeof(T) == sizeof(value_t), "T and value_t need to be the same size");
        _arr[0] = in.get();
        if(_arr[0] == -1)
            return false;
        for(uint32_t i = 1; i < _arr.size(); ++i){
            _arr[i] = in.get();
        }
        return true;
    }
    bool convert(){
        if(std::any_of(_arr.cbegin(), _arr.cend(), [](int v){return v == -1;}))
            return false;
        outp = 0;
        if(U){
            auto begin = _arr.cbegin();
            for(int32_t i = _arr.size()-1; i >= 0; i--, ++begin){
                outp += ((uint8_t)(*begin) << (i * 8));
            }
            return true;
        }
        auto begin = _arr.cbegin();
        for(int32_t i = _arr.size() - 1; i >= 0; --i, ++begin)
            outp += ((*begin) << (i << 3));
        return true;
    }
};

Then I use the following functions to iterate over a text file and read/write to said file:

template<typename T>
void read_list(T* begin, const char* filename){
    reader<T> my_reader;
    std::ifstream in(filename);
    if(in.is_open()){
        while(in.good()){
            if(!my_reader.add_nums(in))
                break;
            if(!my_reader.convert()){
                std::cerr << "error reading, got -1 from num reading " << filename;
                return;
            }
            if(std::is_same_v<T, typename reader<T>::value_t >) *begin = my_reader.outp;
            else *begin = std::bit_cast<T>(my_reader.outp);
            ++begin;

        }
    }
    if(!in.eof() && in.fail()){
        std::cerr << "error reading " << filename;
        return;
    }
    in.close();
    return; 
}

template<typename T>
void write_list(T* begin, T* end, const char* filename){
    writer<T> my_writer;
    std::ofstream outfile(filename, std::ios::out | std::ios::binary | std::ios::trunc);
    for(;begin != end; ++begin){
        my_writer.convert_num(*begin);
        my_writer.write(outfile);
    }
}

For example, the following would work as expected:

void write_float_vector(){
    std::vector<float> my_floats = {4.981, 832.991, 33.5, 889.56, 99.8191232, 88.192};
    std::cout<<"my_floats: " << my_floats<<std::endl;
    write_list(&my_floats[0], &my_floats[my_floats.size()], "binary_save/float_try.nt");
}

void read_floats(){
    std::vector<float> my_floats(6);
    read_list(&my_floats[0], "binary_save/float_try.nt");
    std::cout<<"my_floats: " << my_floats<<std::endl;
}

int main(){
    write_double_vector();
    std::cout<<"reading..."<<std::endl;
    read_doubles();
}

However, if it was converted to doubles instead of floats it fails to read the doubles back in correctly. Why is it failing with doubles?

For example, the following fails based on what is outputted from the read_doubles function:

void write_double_vector(){
    std::vector<double> my_doubles = {4.981, 832.991, 33.5, 889.56, 99.8191232, 88.192};
    std::cout<<"my_doubles: " << my_doubles<<std::endl;
    write_list(&my_doubles[0], &my_doubles[my_doubles.size()], "binary_save/double_try.nt");
}

void read_doubles(){
    std::vector<double> my_doubles(6);
    read_list(&my_doubles[0], "binary_save/double_try.nt");
    std::cout<<"my_doubles: " << my_doubles<<std::endl;
}

Additional

If you would like to run the code yourself, I added these helper functions, and used the following headers to make it easier to reproduce:

#include <cstddef>
#include <cstdint>
#include <ios>
#include <stdio.h>
#include <iostream>
#include <fstream>
#include <vector>
#include <array>
#include <bit>


template<typename T>
std::ostream& operator<<(std::ostream& os, const std::vector<T>& v){
    os << "{";
    for(uint32_t i = 0; i < v.size()-1; ++i)
        os << v[i]<<',';
    os << v.back() << "}";
    return os;
}

Sam Moldenha
  • 463
  • 2
  • 11
  • Sure they can, but it's then not a "text file". – Ted Lyngmo Aug 13 '23 at 21:45
  • What are all the headers like `` for? What do they provide that `` doesn't already provide? – Ted Lyngmo Aug 13 '23 at 21:46
  • @TedLyngmo I think so, honestly, I am using nvim as my IDE and it automatically adds those headers when I use those types while coding, I pretty much just copied and pasted. Ill delete it. – Sam Moldenha Aug 13 '23 at 21:47
  • `#include ` is another of those you don't need. `bool` is a built-in type and `` should be `` if you even need it. Your code doesn't seem to use it. – Ted Lyngmo Aug 13 '23 at 21:48
  • 1
    BTW, not sure that `float`/`double` representation is fixed. – Jarod42 Aug 13 '23 at 21:51
  • And you should never build a `float` or `double` character-by-character like you seem to do in `reader`. You may create a trap representation. Read `sizeof(double)` `char`s into a buffer and then `std::memcpy` that buffer into the `double`. Also, what is `if (_arr[0] == -1) return false;` supposed to do? – Ted Lyngmo Aug 13 '23 at 21:59
  • 1
    For anyone who wants to try and make sense of these code snippets, I massaged it into a [mre] that doesn't rely on writing files: https://godbolt.org/z/esEE8c13G – paddy Aug 13 '23 at 22:02
  • In `reader` try these replacement functions: `void add_nums(std::istream& in) { in.read(_arr.data(), Size); }` and `void convert() { std::memcpy(&outp, _arr.data(), Size); }`. I'm not sure why they are two separate functions though, but that should make it read a `double` from any `istream` (like an `ifstream`). – Ted Lyngmo Aug 13 '23 at 22:05
  • @TedLyngmo They are separate functions because I wanted to handle if there was a wrong number of bytes put into the file for reading, and that was a simple way to do it. – Sam Moldenha Aug 13 '23 at 22:15

3 Answers3

4

Building a float or double character-by-character can cause a trap representation so don't do that. Replace the _arr definition with std::array<char, Size> _arr; and then use in.read + std::memcpy:

Example:

#include <cstring> // std::memcpy
// writer:
std::array<char, Size> _arr;

bool convert_num(T inp) {
    static_assert(sizeof(T) == sizeof(value_t),
                    "T and value_t need to be the same size");
    std::memcpy(_arr.data(), &inp, Size);
    return true;
}

bool write(std::ostream& outfile) {
    return static_cast<bool>(outfile.write(_arr.data(), Size));
}
// reader:
std::array<char, Size> _arr;

bool add_nums(std::istream& in) {
    return static_cast<bool>(in.read(_arr.data(), Size));
}

bool convert() {
    std::memcpy(&outp, _arr.data(), Size);
    return true;
}

Demo (originally from Paddy)

Ted Lyngmo
  • 93,841
  • 5
  • 60
  • 108
  • The following does not work for doubles, and makes the float case no longer work as well. – Sam Moldenha Aug 13 '23 at 22:16
  • @SamMoldenha It works if you write them the same way. I added that to the answer. The above are the only changes I needed to make in order to make Paddys test cases work. – Ted Lyngmo Aug 13 '23 at 22:20
  • So much cleaner. One point though is that I noticed the original code stores data in reversed byte order. If that's an actual requirement, then memcpy won't help so much. And if this is used for storing integers, then there are the usual cross-platform endianness concerns to worry about. – paddy Aug 13 '23 at 22:37
  • @paddy That is true. For endianess, one would have to add a check if the buffer is to be reversed before reading/writing. The most important thing is to use `std::memcpy` when populating `double`s (and `float`s). I've seen _not_ doing it causing all sorts of problems in the past. – Ted Lyngmo Aug 13 '23 at 22:40
2

The main issue with your program is the shifting of a smaller data type when you read. You have this in two places (shown together here for clarity, despite being out of context):

outp += ((uint8_t)(*begin) << (i * 8));
outp += ((*begin) << (i << 3));

In both these cases, the iterator begin references an underlying type of int8_t. Not only is this a signed type, but you're placing trust in the compiler to promote the type to a large enough integer. I actually ran into a similar issue some years ago.

See here: Undefined high-order of uint64_t while shifting and masking 32-bit values

The fix is to cast to the correct size before shifting:

outp += (((value_t)(uint8_t)*begin) << (i * 8));
outp += (((value_t)(uint8_t)*begin) << (i << 3));

I would also like to suggest that you avoid using constructs like i << 3 instead of i * 8. There is simply no need to try and be clever. The compiler will do the right thing, and you're just making the code harder to read. It also lacks consistency, since you're using both forms in the same function.

paddy
  • 60,864
  • 6
  • 61
  • 103
1

The only correct (not UB) and constexpr way to do it in modern era(C++20) is std::bit_cast:

#include <bit>
#include <array>
#include <algorithm>


double x;
auto x_bytes = std::bit_cast<std::array<std::unit8_t, sizeof(x)>(x);

if (std::endian::native==std::endian::little)
    std::ranges::reverse(x_bytes);

auto y = bit_cast<double>(x_bytes);

C++ standard has eliminated all other options. They may work on sime platforms; but being UB, they may break under unspecified conditions and lead to surprise results. std::bit_cast works if the input and output are of same size and trivially copyable. The other options include reinterpret_cast which creates problems with strict aliasing rule, std::memcpy which is source of bugs due to programmer mistakes, and union based type punning which has the worst of both previous ones. std::bit_cast has also the constexpr property, which means it can be used in expressions that need be evaluated at compile-time(create a none-type template parameter, or element count of an array...).

Red.Wave
  • 2,790
  • 11
  • 17
  • 1
    I like this so +1 although _"eliminated all other options"_ may be a bit strong. `memcpy` still works just fine. Perhaps there should also be a `static_assert(std::endian::native==std::endian::big || std::endian::native==std::endian::little);` just to make sure it's not used in a mixed endian environment. – Ted Lyngmo Aug 14 '23 at 20:32
  • `memcpy` is C API; it is easy to mix source and destination, or get overlapping meemory ranges that lead to UB. Compiler does use it behind the scene with compile-time proof of safety. But it is not good in user code. The focus of C++ is shifting towards compile-time evaluation. So, regardless of any present caveats, none `constexpr` solutions such as `memcpy` are not 1st rank choice. I can edit the post, but I need to see a single option other than `memcpy`, `union` or `reinterpreted_cast`. You can Google for `memmove` vs `memcpy` to check for the damage due to irresponsible use of the later. – Red.Wave Aug 15 '23 at 13:07
  • Endian is a side note. Generlly we need to do nest to host or host to nest conversions when streaming bytes and bit over secondary storage or network. Otherwise mismatched ordering causes problems. I did not intend to draw the focus on endian-ness however. – Red.Wave Aug 15 '23 at 13:14
  • I agree with everything except the _"eliminated_" part, but that minor detail is not worth changing (at least not for my sake). – Ted Lyngmo Aug 15 '23 at 14:24