Boost.Multiprecision cpp_dec_float_50 - convert into an array of bytes and back?

Question

Similar question to: Boost.Multiprecision cpp_int - convert into an array of bytes?

But this time it concerns floating-point values.

export_bits doesn't seem to have an overload that accepts floating-point values
the limbs of cpp_dec_float_50 are not exposed to the outside world

Question: How, then, should one convert such a data type to an array of bytes and back?

score 1 · Answer 1 · answered Feb 05 '22 at 18:40

If you don't mind 10 bytes of overhead and don't want to use any undocumented interface, use Serialization support.

Otherwise, "hack" the backend implementation.

Using Serialization

E.g. (live on Compiler Explorer):

#include <boost/archive/binary_oarchive.hpp>
#include <boost/multiprecision/cpp_dec_float.hpp>
#include <boost/multiprecision/cpp_int.hpp>
#include <fmt/ranges.h>
#include <sstream>
#include <vector>
#include <span>

using F = boost::multiprecision::cpp_dec_float_50;
namespace ba = boost::archive;

int main() {
    F f{"2837498273489289734982739482398426938568923658926938478923748"};

    std::vector<unsigned char> raw;
    {
        std::ostringstream oss;
        {
            ba::binary_oarchive oa(
                oss, ba::no_header | ba::no_codecvt | ba::no_tracking);

            oa << f;
        }
        auto buf = std::move(oss).str();
        raw.assign(buf.begin(), buf.end());
    }

    fmt::print(" sizeof: {} raw {} bytes {::#0x}\n", sizeof(F), raw.size(),
               std::span(raw.data(), raw.size()));
}

Prints

 sizeof: 56 raw 63 bytes [0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xd6, 0x6e, 0x0, 0x0, 0xd1, 0x88, 0xdb, 0x5, 0xba, 0x19, 0xba, 0x1, 0x7, 0x3, 0xa2, 0x1, 0x3a, 0xe0, 0xdd, 0x5, 0xcd, 0x1b, 0x64, 0x3, 0x88, 0x24, 0x52, 0x5, 0xe4, 0x47, 0xb4, 0x4, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x38, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xa, 0x0, 0x0, 0x0]

Hacking the Backend

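It turns out the relevant members are private. But serialize is generic (a member template over the archive type), so you can use it to exfiltrate the private data. Here is the serialize member of the cpp_dec_float backend: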

   template <class Archive>
   void serialize(Archive& ar, const unsigned int /*version*/)
   {
      for (unsigned i = 0; i < data.size(); ++i)
         ar& boost::make_nvp("digit", data[i]);
      ar& boost::make_nvp("exponent", exp);
      ar& boost::make_nvp("sign", neg);
      ar& boost::make_nvp("class-type", fpclass);
      ar& boost::make_nvp("precision", prec_elem);
   }

E.g. (live on Compiler Explorer):

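//#include <boost/core/demangle.hpp>
#include <boost/multiprecision/cpp_dec_float.hpp>
#include <boost/serialization/nvp.hpp> // boost::serialization::nvp
#include <fmt/ranges.h>
#include <cstring>     // std::memcpy
#include <type_traits> // std::void_t, std::is_trivial_v, ...
#include <utility>     // std::declval
#include <vector>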

using F = boost::multiprecision::cpp_dec_float_50;

// Mock "archive": appends the raw bytes of every member that serialize() visits.
struct Hack {
    std::vector<unsigned char> result {};

    template <typename T> Hack& operator&(boost::serialization::nvp<T> const& w) {
        return operator&(w.value());
    }

    template <typename, typename = void> struct Serializable : std::false_type{};
    template <typename T> struct Serializable<T,
                        std::void_t<decltype(std::declval<T>().serialize(
                            std::declval<Hack&>(), 0u))>> : std::true_type {
    };

    template <typename T> Hack& operator&(T const& v)
    {
        if constexpr (Serializable<T>{}) {
            const_cast<T&>(v).serialize(*this, 0u);
        } else {
            constexpr size_t n = sizeof(v);
            //fmt::print("{} ({} bytes)\n", boost::core::demangle(typeid(v).name()), n);
            static_assert(std::is_trivial_v<T>);
            static_assert(std::is_standard_layout_v<T>);
            auto at = result.size();
            result.resize(result.size() + n);
            std::memcpy(result.data() + at, &v, n);
        }
        return *this;
    }
};

int main() {
    F f{"2837498273489289734982739482398426938568923658926938478923748"};

    Hack hack;
    f.serialize(hack, 0u);

    fmt::print(" sizeof: {} raw {} bytes {::#0x}\n", sizeof(F),
            hack.result.size(), hack.result);
}

Prints

 sizeof: 56 raw 53 bytes [0xd6, 0x6e, 0x0, 0x0, 0xd1, 0x88, 0xdb, 0x5, 0xba, 0x19, 0xba, 0x1, 0x7, 0x3, 0xa2, 0x1, 0x3a, 0xe0, 0xdd, 0x5, 0xcd, 0x1b, 0x64, 0x3, 0x88, 0x24, 0x52, 0x5, 0xe4, 0x47, 0xb4, 0x4, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x38, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xa, 0x0, 0x0, 0x0]

Summary / Caveat

I'll leave the corresponding deserialization code as an exercise for the reader.

In the end, the hack approach turns out to be pretty similar to the clean approach; it just mocks the serialization archive.

Note that versioning is not supported in the Hack approach.

Also, portability is not a given for either approach. Check whether differences in endianness or processor architecture affect your requirements.

Do you know what the leading 10 bytes describe? Are they due to ba::no_header not being honoured? I am indeed after storage efficiency, so if I omit these bytes, would I need to put them back when using binary_iarchive to 'deserialize'? Would there always be a prefix of exactly 10 stub bytes? Or am I better off 'overriding' the serialization mechanics by inserting the count of stub 0 bytes at the very front and putting that many back when deserializing, just to 'sleep safe'? — Vega4, Feb 06 '22 at 07:52
What is your use case? The correct answer depends entirely on it. You could even be well served by text serialization. It matters a lot what the distribution of your data is, what its volume will be, and what the edge requirements are (portability, processing speed, etc.). — sehe, Feb 06 '22 at 11:43

score 0 · Answer 2 · answered Feb 06 '22 at 10:17

This answer is based on what @Sehe provided.

It provides facilities for both serialization and deserialization of Boost's mp::cpp_dec_float_50.

Since the no_header flag does not seem to be honoured by Boost's serialization interface (a prefix of around 10 bytes is inserted when going through the Serialization interface), and since I have no idea what these bytes are supposed to represent, all leading zero bytes are omitted and their count is stored in the last byte of the serialized output. Thus, what follows should remain compatible across Boost versions, should Boost one day decide to handle the flag properly.

These stub bytes are then 'recovered' when deserializing.

Enjoy.

#include <boost/archive/binary_iarchive.hpp>
#include <boost/archive/binary_oarchive.hpp>
#include <boost/iostreams/device/array.hpp>
#include <boost/iostreams/stream_buffer.hpp>
#include <boost/multiprecision/cpp_dec_float.hpp>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

namespace mp = boost::multiprecision;
using BigFloat = mp::cpp_dec_float_50;

// CTools is the poster's own helper class; its declaration is not shown here.

/// <summary>
/// Produces a vector of bytes from mp::cpp_dec_float_50 (BigFloat).
/// Leading zero bytes are stripped for storage efficiency and their
/// count is appended as the last byte.
/// </summary>
/// <param name="f">The value to serialize.</param>
/// <returns>The compacted byte representation.</returns>
std::vector<uint8_t> CTools::BigFloatToBytes(BigFloat const& f)
{
    namespace ba = boost::archive;

    std::ostringstream oss;
    {
        ba::binary_oarchive oa(oss, ba::no_header | ba::no_codecvt | ba::no_tracking);
        oa << f;
    } // the archive is finalized when it goes out of scope

    std::string const buf = std::move(oss).str();
    std::vector<uint8_t> raw(buf.begin(), buf.end());

    // Count the leading zero bytes; the count is stored in one byte, so cap it at 255.
    std::size_t leading0sCount = 0;
    while (leading0sCount < raw.size() && leading0sCount < 255 && raw[leading0sCount] == 0)
        ++leading0sCount;

    // erase() rather than assign(): assigning a vector from its own iterators is undefined behaviour.
    raw.erase(raw.begin(), raw.begin() + leading0sCount);
    raw.push_back(static_cast<uint8_t>(leading0sCount));

    return raw;
}

/// <summary>
/// Instantiates BigFloat (mp::cpp_dec_float_50) from a vector of bytes
/// produced by BigFloatToBytes.
/// </summary>
/// <param name="v">The compacted bytes (taken by value and modified locally).</param>
/// <returns>The deserialized value.</returns>
BigFloat CTools::BytesToBigFloat(std::vector<uint8_t> v)
{
    namespace io = boost::iostreams;
    namespace ba = boost::archive;

    // Validation
    if (v.empty())
        return 0;

    // Recover the leading zeros/prefix: their count is stored in the last byte.
    std::size_t const leading0sCount = v.back();
    v.pop_back();
    v.insert(v.begin(), leading0sCount, uint8_t{0});

    BigFloat i;
    {
        // Read directly from the vector's storage; no extra copy or back_insert_device needed.
        io::stream_buffer<io::array_source> bb(reinterpret_cast<char const*>(v.data()), v.size());
        ba::binary_iarchive ia(bb, ba::no_header | ba::no_tracking | ba::no_codecvt);
        ia >> i;
    }

    return i;
}

Oof. I'd recommend the [hack approach](https://stackoverflow.com/questions/70997737/boost-multiprecision-cpp-dec-float-50-convert-into-an-array-of-bytes-and-back/71001035?noredirect=1#:~:text=0x0%2C%200x0%2C%200x0%5D-,Hacking%20the%20Backend,-Turns%20out%20the) instead. Of course "no_header" is honoured; did you [try without](https://imgur.com/a/VtG2J0B)? It seems a bit childish to dunk on any feature you don't like/understand. Nothing stops you from reading up; it's open source, even. — sehe, Feb 06 '22 at 11:34
However, the real danger is in assuming the leading data is "insignificant bytes" (that's not what that means) and can be replaced by zeroes (that's unspecified, because _you don't know what they mean_). Similarly, my Hack approach makes a (smaller) set of assumptions, but at least: 1. it is explicit about them; 2. they can be checked at compile time to a degree. In short: co-operating with the undocumented implementation details to jointly arrive at a desired result is MUCH safer than just blindly putting a knife in someone else's end-product. Caveat emptor. — sehe, Feb 06 '22 at 11:35
