It bugged me that std::to_string doesn't allow for custom allocators, so I'm writing my own implementation. For this it would be beneficial to know beforehand how many digits I need to allocate string space for. I could do it in multiple ways:

Use a loop, as demonstrated here:

int length = 1;
int x = 234567545;
while (x /= 10)
   length++;

Use base 10 logarithm + 1:

uint32_t x{234567};
double ds = std::log10(static_cast<double>(x)) + 1;
int digits = static_cast<int>(ds);

.. maybe other solutions.

Here's my code:

#include <concepts>
#include <cstdio>
#include <string>
#include <memory_resource>
#include <cinttypes>

using allocator_t = std::pmr::polymorphic_allocator<std::byte>;

template <std::integral T>
inline auto to_string(T number, allocator_t allocator = {}) -> std::pmr::string {
    // const std::size_t size = ???
    std::pmr::string str{ size, '\0', allocator };
    if constexpr(std::same_as<T, uint32_t>) {
        std::snprintf(&str.front(), size, "%" PRIu32, number);
    } else if constexpr (std::same_as<T, uint16_t>) {
        std::snprintf(&str.front(), size, "%" PRIu16, number);
    } else if constexpr (std::same_as<T, uint8_t>) {
        std::snprintf(&str.front(), size, "%" PRIu8, number);
    }
    // ...
    return str;
}

int main()
{
    uint32_t x = 256;
    printf("My number = %s\n", to_string(x).data());
}

The question is: What is the most efficient and robust way to get the number of digits of an integral number for this use-case?

glades
  • The absolute simplest and quickest way? Since you know the type you know the max number of digits possible for that type, so you can simply use that for the string creation. Then you could use [`shrink_to_fit`](https://en.cppreference.com/w/cpp/string/basic_string/shrink_to_fit) to remove the little excess memory there is. – Some programmer dude Feb 23 '23 at 09:40
  • @Someprogrammerdude As I understand it shrink_to_fit just reduces capacity, not size. The size will still be the max number of digits filled. So I would have to run over the string and use resize at the first zero terminator. – glades Feb 23 '23 at 09:44
  • That's what I meant. Set the capacity of the string, which does a single memory allocation, then add digits to the string as you did before but don't worry about string reallocations since that won't happen. Then if the size (number of actual digits) is smaller than the capacity use `shrink_to_fit` to remove excess "waste" of memory (though for your case it will likely be minimal). – Some programmer dude Feb 23 '23 at 09:49
  • On another and unrelated note, I really recommend you use *specialization* instead of the `if constexpr` chain. Specialization will give you much more flexibility while at the same time reducing complexity. – Some programmer dude Feb 23 '23 at 09:50
  • @Someprogrammerdude Good tip, thx! Say I'm using sprintf, the buffer would have to be preallocated, how would you insert the digits one by one in this case? – glades Feb 23 '23 at 09:52
  • I would recommend benchmarking several solutions rather than guessing which will be faster. The result may be very surprising to you. In particular, sometimes calling `reserve()` on a string can be slower than calling several `+=` operations on a default-constructed string, however counterintuitive it is... – sklott Feb 23 '23 at 10:06
  • Changed my comment because you use `snprintf`. :) But a one-by-one solution would just use plain simple decimal arithmetic (modulo and division) in a loop to get the digits, and append them to the string. The main problem with that is that it's usually done from the smallest to the largest digit, which means the digits will be reversed in the string. But since you know the type and the number of possible digits, you can always start with the biggest and don't add it if it's zero. – Some programmer dude Feb 23 '23 at 10:07
  • @sklott That's not counterintuitive once you take into account the small string buffer of std::string. As long as you don't push more than 15 characters it should be blazingly fast. – glades Feb 23 '23 at 10:30
  • @Someprogrammerdude I just noticed that snprintf will actually return the number of written bytes - which is convenient – glades Feb 23 '23 at 10:46
  • It's actually *more* than convenient, `snprintf` with a null destination and a zero size is commonly used to get the length of the resulting string to be able to allocate it dynamically. *But* it of course needs to do the formatting twice, which depending on the format might be expensive. As mentioned by @sklott, profile and benchmark (optimized build) to find out what is best for your use-case. If it's even needed to begin with, unless you have specific requirements it's often not needed. Optimizations tend to make code more complex and less maintainable. – Some programmer dude Feb 23 '23 at 10:55
  • @Someprogrammerdude True. I just noticed my other problem is that the number of digits in int is not portable - I would have to get the digits anyway. But I could do it in a constexpr loop – glades Feb 23 '23 at 11:03
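
The sizing idiom mentioned in the comments (calling `snprintf` with a null destination and a size of zero) could look roughly like the sketch below. The helper name `format_unsigned` and its signature are assumptions made for illustration and do not come from the thread. Note that it formats the value twice, which is exactly the cost the comment warns about.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <memory_resource>
#include <string>

// Hypothetical helper (not from the thread): formats an unsigned value into
// a pmr string using the "measure first, then write" snprintf idiom.
inline std::pmr::string format_unsigned(
    unsigned long long value,
    std::pmr::polymorphic_allocator<char> alloc = {})
{
    // Pass 1: null buffer and size 0, so nothing is written. The return
    // value is the number of characters the full output would need.
    const int needed = std::snprintf(nullptr, 0, "%llu", value);
    assert(needed > 0);  // "%llu" always produces at least one digit

    // Size the string to exactly that many characters. The string keeps
    // its own terminating '\0' one past the end.
    std::pmr::string out(static_cast<std::size_t>(needed), '\0', alloc);

    // Pass 2: write for real. The size argument includes the terminator.
    std::snprintf(out.data(), out.size() + 1, "%llu", value);
    return out;
}
```

For a single integer the second formatting pass is cheap, but it is still extra work compared to computing the digit count directly.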

2 Answers

I played around with a few options. Calculating the log base 10 was ridiculously slow (30000+ cycles). The first simple loop (while (x /= 10)) you posted was pretty hard to beat, but here's an option which appears to be a bit faster for inputs with several digits and similar in performance for those with only a few. The idea behind it is that subtraction/comparison is supposed to be much faster than division.

static inline uint8_t uint32digits(const uint32_t input) {
  uint8_t length = 1;
  length += (input >= 1000000000);
  length += (input >= 100000000);
  length += (input >= 10000000);
  length += (input >= 1000000);
  length += (input >= 100000);
  length += (input >= 10000);
  length += (input >= 1000);
  length += (input >= 100);
  length += (input >= 10);
  return length;
}

I counted machine cycles, which I know isn't a great way to benchmark, but it's simple and doesn't fall into the trap of measuring overly optimised code. Comparing my function against the loop, I got 104 vs 72 cycles for single-digit numbers, 96 vs 176 cycles for 5-digit numbers, and 96 vs 240 cycles for 10-digit numbers. As expected, my function's cost is pretty much independent of the input.

EDIT: Actually, this appears to be a little faster still...

static inline uint8_t mynumdigits(const uint32_t input) {
  return 1 + (input >= 1000000000) + (input >= 100000000) + (input >= 10000000) +
         (input >= 1000000) + (input >= 100000) + (input >= 10000) +
         (input >= 1000) + (input >= 100) + (input >= 10);
}

Here's a function template for dealing with all integral types. The only branching here occurs if T is some future type with more than 64 bits or if a signed type is used.

#include <limits>
#include <type_traits>

template <class T>
    requires std::is_unsigned_v<T>
constexpr static inline int intdigits(const T input) {
    int length = 1 + (input >= 10u) + (input >= 100u);

    if constexpr (std::numeric_limits<T>::max() > 0xFF) {
        // T is more than 8 bits
        length += (input >= 1000u) + (input >= 10000u);

        if constexpr (std::numeric_limits<T>::max() > 0xFFFF) {
            // T is more than 16 bits
            length +=
                (input >=       100'000u) +
                (input >=     1'000'000u) +
                (input >=    10'000'000u) +
                (input >=   100'000'000u) +
                (input >= 1'000'000'000u);

            if constexpr (std::numeric_limits<T>::max() > 0xFFFF'FFFF) {
                // T is more than 32 bits
                length +=
                    (input >=             10'000'000'000u) +
                    (input >=            100'000'000'000u) +
                    (input >=          1'000'000'000'000u) +
                    (input >=         10'000'000'000'000u) +
                    (input >=        100'000'000'000'000u) +
                    (input >=      1'000'000'000'000'000u) +
                    (input >=     10'000'000'000'000'000u) +
                    (input >=    100'000'000'000'000'000u) +
                    (input >=  1'000'000'000'000'000'000u) +
                    (input >= 10'000'000'000'000'000'000u);

                if constexpr (std::numeric_limits<T>::max() >
                              0xFFFF'FFFF'FFFF'FFFF)
                {   // T is more than 64 bits - future proofing.
                    // This runtime check is the only branch taken for
                    // unsigned types:
                    if (input > 0xFFFF'FFFF'FFFF'FFFF)
                        return length - 1 + intdigits(
                            input / 10'000'000'000'000'000'000u);
                }
            }
        }
    }
    return length;
}

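template <class T>
    requires std::is_signed_v<T>
constexpr static inline int intdigits(const T input) {
    // Negate in unsigned arithmetic: std::abs is not constexpr before C++23
    // and overflows for the most negative value.
    using U = std::make_unsigned_t<T>;
    const U magnitude = static_cast<U>(input);
    return intdigits(input < 0 ? static_cast<U>(U{0} - magnitude) : magnitude);
}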
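
Once the digit count is known, the string can be sized exactly and the digits written from the last position backwards, which avoids both `snprintf` and a reversal step. The following is only a sketch for `uint32_t`; `digits_u32` repeats the comparison idea from `mynumdigits` above, and `u32_to_string` is a hypothetical helper name, not something from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory_resource>
#include <string>

// Digit count by comparisons, same idea as mynumdigits.
constexpr std::size_t digits_u32(std::uint32_t v) {
    return 1u + (v >= 1000000000u) + (v >= 100000000u) + (v >= 10000000u) +
           (v >= 1000000u) + (v >= 100000u) + (v >= 10000u) +
           (v >= 1000u) + (v >= 100u) + (v >= 10u);
}

// Hypothetical helper: allocate exactly digits_u32(v) characters, then
// fill them from the least significant digit backwards.
inline std::pmr::string u32_to_string(
    std::uint32_t v,
    std::pmr::polymorphic_allocator<char> alloc = {})
{
    std::pmr::string out(digits_u32(v), '0', alloc);
    std::size_t pos = out.size();
    do {
        out[--pos] = static_cast<char>('0' + v % 10u);
        v /= 10u;
    } while (v != 0);
    assert(pos == 0);  // every position was written exactly once
    return out;
}
```

Whether this beats `snprintf` into a maximum-size buffer would have to be measured for the target platform.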
Ted Lyngmo
Simon Goater
  • I knew it! Log is slow. I think your approach works pretty well for runtime calculation of uint32_t. For more generic types you would have to add x amount of branches though. – glades Feb 23 '23 at 12:26
  • @glades I added a generic version dealing with all integral types without branching. – Ted Lyngmo Feb 27 '23 at 16:24
  • @TedLyngmo Nice but when it's constexpr anyway what do you gain? – glades Feb 28 '23 at 09:26
  • @glades It can be used in `constexpr` situations just like your original while loop, but if Simon's measurements are correct, it will be faster if used in runtime (and presumably then also faster to compile). – Ted Lyngmo Feb 28 '23 at 09:38
  • @TedLyngmo That I don't understand. Aren't constexpr functions always faster? I mean the whole func boils down to a single number, so it just needs to copy the value from RAM basically – glades Feb 28 '23 at 11:03
  • Both your and Simon's function to calculate the number of digits can be `constexpr` but that doesn't mean that `constexpr` is faster than a non-`constexpr` function. If a `constexpr` function is used with a runtime input, it will need to do the calculation in runtime. If used with a `constexpr` input and the return value is used where a `constexpr` is needed, it will be evaluated at compile time - so then it is a matter of how fast it can compile to get the result. Since `constexpr` functions _can_ be used with runtime values (unlike `consteval` functions), it doesn't hurt to make them fast. – Ted Lyngmo Feb 28 '23 at 11:39
  • @glades Your `get_max_digits` below could however be `consteval` since it won't ever be used with runtime values. It could possibly be faster to compile using Simon's approach though. One would have to test. – Ted Lyngmo Feb 28 '23 at 11:43
  • I'm not a c++ programmer, so I'll let you c++ guys debate this one. As a c programmer, I like to know the size of my integers at programming time so I use stdint.h and the various typedefs it provides. Then it's up to the programmer to call the correct function for the argument type. Simples... – Simon Goater Feb 28 '23 at 11:46
After some tinkering I came up with the following solution:

#include <concepts>
#include <cstdio>
#include <string>
#include <memory_resource>
#include <limits>
#include <type_traits>

using allocator_t = std::pmr::polymorphic_allocator<std::byte>;

template <std::integral T>
constexpr std::size_t get_max_digits()
{
    T val = std::numeric_limits<T>::max();
    std::size_t cnt=1;
    while (val /= 10) {
        ++cnt;
    }
    if constexpr (std::unsigned_integral<T>) {
        return cnt;
    } else {
        return cnt+1;
    }
}

template <typename T> struct PRI{};
template<> struct PRI<signed char> { static constexpr const char* value = "%hhd"; };
template<> struct PRI<unsigned char> { static constexpr const char* value = "%hhu"; };
template<> struct PRI<char> : std::conditional_t<std::is_signed_v<char>, PRI<signed char>, PRI<unsigned char>>{};
template<> struct PRI<short> { static constexpr const char* value = "%hd"; };
template<> struct PRI<unsigned short> { static constexpr const char* value = "%hu"; };
template<> struct PRI<int> { static constexpr const char* value = "%d"; };
template<> struct PRI<unsigned int> { static constexpr const char* value = "%u"; };
template<> struct PRI<long> { static constexpr const char* value = "%ld"; };
template<> struct PRI<unsigned long> { static constexpr const char* value = "%lu"; };
template<> struct PRI<long long> { static constexpr const char* value = "%lld"; };
template<> struct PRI<unsigned long long> { static constexpr const char* value = "%llu"; };

template <typename T>
inline constexpr const char* PRI_v = PRI<T>::value;

template <std::integral T>
inline auto to_string(T number, allocator_t allocator = {}) -> std::pmr::string {
    constexpr std::size_t size = get_max_digits<T>();
    std::pmr::string str{ size, '\0', allocator };
    const std::size_t written_s = std::snprintf(&str.front(), size+1, PRI_v<T>, number);
    str.resize(written_s);
    return str;
}

int main()
{
    int x = -256325651;
    char ch = 'A';
    printf("Integer as string = %s\n", to_string(x).c_str());
    printf("char as string = %s\n", to_string(ch).c_str());
}

Output:

Integer as string = -256325651
char as string = 65

This works on all major platforms (tested on x86, RISC-V and ARM) and for all common integral types. I decided to allocate for the maximum number of digits and shrink to the digits actually used afterwards. This gives me at most one allocation and a single call to snprintf, while the max-digits calculation is done at compile time. It should be reasonably fast.
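
As a possible variation on the same idea (not something I benchmarked), `std::to_chars` from `<charconv>` can format into a small stack buffer so that the string is allocated at its final size right away. The sketch below assumes standard integer types and `char`, as accepted by `std::to_chars`; the names `max_chars_v` and `to_pmr_string` are hypothetical. It uses `std::numeric_limits<T>::digits10 + 1` as the maximum digit count, plus one character for a possible minus sign.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <limits>
#include <memory_resource>
#include <string>
#include <system_error>
#include <type_traits>

// Upper bound on the characters needed for any value of T:
// digits10 + 1 decimal digits, plus one for '-' when T is signed.
template <class T>
inline constexpr std::size_t max_chars_v =
    std::numeric_limits<T>::digits10 + 1 + (std::is_signed_v<T> ? 1 : 0);

// Hypothetical helper: format on the stack, then allocate exactly once
// with the final length.
template <class T>
std::pmr::string to_pmr_string(
    T number,
    std::pmr::polymorphic_allocator<char> alloc = {})
{
    static_assert(std::is_integral_v<T> && !std::is_same_v<T, bool>,
                  "sketch covers integer types accepted by std::to_chars");
    char buf[max_chars_v<T>];
    const auto [end, ec] = std::to_chars(buf, buf + sizeof buf, number);
    assert(ec == std::errc{});  // the buffer is large enough by construction
    return std::pmr::string(buf, static_cast<std::size_t>(end - buf), alloc);
}
```

Like the `snprintf` version, a `char` argument is formatted as its numeric value, so `to_pmr_string('A')` yields `65` on ASCII platforms.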

glades