4

boost::serialization contains this code:

reinterpret_cast<std::ptrdiff_t>(
    static_cast<Derived *>(
        reinterpret_cast<Base *>(1 << 20)
    )
) - (1 << 20)

Its purpose is to calculate an offset between base and derived classes. Is this code free of undefined behaviour?


The reason why I am asking is that ASAN+UBSAN complain. For example, this code

#include <iostream>

class Foo { public: virtual void foo() {} };
class Base { public: virtual void base() {} };
class Derived: public Foo, public Base {};

int main()
{
    std::cout <<
       (reinterpret_cast<std::ptrdiff_t>(
            static_cast<Derived *>(
                reinterpret_cast<Base *>(1 << 20)
            )
       ) - (1 << 20));
}

compiled as (gcc version 9.2.1)

g++ -fsanitize=address -fsanitize=undefined -fno-omit-frame-pointer -g main.cpp

produces this output

AddressSanitizer:DEADLYSIGNAL
=================================================================
==72613==ERROR: AddressSanitizer: SEGV on unknown address 0x0000000ffff8 (pc 0x0000004012d9 bp 0x7ffd5b3eecf0 sp 0x7ffd5b3eece0 T0)
==72613==The signal is caused by a READ memory access.
    #0 0x4012d8 in main main.cpp:13
    #1 0x7f74a90d5f42 in __libc_start_main (/lib64/libc.so.6+0x23f42)
    #2 0x40112d in _start (/home/.../a.out+0x40112d)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV main.cpp:13 in main

Is it a false-positive or does this code really have problems?


Update 09.12.2019: based on Filipp's proposal and my experiments, this code seems to work and does not produce any warnings:

std::aligned_storage<sizeof(Derived)>::type data;
reinterpret_cast<char*>(&data)
  - reinterpret_cast<char*>(
        static_cast<Base*>(
            reinterpret_cast<Derived*>(&data)));

Does anybody see any problems with this snippet? If not, I would propose it to boost.


Update 16.12.2019: the fix was merged to boost::serialization develop branch.

Kane
  • 5,595
  • 1
  • 18
  • 27
  • 1
    `aligned_storage` is C++11. Does boost serialization require at least that version of C++? If not, I recommend trying another approach. Does ubsan complain if the buffer is not aligned? – Filipp Dec 09 '19 at 14:45
  • Good point, but `boost` has its own `aligned_storage`. I will use it. – Kane Dec 10 '19 at 12:24

2 Answers2

2
reinterpret_cast<Base *>(1 << 20)

This is not a valid pointer.

static_casting it requires evaluating it, which has undefined behaviour.

This is an interesting "trick" but it doesn't seem to be well-defined, and that's borne out by the results of the -fsanitize options.

This does not seem to be unusual for boost::serialization.

Lightness Races in Orbit
  • 378,754
  • 76
  • 643
  • 1,055
2

As indicated by another answer, the problem is that (1 << 20) isn't the address of any object. Using a char[] that could in principle store a Derived seems to work around the problem:

#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

class Foo { public: virtual void foo() {} };
class Base { public: virtual void base() {} };
class Derived: public Foo, public Base {};

int main() {
    alignas (Derived) char const buffer[sizeof(Derived)] = {};
    Derived const* const derived = reinterpret_cast<Derived const*>(buffer);
    Base const* const base = derived;
    ptrdiff_t const delta =
        reinterpret_cast<char const*>(derived) -
        reinterpret_cast<char const*>(base);
    ::printf("%td\n", delta);
    return 0;
}

Is it a false-positive or does this code really have problems?

According to a strict reading of the standard the code does exhibit UB, so in that sense it is not a false positive. In practice, both boost authors and compiler writers agree this is just pointer math, so it should do the right thing anyway.

Edit: Unless one of the bases involved is virtual. Then the cast would try to read the offset from the vtable.

Edit 2: Using nullptr produces 0. Changed to using local aligned buffer.

Filipp
  • 1,843
  • 12
  • 26
  • 1
    Just checked and the outputs are different. This code prints 0 while the `boost` version prints -8. I think the problem is in `nullptr` and this is the reason why `boost` uses a magic number. – Kane Dec 06 '19 at 20:35
  • _"both boost authors and compiler writers agree this is just pointer math, so it should do the right thing anyway"_ I wouldn't be so sure about that. I keep seeing this assumption, but it doesn't necessarily hold. You're not programming bare metal - you're writing in abstractions. – Lightness Races in Orbit Dec 07 '19 at 14:16
  • Virtual inheritance is one case where code like this is definitely broken. If `Derived` inherits virtually from `Base` and is not `final`, then `static_cast` would try to read the object at that address and fail spectacularly. – Filipp Dec 09 '19 at 14:48