Does the following snippet really use uninitialized memory, as Google's MemorySanitizer reports, or is this a false positive?

  • main.cpp:
#include <string>
#include <iostream>

using namespace std;

int main() {
    string s0 = to_string(1);
    cout << "s0: " << s0 << endl;
    string s1 = to_string(1) + to_string(2);
    cout << "s1: " << s1 << endl;
    return 0;
}
  • Makefile:
main:
    clang++ -fsanitize=memory -fsanitize-memory-track-origins -fPIE -pie -fno-omit-frame-pointer -g -O2 main.cpp -o main-msan.out
    clang++ -O2 main.cpp -o main.out

Result:

./main-msan.out 
s0: 1
==122092==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x55a7354e5cf7 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > std::operator+<char, std::char_traits<char>, std::allocator<char> >(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/basic_string.h:6123:34
    #1 0x55a7354e5cf7 in main <my_directory>/msan/main.cpp:9:30
    #2 0x7f201f6edd09 in __libc_start_main csu/../csu/libc-start.c:308:16
    #3 0x55a735468349 in _start (<my_directory>/msan/main-msan.out+0x21349)

  Uninitialized value was created by an allocation of 'ref.tmp' in the stack frame of function 'main'
    #0 0x55a7354e4d90 in main <my_directory>/msan/main.cpp:6

SUMMARY: MemorySanitizer: use-of-uninitialized-value /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/basic_string.h:6123:34 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > std::operator+<char, std::char_traits<char>, std::allocator<char> >(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&)
Exiting

A mirror issue has also been opened here.

D.J. Elkind
  • It's perfectly valid code. A standard library implementation may employ optimization techniques that trip the sanitizer. But that doesn't automatically mean the implementation is buggy. It could be as simple as forgetting to apply an annotation in library code, for the sanitizer to ignore the "issue". – StoryTeller - Unslander Monica Feb 22 '23 at 04:24
  • There's short string optimization for such short strings. I am not sure how it is implemented in GCC. But it is not unreasonable that some bytes were not initialized in the strings as they are of size <= 2 and then copying might address these uninitialized values. – ALX23z Feb 22 '23 at 04:26
  • @ALX23z actually I tried something longer, like `string s1 = to_string(111) + to_string(222);`, it still triggers the complaint. Also for MemorySanitizer to work I have to use `clang++`. – D.J. Elkind Feb 22 '23 at 05:01
  • @StoryTeller-UnslanderMonica this is another point I am thinking about. Say I have the following: `uint32_t a, b; uint32_t c = a + b; cout << c << endl;` My understanding is that this code is valid and it does not invoke any UB as `unsigned int` never overflows. Admittedly the value of `c` could be implementation-defined or indeterminate--but it should work fine if, somehow, I just need a value, but don't care what the value is. – D.J. Elkind Feb 22 '23 at 05:02
  • This is the sticky point. As far as the standard is concerned, that code has undefined behavior because it uses and produces indeterminate values in types that are not exempt from the UB. On the other hand, you are correct that modern hardware typically is ok with it, and does benign things. The sanitizer adheres to the standard's view, while a standard library implementer may rely on the hardware knowledge to spare us cycles. Hence the clash between them. – StoryTeller - Unslander Monica Feb 22 '23 at 05:14
  • @StoryTeller-UnslanderMonica, actually I argued this with another guy before but failed to reach a conclusion (unfortunately I fail to find the post). I am checking C17/C18 standard [here](https://web.archive.org/web/20181230041359if_/http://www.open-std.org/jtc1/sc22/wg14/www/abq/c17_updated_proposed_fdis.pdf). I believe `uint32_t a, b; uint32_t c = a + b;` makes `c` an "unspecified value" as defined in section 3.19.3. But where does the Standard say this triggers UB? – D.J. Elkind Feb 22 '23 at 06:07
  • Since this is C++, the relevant section is https://timsong-cpp.github.io/cppwp/n4868/basic.indet#2 - it applies to all types (except `unsigned char` or `std::byte`) when making a blanket statement about there being UB. I don't remember where exactly the C standard said it, but I recall seeing verbiage to that effect in C11. – StoryTeller - Unslander Monica Feb 22 '23 at 06:20
  • If I get it right, while `uint32_t a; uint32_t c = a;` does trigger UB, `uint8_t a; uint8_t c = a;` does **not**? – D.J. Elkind Feb 22 '23 at 06:31
  • It doesn't matter that the string size is 3 instead of 1. It is still a very small number, so SSO applies. All modern compilers use SSO. Strings need to be longer than 32 or something like that, depending on implementation. – ALX23z Feb 22 '23 at 07:58

1 Answer

The mirror issue for this question received an answer. Its gist is that MemorySanitizer is not a "plug-in"-style checker.

To be specific, say we have a source file main.cpp that we want to compile into the binary main.out. Simply adding -fsanitize=memory -fsanitize-memory-track-origins to the compilation command is not enough. To avoid false positives, every library that main.out uses, including the C++ standard library (the stack trace above points into GCC's libstdc++ headers), must also be compiled with -fsanitize=memory -fsanitize-memory-track-origins. The answer calls this "instrumentation".

This official wiki post describes how we can achieve it.

D.J. Elkind