
I have stumbled on a Valgrind report that I can't manage to fix by myself. I have a function that reads a Microsoft "unicode" string (a series of two-byte aligned wchar_t prefixed by a size) from a file. A sample file might look like this:

0004 0041 0041 0041 0041                 ..A.A.A.A.

The following code sample reads the "unicode" string from the file and uses wcstombs to make a std::string out of it.

#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#include <string>
#include <iostream>

#include <boost/shared_array.hpp>
#include <boost/cstdint.hpp>

std::string read_unicode_string(FILE* f)
{
    std::wstring s = std::wstring();
    wchar_t c;
    boost::uint16_t size;
    if (2 !=fread(&size, 1, 2, f)) {
        return "";
    }

    // Microsoft's "unicode" strings are word aligned.
    for (unsigned int i = 0 ; i < size ; ++i)
    {
        if (2 != fread(&c, 1, 2, f)) {
            break;
        }
        s += c;
    }
    s += L'\0';

    // Convert the wstring into a string
    boost::shared_array<char> conv = boost::shared_array<char>(new char[s.size() + 1]);
    memset(conv.get(), 0, sizeof(char) * (s.size() + 1));
    wcstombs(conv.get(), s.c_str(), s.size());
    return std::string(conv.get());
}

int main(int argc, char** argv)
{
    FILE* f = fopen("test", "rb");
    if (f == NULL) {
        return 1;
    }
    std::cout << read_unicode_string(f) << std::endl;
    fclose(f);
    return 0;
}

Although it does seem to work, Valgrind reports that some jumps inside wcstombs depend on an uninitialised value:

==8440== Conditional jump or move depends on uninitialised value(s)
==8440==    at 0x56606C2: wcsnlen (wcsnlen.c:40)
==8440==    by 0x565FCF0: wcsrtombs (wcsrtombs.c:110)
==8440==    by 0x56101A0: wcstombs (wcstombs.c:35)
==8440==    by 0x401488: read_unicode_string(_IO_FILE*) (test.cpp:32)
==8440==    by 0x40157C: main (test.cpp:42)
==8440== 
==8440== Conditional jump or move depends on uninitialised value(s)
==8440==    at 0x55F2D5B: __gconv_transform_internal_ascii (loop.c:332)
==8440==    by 0x565FD41: wcsrtombs (wcsrtombs.c:116)
==8440==    by 0x56101A0: wcstombs (wcstombs.c:35)
==8440==    by 0x401488: read_unicode_string(_IO_FILE*) (test.cpp:32)
==8440==    by 0x40157C: main (test.cpp:42)

I've been looking, but I feel that I have initialized every variable properly. Does anyone see the problem in my code?

Thanks in advance for your help!

executifs
  • `wchar_t c = 0` will eliminate the reports on uninitialised value(s). No clue why. –  Mar 13 '14 at 11:20

2 Answers


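That is a nasty error! If sizeof(wchar_t) is greater than 2 (let's say 4), each fread of 2 bytes into 'c' fills only part of it, so every character appended to the wide string 's' carries 2 uninitialized bytes. Those bytes are what Valgrind reports as uninitialised value(s) in wcstombs.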
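One possible way to avoid this, sketched below, is to read each 16-bit code unit into a fully initialized 2-byte variable and only then widen it to wchar_t. The names read_u16_le and read_unicode_string_fixed are hypothetical, and the sketch assumes C++11, little-endian data on disk (as the sample dump suggests), and no surrogate pairs in the input.

```cpp
#include <cstdio>
#include <cstdlib>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: reads one little-endian 16-bit value.
// Returns false on a short read.
static bool read_u16_le(FILE* f, std::uint16_t& out)
{
    unsigned char b[2];
    if (std::fread(b, 1, 2, f) != 2) {
        return false;
    }
    out = static_cast<std::uint16_t>(b[0] | (b[1] << 8));
    return true;
}

// Hypothetical replacement for read_unicode_string from the question.
std::string read_unicode_string_fixed(FILE* f)
{
    std::uint16_t size = 0;
    if (!read_u16_le(f, size)) {
        return "";
    }

    std::wstring s;
    for (unsigned int i = 0; i < size; ++i) {
        std::uint16_t unit = 0;
        if (!read_u16_le(f, unit)) {
            break;
        }
        // Widening conversion: every byte of the wchar_t is defined,
        // whatever sizeof(wchar_t) is on this platform.
        s += static_cast<wchar_t>(unit);
    }

    // Each wide character needs at most MB_CUR_MAX bytes once converted.
    std::vector<char> conv(s.size() * MB_CUR_MAX + 1, '\0');
    std::size_t n = std::wcstombs(conv.data(), s.c_str(), conv.size() - 1);
    if (n == static_cast<std::size_t>(-1)) {
        return "";  // a character could not be converted in the current locale
    }
    return std::string(conv.data(), n);
}
```

Initializing 'c' to 0 before each fread, as suggested in the comment on the question, should also silence the report on a little-endian machine, but the conversion above does not depend on the host's byte order or on the size of wchar_t.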

oz123

The problem is not in your code. The call stacks show that the uninitialized variable is used somewhere inside the implementation of wcstombs - all you can do is try to tell valgrind not to inspect that library or filter those two messages from valgrind's output.

Arne Mertz
  • Oh... Okay, I figured that it was probably one of my variables that caused the problem later on in `wcstombs`. Should I look into that function's code and try to submit a patch, then? – executifs Mar 13 '14 at 10:58
  • @executifs Only if it is really a bug. It is probably only a false positive from Valgrind, or the implementation exploits some well-defined cases where the initialization does not matter. I doubt that Valgrind will find a real bug in a standard library implementation so easily. – Arne Mertz Mar 13 '14 at 11:15