1

I have a situation in which I'm performing binary serialization of a some items and I'm writing them to an opaque byte buffer:

int SerializeToBuffer(unsigned char* buffer)
{
    stringstream ss;
    vector<Serializable> items = GetSerializables();
    string serializedItem("");
    short len = 0;
    for(int i = 0; i < items.size(); ++i)
    {
        serializedItem = items[i].Serialize();
        len = serializedItem.length();

        // Write the bytes to the stream
        ss.write(*(char*)&(len), 2);
        ss.write(serializedItem.c_str(), len);

    }
    buffer = reinterpret_cast<unsigned char*>(
                const_cast<char*>(ss.str().c_str()));
    return items.size();
}

Is it safe to remove the const-ness from the ss.str().c_str() and then reinterpret_cast the result as unsigned char* then assign it to the buffer?

Note: the code is just to give you an idea of what I'm doing, it doesn't necessarily compile.

Kiril
  • 39,672
  • 31
  • 167
  • 226
  • 2
    Were you a Java developer before? C++ streams work really well if you define the operator << for the objects you want to serialize. Also it would probably be best to serialize to a stream rather than a char buffer. – Martin York Aug 11 '11 at 19:22
  • @Lirik : Why is `buffer` a raw pointer in the first place? If you used a proper class (e.g. `std::vector`) then this would be a complete non-issue. – ildjarn Aug 11 '11 at 19:24
  • @Martin, mostly developing in C#; in this case I have a C# application that has a C++/CLI DLL and I have to serialize some data and pass it back and forth between the C# app and the C++/CLI DLL (if that makes sense). Thus the use of a char buffer. – Kiril Aug 11 '11 at 19:26
  • @ildjarn C# doesn't know about vectors and I have to send the objects back to C# from the C++/CLI DLL. – Kiril Aug 11 '11 at 19:28
  • @Lirik : C# can't use `unsigned char*` in a sensible manner either. If you're using C++/CLI as your managed/unmanaged shim, why not expose a sensible .NET interface to C# and have the function take e.g. a `System::IO::Stream^`? – ildjarn Aug 11 '11 at 19:30
  • @Lirik: I would change your code to to std::ostream. Then serialize it using the normal stream operators. When you want to serialize it to a buffer just use a std::stringstream object. Once the data has been serialized into the stream you can pass the stream buffer (as a char*) to the C# part of the program for them to use. When the C# code passes you a buffer back create a std::stringstream object using the passed buffer then de-serialize the data back into an object. – Martin York Aug 11 '11 at 19:36
  • #ildjarn as far as I know, an `unsigned char*` in C++ is the equivalent of `byte*` in C#, [according to another answer](http://stackoverflow.com/questions/2333302/what-are-the-c-equivalent-of-these-c-structs), and the app that I'm working on has other parts with different logic that successfully "marshal" data back and forth using the `unsigned char*` to `byte*` cast. – Kiril Aug 11 '11 at 19:40
  • @Lirik : You're missing the larger picture -- the whole purpose of using C++/CLI is to obviate the need for hand-written marshaling in C#. If you're going to use hand-written marshaling, why bother with C++/CLI? Conversely, if you're going to use C++/CLI, why not write a proper .NET interface for your C++/CLI objects so no marshaling is needed from the C# side? – ildjarn Aug 11 '11 at 19:48
  • @ildjarn, you're 100% right! However, I would write a proper .NET interface, but my boss gave me a library which he wrote (it doesn't have the proper .NET interface) and I'm on a deadline to chug out a prototype based on his code. I cleaned up as much as I could, but I simply won't have the time to make a better interface. Once the prototype is up and running, then I will probably get a lot more time to do the larger scale improvements that ought to be there in the first place. – Kiril Aug 11 '11 at 20:03
  • @Lirik : Fair enough. :-] But that being the case, changing the argument from `unsigned char*` to `[Out] array^%` is extremely trivial, and makes the usage from C# much more straightforward. – ildjarn Aug 11 '11 at 20:07
  • @ildjarn the other thing is that his the C++/CLI DLL is a thin wrapper around a native C++ DLL... I would have to take the `array^` and convert it to an `unsigned char*` anyway in order to send it to the native C++ DLL (not in this particular example, but in others which do involve the sending of data from C# to C++). My back is very much against the ropes and my hands are tied, I'm just not skilled enough to be called Houdini :). Anyway, as always, thanks for the great input! Your answers are very helpful! – Kiril Aug 11 '11 at 20:11
  • @Lirik : Fortunately, `array^` to `unsigned char*` is only one line of code. I'll add a full answer. – ildjarn Aug 11 '11 at 20:14
  • @ildjarn, that would be helpful! It's the first time I program something that mixes C++ and C#, so I don't know all the in's and out's. Thanks! – Kiril Aug 11 '11 at 20:19

5 Answers5

3

No removing const-ness of an inherently contant string will result in Undefined Behavior.

const char* c_str ( ) const;
Get C string equivalent

Generates a null-terminated sequence of characters (c-string) with the same content as the string object and returns it as a pointer to an array of characters.
A terminating null character is automatically appended.
The returned array points to an internal location with the required storage space for this sequence of characters plus its terminating null-character, but the values in this array should not be modified in the program and are only guaranteed to remain unchanged until the next call to a non-constant member function of the string object.

Alok Save
  • 202,538
  • 53
  • 430
  • 533
1

Short answer: No

Long Answer: No. You really can't do that. The internal buffer of those object belong to the objects. Taking a reference to an internal structure is definitely a no-no and breaks encapsulation. Anyway those objects (with their internal buffer) will be destroyed at the end of the function and your buffer variable will point at uninitialized memory.

Use of const_cast<> is usually a sign that something in your design is wrong.
Use of reinterpret_cast<> usually means you are doing it wrong (or you are doing some very low level stuff).

You probably want to write something like this:

std::ostream& operator<<(std::ostream& stream, Data const& serializable)
{
    return stream << serializable.internalData;

    // Or if you want to write binary data to the file:

    stream.write(static_cast<char*>(&serializable.internalData), sizeof(serializable.internalData);
    return stream;

}
Martin York
  • 257,169
  • 86
  • 333
  • 562
  • Having the buffer variable point at uninitialized memory is not a problem because the buffer variable itself is going away when the block is exited, and it is never used after being assigned anyway. – hmakholm left over Monica Aug 11 '11 at 19:28
  • @Henning: Unless he changes the function signature to a reference (which it looks like he wants to do). – Martin York Aug 11 '11 at 19:34
1

This is unsafe, partially because you're stripping off const, but more importantly because you're returning a pointer to an array that will be reclaimed when the function returns.

When you write

ss.str().c_str()

The return value of c_str() is only valid as long as the string object you invoked it on still exists. The signature of stringstream::str() is

string stringstream::str() const;

Which means that it returns a temporary string object. Consequently, as soon as the line

ss.str().c_str()

finishes executing, the temporary string object is reclaimed. This means that the outstanding pointer you received via c_str() is no longer valid, and any use of it leads to undefined behavior.

To fix this, if you really must return an unsigned char*, you'll need to manually copy the C-style string into its own buffer:

/* Get a copy of the string that won't be automatically destroyed at the end of a statement. */
string value = ss.str();

/* Extract the C-style string. */
const char* cStr = value.c_str();

/* Allocate a buffer and copy the contents of cStr into it. */
unsigned char* result = new unsigned char[value.length() + 1];
copy(cStr, cStr + value.length() + 1, result);

/* Hand back the result. */
return result;

Additionally, as @Als has pointed out, the stripping-off of const is a Bad Idea if you're planning on modifying the contents. If you aren't modifying the contents, it should be fine, but then you ought to be returning a const unsigned char* instead of an unsigned char*.

Hope this helps!

templatetypedef
  • 362,284
  • 104
  • 897
  • 1,065
1

Since it appears that your primary consumer of this function is a C# application, making the signature more C#-friendly is a good start. Here's what I'd do if I were really crunched for time and didn't have time to do things "The Right Way" ;-]

using System::Runtime::InteropServices::OutAttribute;

void SerializeToBuffer([Out] array<unsigned char>^% buffer)
{
    using System::Runtime::InteropServices::Marshal;

    vector<Serializable> const& items = GetSerializables();
    // or, if Serializable::Serialize() is non-const (which it shouldn't be)
    //vector<Serializable> items = GetSerializables();

    ostringstream ss(ios_base::binary);
    for (size_t i = 0u; i != items.size(); ++i)
    {
        string const& serializedItem = items[i].Serialize();
        unsigned short const len =
            static_cast<unsigned short>(serializedItem.size());

        ss.write(reinterpret_cast<char const*>(&len), sizeof(unsigned short));
        ss.write(serializedItem.data(), len);
    }

    string const& s = ss.str();
    buffer = gcnew array<unsigned char>(static_cast<int>(s.size()));
    Marshal::Copy(
        IntPtr(const_cast<char*>(s.data())),
        buffer,
        0,
        buffer->Length
    );
}

To C# code, this will have the signature:

void SerializeToBuffer(out byte[] buffer);
ildjarn
  • 62,044
  • 9
  • 127
  • 211
0

Here is the underlying problem:

buffer = ... ;
return items.size();

In the second-to last line you're assigning a new value to the local variable that used (up until that point) to hold the pointer your function was given as an argument. Then, immediately after, you return from the function, forgetting everything about the variable you just assigned to. That does not make sense!

What you probably want to do is to copy data from the memory pointed to by ss_str().c_str() to the memory pointed to by the pointer stored in buffer. Something like

memcpy(buffer, ss_str().s_str(), <an appropriate length here>)
hmakholm left over Monica
  • 23,074
  • 3
  • 51
  • 73