
Consider the following snippet which gets some binary data and writes it to an ostringstream object:

unsigned char* payload;
unsigned long  size;

GetData(&payload, &size);

std::cout << md5(payload, size) << std::endl;

std::ostringstream stream;
stream.write((const char*)payload, size);

std::cout << md5(payload, size) << std::endl;

The problem is that the two printed hash values differ from each other, which means payload has been changed. I tried opening the stream in binary mode with std::ostringstream stream(std::ios::out | std::ios::binary), but it made no difference. I didn't expect that it would, anyway.

Also, the second print statement produces a different checksum every time I re-run the program, while the first hash is always the same.

Now, how can I write my binary data correctly to ostringstream? Can the problem be the cast to const char* (GetData method takes an unsigned char** as the first parameter)?

UPDATE: In the light of comments, here are some more explanations:

  • Comparing a binary diff of the original data and the written data, I saw that the written data is shifted to the right (by 24 bytes) in some places. It also has some added bytes at the very beginning. I still think it has something to do with the cast.
  • There is no more code between GetData and the actual writing.
  • GetData works correctly, since the checksum after calling it is correct (I know what the checksum should be).
  • I cannot post compilable code because of GetData. It should not be necessary either, since I have isolated the problem to the line where write is called.
  • System details: gcc version 4.6.3 on Ubuntu 12.04 64-bit
incrediblehulk
  • The interesting code here would be `md5` and its `operator<<`. – James Kanze May 21 '13 at 08:51
  • md5 is from hashlib++. It is given here just to prove that payload is modified; that's how I found out that the `write` method is the problem. – incrediblehulk May 21 '13 at 08:55
  • @incrediblehulk Hashing algorithms are usually implemented as incremental - when called a second time, they append their argument to their internal buffer from previous calls, unless some sort of restart() is requested in between. Could this be happening here? – Angew is no longer proud of SO May 21 '13 at 08:59
  • No, I suspected the same and checked it by running multiple md5s before writing. It does not modify the payload for sure. – incrediblehulk May 21 '13 at 09:03
  • So what is the return type of `md5`, and how is the `<<` for it implemented? – James Kanze May 21 '13 at 09:16
  • I looked at the source as well, I can 100% assure you md5 is irrelevant to this problem. It copies the payload to an internal buffer first. And I did other integrity checks. I repeat, md5 is given here just for demonstration purposes. The problem is at the line where `write` is called. Believe me :) – incrediblehulk May 21 '13 at 09:25
  • If you've ruled out `md5`, might the problem lie with `GetData`? – GuyRT May 21 '13 at 09:54
  • Is there any other code between the two output statements that you haven't posted? I'm having a hard time believing that any implementation of ostringstream::write that modified the data it's supposed to be writing would last long out in the wild. – GuyRT May 21 '13 at 10:00
  • The only explanation for this is that your code contains undefined behaviour somewhere or that your compiler or standard library is buggy. Please post a *complete*, *compilable* example that demonstrates your problem (so we can rule out the first possibility), and name your compiler and standard library (so we can investigate the second possibility). – Mankarse May 21 '13 at 10:13
  • I made some additions, check the question. First bullet in the update might be interesting. – incrediblehulk May 21 '13 at 11:22

1 Answer


The problem turned out to depend on the size of the data.

After experimenting with different size values, I found that the corruption appears only above 65504 bytes (just under 64 KiB), which looks like a limit on ostringstream's internal buffer. When size is larger than that, the strange shifts and corrupted bytes occur.
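The size experiment can be repeated in isolation with a small check like the one below. This is only a sketch: the helper name `round_trips` and the byte pattern are made up for illustration, and a `std::vector` stands in for the buffer that GetData returns. It writes `n` bytes with `write`, then checks that the source buffer is untouched and that the stream holds an identical copy.

```cpp
#include <algorithm>
#include <cstddef>
#include <ios>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: writes n bytes of binary data to an ostringstream and
// reports whether the source buffer and the stream contents both still match
// a reference copy afterwards.
bool round_trips(std::size_t n) {
    // Stand-in for the payload returned by GetData (assumption).
    std::vector<unsigned char> payload(n);
    for (std::size_t i = 0; i < n; ++i) {
        payload[i] = static_cast<unsigned char>((i * 31u + 7u) & 0xFFu);
    }
    const std::vector<unsigned char> reference(payload);

    std::ostringstream stream;
    stream.write(reinterpret_cast<const char*>(payload.data()),
                 static_cast<std::streamsize>(payload.size()));

    const std::string written = stream.str();

    // write() must not modify the buffer it reads from.
    const bool source_intact = (payload == reference);

    // The stream must hold exactly the bytes that were passed in.
    const bool copy_matches =
        written.size() == n &&
        std::equal(written.begin(), written.end(),
                   reinterpret_cast<const char*>(reference.data()));

    return stream.good() && source_intact && copy_matches;
}
```

Calling `round_trips(65504)` and `round_trips(65505)` from a test program compares the behaviour on either side of the observed threshold without involving GetData or md5.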

The workaround is to use:

stream.rdbuf()->pubsetbuf((char*)payload, size);

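instead of the write method. The stream then refers to payload directly rather than holding its own copy (and whether a string stream honors the supplied buffer at all is implementation-defined), so once this scope ends and payload is invalidated, the stream can no longer be used anywhere else. In my case, it needed to be used somewhere else.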
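If the stream has to outlive payload, one possible alternative (not part of the original workaround) is to copy the bytes into a std::string and hand that to the stream with `str()`, so the stream owns its data. The sketch below assumes nothing about GetData; `to_owned_bytes` and `fill_from_temporary` are hypothetical names, and a local `std::vector` plays the role of the short-lived payload.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: copies the payload into a std::string so that the
// result no longer depends on the lifetime of `payload`.
std::string to_owned_bytes(const unsigned char* payload, unsigned long size) {
    if (payload == 0 || size == 0) {
        return std::string();
    }
    // The (pointer, length) constructor copies exactly `size` bytes,
    // including embedded '\0' bytes.
    return std::string(reinterpret_cast<const char*>(payload), size);
}

// Fills `stream` from a buffer that is destroyed before this function returns.
void fill_from_temporary(std::ostringstream& stream) {
    std::vector<unsigned char> temporary;  // stand-in for GetData's buffer
    temporary.push_back(0x00);
    temporary.push_back(0xFF);
    temporary.push_back('A');

    // str() replaces the stream's contents with its own copy of the bytes.
    stream.str(to_owned_bytes(temporary.data(), temporary.size()));
}
```

After `fill_from_temporary(stream)` returns, `stream.str()` still yields the three bytes even though the temporary buffer is gone. This costs an extra copy of the data, which may matter for large payloads.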

This showed that:

  • The issue was indeed with ostringstream, not with the hash or anything else.
  • The standard library's string streams apparently have a default buffer size limit, which is worth remembering in the future.
incrediblehulk