9

I'm writing a websocket server and I have to deal with masked data that I need to unmask.

The mask is unsigned char[4], and the data is a unsigned char* buffer as well.

I don't want to XOR byte by byte, I'd much rather XOR 4-bytes at a time.

uint32_t * const end = reinterpret_cast<uint32_t *>(data_+length);
for(uint32_t *i = reinterpret_cast<uint32_t *>(data_); i != end; ++i) {
    *i ^= mask_;
}

Is there anything wrong with the use of reinterpret_cast in this situation?

The alternative would be the following code which isn't as clear and not as fast:

uint64_t j = 0;
uint8_t *end = data_+length;
for(uint8_t *i = data_; i != end; ++i,++j) {
    *i ^= mask_[j % 4];
}

I'm all ears for alternatives, including ones dependent on c++11 features.

Omnifarious
  • 54,333
  • 19
  • 131
  • 194
matthewaveryusa
  • 632
  • 5
  • 24
  • 1
    "Is reinterpret_cast bad when..." - "Yes, never use it!" - every C++ guy, ever. –  Dec 30 '12 at 23:20
  • ok but in this case what's the alternative? – matthewaveryusa Dec 30 '12 at 23:21
  • 3
    Well, if your code involving `reinterpret_cast` were even remotely close to correct, I personally wouldn't have a problem with it. ;-) – Omnifarious Dec 30 '12 at 23:22
  • @mna I'm not saying it's bad, they do. I tend to do nasty typecasts in C. –  Dec 30 '12 at 23:22
  • 6
    As an immediate reaction I'd wonder if `length` is guaranteed to be a multiple of `sizeof(uint32_t)`. Otherwise there is an obvious flaw in the code. – Dietmar Kühl Dec 30 '12 at 23:22
  • 1
    @H2CO3 sometimes you cannot get around `reinterpret_cast`. It's the C-style casts that are bad in (almost) every case. –  Dec 30 '12 at 23:24
  • @Zoidberg'-- Yes, there are cases when things considered dangerous are just necessary. –  Dec 30 '12 at 23:25
  • @Dietmar Kühl No it isn't, I simplified. I really want to squeeze in 4 bytes at a time. length is length_ & 0xFFFFFFFC – matthewaveryusa Dec 30 '12 at 23:25
  • What is `data`? The answer depends on that... – K-ballo Dec 30 '12 at 23:30
  • 3
    The other potential is that `data` might not be properly aligned to deal with `uint32_t` at that address. It isn't an issue on many platforms but there are platforms where data needs to be aligned properly to be accessible. If the data is properly aligned I'd wonder why you wouldn't use a bigger types, though. – Dietmar Kühl Dec 30 '12 at 23:31
  • data is just a bunch of bytes so unsigned char* / uint8_t* – matthewaveryusa Dec 30 '12 at 23:34
  • @Dietmar Kühl data_ isn't guaranteed to be divisible by uint32_t, later in the code I go byte-by-byte for the last 1,2 or 3 bytes of data_ – matthewaveryusa Dec 30 '12 at 23:36
  • @mna: now take everything you've said in these comments here, and edit that information into your question. Comments can be and are delted without warning, the question needs to stand on it's own. – Mooing Duck Mar 14 '13 at 23:22

2 Answers2

8

The are a few potential problems with the approach posted:

  1. On some systems objects of a type bigger than char needs to be aligned properly to be accessible. A typical requirement for uint32_t is that the object is aligned to an address divisible by four.
  2. If length / sizeof(uint32_t) != 0 the loop may never terminate.
  3. Depending on the endianess of the system mask needs to contain different values. If mask is produced by *reinterpret_cast<uint32_t>(char_mask) of a suitable array this shouldn't be an array.

If these issues are taken care of, reinterpret_cast<...>(...) can be used in the situation you have. Reinterpreting the meaning of pointers is one of the reasons this operation is there and sometimes it is needed. I would create a suitable test case to verify that it works properly, though, to avoid having to hunt down problems when porting the code to a different platform.

Personally I would go with a different approach until profiling shows that it is too slow:

char* it(data);
if (4 < length) {
    for (char* end(data + length - 4); it < end; it += 4) {
        it[0] ^= mask_[0];
        it[1] ^= mask_[1];
        it[2] ^= mask_[2];
        it[3] ^= mask_[3];
    }
}
it != data + length && *it++ ^= mask_[0];
it != data + length && *it++ ^= mask_[1];
it != data + length && *it++ ^= mask_[2];
it != data + length && *it++ ^= mask_[3];

I'm definitely using a number of similar approaches in software which meant to be really faster and haven't found them to be a notable performance problem.

David G
  • 94,763
  • 41
  • 167
  • 253
Dietmar Kühl
  • 150,225
  • 13
  • 225
  • 380
2

There's nothing specifically wrong with reinterpret_cast in this case. But, take care.

The 32-bit loop as it stands is incorrect, because it doesn't cater for the case where the payload isn't a multiple of 32 bits in size. Two possible solutions, I suppose:

  • replace the != with < in the for loop check (there's a reason why people use <, and it's not because they're dumb...) and do the trailing 1-3 bytes bytewise
  • arrange the buffer so that the size of the buffer for the payload part is a multiple of 32 bits, and just XOR the extra bytes. (Presumably the code checks the payload length when returning bytes to the caller, so this doesn't matter.)

Additionally, depending on how the code is structured you might also have to cope with misaligned data accesses for some CPUs. If you have an entire frame buffered, header and all, in a buffer that's 32-bit aligned, and if the payload length is <126 bytes or >65,535 bytes, then both the masking key and the payload will be misaligned.

For whatever it's worth, my server uses something like the first loop:

for(int i=0;i<n;++i)
    payload[i]^=key[i&3];

Unlike the 32-bit option, this is basically impossible to get wrong.

Tom Seddon
  • 2,648
  • 1
  • 19
  • 28