0

I have an application that forwards received UDP packets over a TCP connection. I'm storing the UDP packet data in a std::string object.

For TCP send/receive, I'm using a data encoding/decoding scheme as <2-byte data length><Data>.

This is my requirement.

How do I prepend the 2-byte string length to the std::string efficiently?

Also, do I need to take care of endianness (hostToNetwork) for the 2-byte length integer? And for the data part?

Remy Lebeau
RishiN
  • Store the data at a two-byte offset, fill in the length once you know what it is. – molbdnilo Jun 15 '21 at 11:11
  • I can't. The UDP data is passed to a parser engine, which returns the string object; I then need to prepend its length and send it. – RishiN Jun 15 '21 at 11:19
  • You can't prepend efficiently (i.e. without moving the other data). On the other hand, it's just moving a UDP packet's worth of data, which is pretty fast. Is this operation really a bottleneck? (BTW, it's not obvious why you're not just adding this in the parser.) – molbdnilo Jun 15 '21 at 11:26
  • You probably want to take care of endianness of the size. If you're just receiving and resending UDP packets, you shouldn't modify the packet data in any way. – molbdnilo Jun 15 '21 at 11:28
  • ..or just call send( ) twice, once for the header, then another for the UDP datagram. – Martin James Jun 22 '21 at 03:15

3 Answers

2

In UDP, one send() is a complete "message", you can't split a "message" across multiple send()s.

But TCP is a byte stream, so you can make multiple consecutive calls to send() per "message". So, just send the data length in one send(), and then send the data in another send(). TCP will ensure the bytes are received in the same order they are sent. You don't have to prepend the length bytes to the std::string itself at all.

This works especially well if "Send Coalescing" (aka the Nagle Algorithm) is enabled, which it usually is by default. That allows the socket stack to buffer outgoing data so it can send packets over the network more efficiently. But even with Nagle disabled, this scheme will still work.

In fact, in TCP there is no guarantee that send() will accept all of the requested bytes in one go, so you have to be prepared to call send() multiple times anyway.

Try something like this:

bool sendRaw(int sock, const void *data, size_t len)
{
    const char *pdata = static_cast<const char*>(data);
    while (len > 0)
    {
        int numSent = send(sock, pdata, len, 0);
        if (numSent < 0) return false; // or throw...
        pdata += numSent;
        len -= numSent;
    }
    return true;
}

bool sendUint16(int sock, uint16_t value)
{
    value = htons(value);
    return sendRaw(sock, &value, sizeof(value));
}

bool sendString(int sock, const std::string &s)
{
    if (s.size() > 0xFFFF) return false; // or throw...
    uint16_t len = static_cast<uint16_t>(s.size());
    bool ok = sendUint16(sock, len);
    if (ok) ok = sendRaw(sock, s.c_str(), len);
    return ok;
}

// Usage:
std::string udpData = ...;
bool ok = sendString(sock, udpData);
...

And then you can just reverse the process on the receiving side, eg:

int recvRaw(int sock, void *data, size_t len)
{
    char *pdata = static_cast<char*>(data);
    while (len > 0)
    {
        int numRecvd = recv(sock, pdata, len, 0);
        if (numRecvd <= 0) return numRecvd; // or throw...
        pdata += numRecvd;
        len -= numRecvd;
    }
    return 1;
}

int recvUint16(int sock, uint16_t &value)
{
    int ret = recvRaw(sock, &value, sizeof(value));
    value = (ret == 1) ? ntohs(value) : 0;
    return ret;
}

int recvString(int sock, std::string &s)
{
    s.clear();
    uint16_t len;
    int ret = recvUint16(sock, len);
    if ((ret == 1) && (len > 0)) {
        s.resize(len);
        ret = recvRaw(sock, &s[0], len); // s.data() is only non-const since C++17
    }
    return ret;
}

// Usage:
std::string udpData;
int ret = recvString(sock, udpData);
...
Remy Lebeau
  • Thanks, but wouldn't this impact the application's performance? For 1000 UDP packets I'd be making 2000 send() calls, whereas with the prepended-length scheme the number of send() calls would equal the number of packets. – RishiN Jun 16 '21 at 02:57
  • @tingtong no, it will not really impact performance much if at all. Trying to actually prepend the length bytes to the `std::string` would impact performance more due to allocating new memory. This code is using the memory that is already allocated. And the socket has an internal buffer anyway, so it doesn't matter how many `send()` calls you make, the bytes have to be copied into that buffer first before the kernel will then transmit it. – Remy Lebeau Jun 16 '21 at 03:08
  • I am a little weak on how data is read/written in memory. Is this valid logic: `std::string sourceUdp = "Vivekananda";` `uint16_t sLen = static_cast<uint16_t>(sourceUdp.size());` `std::string destTcp="";` `destTcp.append(std::to_string((0xFFFF)));` `destTcp.append(sourceUdp); ` `destTcp.insert(0,std::to_string(sLen));` – RishiN Jun 16 '21 at 03:49
  • @tingtong no, because your use of `to_string()` and `insert()` are not appropriate in this case, thus `destTcp` will not have anywhere near the correct format you want. Try this instead: `std::string destTcp(2+sLen, '\0'); *reinterpret_cast<uint16_t*>(&destTcp[0]) = htons(sLen); memcpy(&destTcp[2], sourceUdp.c_str(), sLen);` – Remy Lebeau Jun 16 '21 at 04:13
0

It's probably not the end of the world to just do this right before sending. Allocating a temporary buffer on the stack for a UDP packet (which won't be bigger than 64K anyway) and copying the data into it will be fast, assuming synchronous socket sends.

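// LONGEST_STRING_LENGTH must be at most 65535 so the length fits in 2 bytes
unsigned char buffer[LONGEST_STRING_LENGTH+2];
size_t len = str.size();
assert(len <= LONGEST_STRING_LENGTH);
uint16_t lenNBO = htons((uint16_t)len); // length in network byte order
memcpy(buffer, &lenNBO, 2);
memcpy(buffer+2, str.data(), len);
// send() may accept fewer than len+2 bytes; check the return value and loop in real code
send(sock, buffer, len+2, 0);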
selbie
0

Are you using any kind of (C++) wrapper around sockets? Otherwise, just pre-fill the string you receive data into with two arbitrary bytes, then append the data you receive from the socket. After that, write the length of the data (the string's length minus two) into the first two bytes.

Something along the lines of

string x("xx");
// receive your data into x, starting at offset 2
x[0] = (x.length() - 2) & 0xff;
x[1] = ((x.length() - 2) & 0xff00) >> 8;
// send ...

Depending on which endianness you want, swap the assignments to x[0] and x[1]. The code above writes the length in little-endian order. Network byte order is big-endian, so to send the length in network byte order, put the high byte in x[0].
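
To make the byte order explicit, here is a minimal sketch that builds the frame with the length in big-endian (network) order using shifts, so it does not depend on the host's endianness or on htons(). The names makeFrame and frameLength are hypothetical, and the sketch assumes that copying the payload once into a new string is acceptable.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns <2-byte big-endian length><payload>.
std::string makeFrame(const std::string &payload)
{
    if (payload.size() > 0xFFFF)
        throw std::length_error("payload does not fit in a 2-byte length");

    const std::size_t len = payload.size();
    std::string frame;
    frame.reserve(len + 2);                                // one allocation
    frame.push_back(static_cast<char>((len >> 8) & 0xFF)); // high byte first
    frame.push_back(static_cast<char>(len & 0xFF));        // then low byte
    frame.append(payload);
    return frame;
}

// Hypothetical helper: reads the big-endian length prefix back from a frame.
std::uint16_t frameLength(const std::string &frame)
{
    if (frame.size() < 2)
        throw std::length_error("frame is shorter than its 2-byte header");

    // Go through unsigned char so bytes >= 0x80 are not sign-extended.
    const unsigned hi = static_cast<unsigned char>(frame[0]);
    const unsigned lo = static_cast<unsigned char>(frame[1]);
    return static_cast<std::uint16_t>((hi << 8) | lo);
}
```

For a 258-byte payload the two header bytes are 0x01 and 0x02, which is the same layout htons() followed by memcpy() would produce.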

But to be honest, if you're using the socket API directly, I'd use a plain char array instead of std::string.

Simon