
Problem: On raw sockets, recvfrom can capture more bytes than sendto can send, preventing me from retransmitting packets larger than MTU.

Background: I'm writing an application that captures and retransmits packets. Basically, host A sends data to X, which logs it and forwards it to B; all three are Linux machines. I'm using a raw socket so I can capture all data, and it's created with socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL)).

Then, there's code waiting for and reading incoming packets:

const int buffer_size = 2048;
uint8_t* buffer = new uint8_t[buffer_size];
sockaddr_ll addr = {0};
socklen_t addr_len = sizeof(addr);
int received_bytes = recvfrom(_raw_socket, buffer, buffer_size, 0, (struct sockaddr*)&addr, &addr_len);

Packet processing follows and the loop is finished with sending packet out again:

struct sockaddr_ll addr;
memset(&addr, 0, sizeof(struct sockaddr_ll));
addr.sll_family = AF_PACKET; // host byte order; only sll_protocol is in network byte order
addr.sll_protocol = eth_hdr->type;
addr.sll_ifindex = interface().id();
addr.sll_halen = HardwareAddress::byte_size;
memcpy(&(addr.sll_addr), eth_hdr->dest_mac, HardwareAddress::byte_size);

// Try to send packet
if(sendto(raw_socket(), data, length, 0, (struct sockaddr*)&addr, sizeof(addr)) < 0)

The problem is that I don't expect to receive packets larger than the Ethernet MTU (1500 bytes), and I shouldn't, since I'm using raw sockets that process each packet individually. But sometimes I do receive packets larger than the MTU. I thought it might be an error in my code, but Wireshark confirms it, as shown in the image, so there must be some reassembly going on at a lower level, such as the network controller itself.

[Image: Received packet]

Well, OK. I don't think there's a way to disable this for just one application, and I can't change the host configuration, so I could increase the buffer size. But when I call sendto with anything larger than the MTU size (actually 1514 bytes, because of the Ethernet header), I get the errno 80: Message too long. And that's the problem stated above: I can't send out the same packet I received. What could be a possible solution for this? And what buffer size would I need to always capture the whole packet?
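To make the limit above concrete, here is a minimal sketch of checking a frame against the interface MTU before calling sendto. It assumes Linux and plain untagged Ethernet (14-byte header); the helper names interface_mtu and fits_in_one_frame are made up for illustration and are not part of the application.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <net/ethernet.h>  // ETH_HLEN (14-byte Ethernet header)
#include <net/if.h>        // ifreq, IFNAMSIZ
#include <sys/ioctl.h>     // ioctl, SIOCGIFMTU
#include <sys/socket.h>

// Hypothetical helper: returns the MTU of `ifname`, or -1 if the query fails.
// `fd` can be any open socket descriptor.
int interface_mtu(int fd, const std::string& ifname) {
    if (ifname.empty() || ifname.size() >= IFNAMSIZ) return -1;
    ifreq request{};  // zero-initialised, so the name stays NUL-terminated
    std::memcpy(request.ifr_name, ifname.c_str(), ifname.size());
    if (ioctl(fd, SIOCGIFMTU, &request) < 0) return -1;
    return request.ifr_mtu;
}

// Hypothetical helper: true if a complete Ethernet frame (header included)
// is no longer than MTU + header, i.e. 1514 bytes for an MTU of 1500.
// Assumption: no VLAN tag is present in the frame.
bool fits_in_one_frame(std::size_t frame_length, int mtu) {
    return mtu > 0 && frame_length <= static_cast<std::size_t>(mtu) + ETH_HLEN;
}
```

With an MTU of 1500, fits_in_one_frame(1514, 1500) is true and fits_in_one_frame(1515, 1500) is false, which matches the 1514-byte threshold I see with sendto.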

EDIT: I have just checked the machines with ethtool -k interf and got tcp-segmentation-offload: on for all of them, so it seems that it really is the NIC reassembling fragments. But I wonder why sendto doesn't behave like recvfrom. If packets can be automatically reassembled, why can't they be fragmented?

A side note: The application needs to send those packets. Setting up forwarding with iptables etc. won't work.

Raven

1 Answer


Your network card probably has segmentation offload enabled, which means the hardware can re-assemble TCP segments before they reach the OS or your code.

You can check whether that is the case by running ethtool -k. Transparently capturing TCP traffic and re-transmitting it at such a low level is often more trouble than it is worth (one is often better off doing this at the application layer: terminate the TCP connection and set up a new TCP connection towards your host B). In any case, you cannot capture and re-send packets if your network card has messed with them. You need to:

  • Turn off generic-segmentation-offload
  • Turn off generic-receive-offload
  • Turn off tcp-segmentation-offload
  • Turn off udp-fragmentation-offload if you are also dealing with UDP
  • Turn off rx-vlan-offload/tx-vlan-offload if your packets are VLAN encapsulated
  • Possibly turn off rx-checksumming and tx-checksumming. It either works if both are enabled, or it's broken wrt. RAW sockets if enabled, depending on your kernel version and type of network card.

These can be turned on/off with the ethtool -K command, the exact syntax is described in the ethtool manpage.
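If you would rather check the offload state from inside the application than shell out to ethtool, the following is a minimal sketch. It assumes Linux and the SIOCETHTOOL ioctl with the legacy query commands from <linux/ethtool.h> (ETHTOOL_GGRO, ETHTOOL_GGSO, ETHTOOL_GTSO and so on). The function name query_offload is made up for illustration, and some drivers or kernels may not report every feature through these legacy commands.

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <linux/ethtool.h>  // ethtool_value, ETHTOOL_GGRO, ETHTOOL_GTSO, ...
#include <linux/sockios.h>  // SIOCETHTOOL
#include <net/if.h>         // ifreq, IFNAMSIZ
#include <sys/ioctl.h>
#include <sys/socket.h>

// Hypothetical helper: asks the kernel whether one offload feature is enabled.
// `get_cmd` is one of the ETHTOOL_G* query commands, e.g. ETHTOOL_GGRO.
// Returns 1 if the feature is on, 0 if it is off, -1 if the query failed.
int query_offload(int fd, const std::string& ifname, std::uint32_t get_cmd) {
    if (ifname.empty() || ifname.size() >= IFNAMSIZ) return -1;

    ethtool_value value{};  // cmd goes in, data comes back
    value.cmd = get_cmd;

    ifreq request{};        // zero-initialised, so the name stays NUL-terminated
    std::memcpy(request.ifr_name, ifname.c_str(), ifname.size());
    request.ifr_data = reinterpret_cast<char*>(&value);

    if (ioctl(fd, SIOCETHTOOL, &request) < 0) return -1;
    return value.data ? 1 : 0;
}
```

For example, query_offload(fd, "eth0", ETHTOOL_GGRO) reports generic-receive-offload, where fd is an open socket such as one from socket(AF_INET, SOCK_DGRAM, 0) and "eth0" is a placeholder interface name. Actually changing a setting goes through the matching ETHTOOL_S* commands and needs the same privileges as ethtool -K.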

nos
  • I just wrote an edit parallel to your answer confirming the offloading. 1) Is there no other way than reconfiguring the host? 2) Why does `recvfrom` "work" with the offloading but `sendto` doesn't? I'd expect that if something can be automatically reassembled, it can also be fragmented. – Raven Mar 27 '17 at 17:18
  • Ok. Well, I think offloading on the sending side (if supported by the NIC at all) is a fine interaction between the OS TCP/IP stack and the hardware, something you do not get the advantage of when using just raw sockets. – nos Mar 27 '17 at 17:21