3

I was wondering if there is any way to tune, on a Linux system, the MTU for a given socket (to make the IP layer fragment into chunks smaller than the actual device MTU).

When I say "for a given socket", I don't mean programmatically, in the code of the application owning the socket, but rather externally, for example via a sysfs entry.

If there is currently no way to do that, do you have any ideas about where to hook into or patch the Linux kernel to implement such a possibility?

Thanks.

EDIT: why the hell do I want to do that ?

I'm doing some Layer3-in-Layer4 tunneling (e.g. tunneling IP and above through a TCP tunnel). Unlike VPN-like solutions, I'm not using a virtual interface to achieve that. I'm capturing packets using iptables, diverting them from their normal path, and writing them to the tunnel socket.

Think about the case of a big file transfer: all packets are filled up to the MTU size. When I tunnel them, I add some overhead, so every original packet produces two tunneled packets, which is suboptimal.
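To make the overhead problem concrete, here is a rough sketch of the arithmetic. It assumes a simplified model that is not stated in the question: each captured packet is written to the tunnel and sent on its own, the tunnel prepends a hypothetical 4-byte framing header, and the outer IPv4 and TCP headers are 20 bytes each with no options. The function names are illustrative only.

```python
import math

IP_HEADER = 20   # IPv4 header without options (assumption)
TCP_HEADER = 20  # TCP header without options (assumption)


def segments_per_inner_packet(inner_len: int, framing: int, link_mtu: int = 1500) -> int:
    """Outer TCP segments needed to carry one captured packet plus tunnel framing."""
    outer_mss = link_mtu - IP_HEADER - TCP_HEADER
    return math.ceil((inner_len + framing) / outer_mss)


def largest_inner_packet(framing: int, link_mtu: int = 1500) -> int:
    """Largest captured packet that still fits in a single outer segment."""
    return link_mtu - IP_HEADER - TCP_HEADER - framing


# A full-size 1500-byte inner packet spills into a second outer segment,
# while an inner packet limited to largest_inner_packet() does not.
print(segments_per_inner_packet(1500, framing=4))
print(largest_inner_packet(framing=4))
```

Under these assumptions, the inner traffic would have to be limited to `largest_inner_packet()` bytes per packet to avoid doubling the packet count.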

Jocelyn delalande
  • tun/tap drivers would make that a whole lot easier to solve than "stealing" traffic like that ;) – Flexo Nov 12 '10 at 15:34

2 Answers

3

If the socket is created such that DF is set on outgoing packets, you might have some luck spoofing (injecting) an ICMP "fragmentation needed" message back at yourself until you end up with the desired MTU. Rather ugly, but depending on how desperate you are it might be appropriate.
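As an illustration of what such a message looks like on the wire, the following sketch builds the bytes of an ICMP "destination unreachable, fragmentation needed" message (type 3, code 4) carrying a next-hop MTU, following the layout of RFC 792 and RFC 1191. It only constructs the message; actually injecting it would need a raw socket and root privileges, which is left out here. The function names and the fake original datagram are hypothetical.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """Standard 16-bit one's-complement Internet checksum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_frag_needed(next_hop_mtu: int, original_datagram: bytes) -> bytes:
    """Build ICMP type 3 / code 4 quoting the offending datagram."""
    # The message quotes the original IP header plus the first 8 payload bytes.
    ihl = (original_datagram[0] & 0x0F) * 4
    quoted = original_datagram[: ihl + 8]
    # Fields: type, code, checksum (zero for now), unused, next-hop MTU.
    header = struct.pack("!BBHHH", 3, 4, 0, 0, next_hop_mtu)
    checksum = internet_checksum(header + quoted)
    return struct.pack("!BBHHH", 3, 4, checksum, 0, next_hop_mtu) + quoted


# Placeholder "original" datagram: a 20-byte IPv4 header (version 4, IHL 5)
# followed by some payload. Real use would quote the actual oversized packet.
fake_original = b"\x45" + bytes(19) + bytes(8) + b"extra payload"
message = build_frag_needed(1400, fake_original)
print(len(message), message[:8].hex())
```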

You could, for example, generate these packets with iptables rules, so the matching and sending is simple and external to your application. It looks like the REJECT target for iptables doesn't have a reject-with option for "fragmentation needed", though it probably wouldn't be too tricky to add one.

The other approach, if it's only TCP packets you care about: you might have some luck with the socket option TCP_MAXSEG, or with the iptables TCPMSS target, if that's appropriate to your problem.
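For reference, this is roughly what the TCP_MAXSEG route looks like from inside the application (which is the programmatic way the question wants to avoid, but it shows what the option does). A minimal sketch, assuming Linux and an arbitrary example value of 1200 bytes; the helper name is made up for illustration.

```python
import socket


def tcp_socket_with_mss(mss: int) -> socket.socket:
    """Create an unconnected TCP socket with a user-supplied MSS cap."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Set before connect(): per tcp(7), on recent Linux kernels the value
    # then also affects the MSS announced to the peer. The kernel applies
    # its own lower and upper bounds to the value.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG, mss)
    return sock


sock = tcp_socket_with_mss(1200)  # 1200 is an arbitrary example value
# sock.connect((host, port)) would follow here in a real application.
sock.close()
```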

For UDP or raw you're free to send() packets as small as you fancy!

Update:

Based on the "why would I want to do that?" answer, it seems like fragmenting packets if DF isn't set or raising ICMP "fragmentation needed" and dropping would actually be the correct solution.

It's what a more "normal" router would do and provided firewalls don't eat the ICMP packet then it will behave sanely in all scenarios, whereas retrospectively changing things is a recipe for odd behaviour.

Clamping the MSS with iptables (the TCPMSS target) is quite a good fix for TCP over this "VPN" though, especially as you're already making extensive use of iptables, it seems.

Flexo
  • ICMP forging can be a way, but quite hackish I must say ;). TCP_MAXSEG seems an interesting way; any idea if it's tunable from sysfs/procfs? – Jocelyn delalande Nov 12 '10 at 14:55
  • I can't see anything either in the manpage for proc(5) or from find on either /sys or /proc that gives anything other than read access to established connections. – Flexo Nov 12 '10 at 15:07
  • The other alternative, of course, is to interpose a few functions via a shared library. You could intercept calls to socket() and, having set up a signal handler, make some signal change this, as setsockopt() is async-signal-safe. Can you be more specific about why you're trying to do this though? I think it's rather unusual (hence the lack of an obvious clean way to achieve it!) – Flexo Nov 12 '10 at 15:09
  • Another part of the question is: who is supposed to handle the ICMP "fragmentation needed" messages? Is it up to the userspace application dealing with the socket, or is it handled at kernel level? – Jocelyn delalande Nov 14 '10 at 13:04
  • For TCP the "fragmentation needed" message should be handled for you. For UDP/Raw sockets this will typically either be exposed to userspace as an error, or cause future packets this size to be fragmented for you, depending on the socket options set. – Flexo Nov 15 '10 at 15:39
1

MTU is a property of a link, not of a socket; they belong to different layers of the stack. That said, TCP negotiates the MSS during the three-way handshake, performs Path MTU discovery, and tries very hard to avoid fragmentation. You'll have a hard time making TCP send fragments. With UDP, the easiest approach is to force a smallish MTU on an interface with ifconfig(8) and then send packets larger than that value.
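To see where the fragmentation threshold for UDP lies, the arithmetic can be sketched as follows. This assumes IPv4 with a 20-byte header (no options) and the 8-byte UDP header; the helper names are illustrative. Every fragment except the last must carry a multiple of 8 bytes of IP payload.

```python
import math

IP_HEADER = 20   # IPv4 header without options (assumption)
UDP_HEADER = 8


def max_unfragmented_udp_payload(mtu: int = 1500) -> int:
    """Largest UDP payload that fits in a single IP packet."""
    return mtu - IP_HEADER - UDP_HEADER


def fragment_count(udp_payload: int, mtu: int = 1500) -> int:
    """Number of IP fragments needed for one UDP datagram."""
    total = udp_payload + UDP_HEADER       # IP payload to be carried
    last_capacity = mtu - IP_HEADER        # the last fragment may be any size
    per_fragment = last_capacity // 8 * 8  # others must be multiples of 8
    if total <= last_capacity:
        return 1
    return 1 + math.ceil((total - last_capacity) / per_fragment)


print(max_unfragmented_udp_payload())  # threshold for a 1500-byte MTU
print(fragment_count(1472), fragment_count(1473))
```

With a smaller interface MTU, the same functions show how quickly the fragment count grows for a fixed datagram size.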

Nikolai Fetissov
  • I really want it to be per-socket, so ifconfig is not the way. Even for TCP, the MSS (maximum segment size) uses the MTU value in its discovery. By the way, even if the MTU is set per interface, it's accessed by the IP (and/or TCP?) layer at some point to do fragmentation, isn't it? – Jocelyn delalande Nov 12 '10 at 14:22
  • Right, the link layer gives the upper layers the MTU information. If you want to see fragmentation, just send UDP packets larger than 1472 bytes (the Ethernet MTU of 1500 minus 20 bytes of IP header minus 8 bytes of UDP header). – Nikolai Fetissov Nov 12 '10 at 14:31
  • I assumed this question was asking about making sockets send packets much smaller than the MTU. – Flexo Nov 12 '10 at 14:44
  • @nikolai-n-fetissov: seeing fragmentation is not a goal in itself for me; what I would like is for the MTU information reported to the upper layers to be set to an arbitrary (lower, in my case) value for a given socket. – Jocelyn delalande Nov 12 '10 at 14:53