18

When creating a Unix domain socket, the path name (see man 7 unix) may be at most 108 bytes long. This caused a bug in a friend's program because his path was longer. Now we wonder how exactly that number was determined.

I suspect that the number was chosen so that sizeof(struct sockaddr_un) is unambiguous compared to the sizeof of other socket addresses such as sockaddr_in. But if the goal was to avoid clashes with other sizeof values, why not use a prime number, for example? Can someone please provide an authoritative source for that?
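For context, the kind of bug described above can be avoided by checking the path against the capacity of sun_path before copying. The following is only a minimal sketch: `fill_unix_addr` is a hypothetical helper name, not part of any library, and the check uses sizeof rather than a hard-coded 108 because the field size varies between systems.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical helper (not a library function): fills *addr from path.
 * Returns 0 on success, or -1 if the path plus its terminating NUL
 * does not fit into sun_path on this platform. */
static int fill_unix_addr(struct sockaddr_un *addr, const char *path)
{
    assert(addr != NULL && path != NULL);

    size_t len = strlen(path);
    if (len >= sizeof(addr->sun_path))
        return -1;                       /* too long: refuse, do not truncate */

    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    memcpy(addr->sun_path, path, len + 1); /* copy including the NUL */
    return 0;
}
```

A caller would check the return value before passing the address to bind() or connect(), so that an overlong path is reported as an error instead of being silently cut off.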

Johannes Schaub - litb
  • Why would a prime number avoid the size of other types? – Ed Heal Jan 16 '16 at 17:03
  • @EdHeal Perhaps I'm mistaken, but with a prime size, adding up multiples of 2 (caused by alignment) could no longer make another struct match its sizeof. Alternatively, I could have asked: why use an even number if the goal was to avoid clashes? That seems to indicate that my suspicion about the number may be incorrect. – Johannes Schaub - litb Jan 16 '16 at 17:07
  • Granted, lots of structures have sizes that are multiples of two (probably half of them). But making something a prime size does not avoid clashes. Anyway, what is the point of doing this if the programmer is getting this fundamental part wrong and using the wrong structure? – Ed Heal Jan 16 '16 at 17:11
  • According to Michael Kerrisk, SUSv3 (Single UNIX Specification) *doesn’t specify the size of the sun_path field. Early BSD implementations used 108 and 104 bytes, and one contemporary implementation (HP-UX 11) uses 92 bytes. Portable applications should code to this lower value, and use snprintf() or strncpy() to avoid buffer overruns when writing into this field.* So there is no magic about prime numbers, it seems (and only even numbers). It is just some arbitrary limit, like the 255 for filenames in many file systems. – Gerard Rozsavolgyi Jan 16 '16 at 17:22
  • All I can find are mentions that the size differs on different implementations (the Linux manpage, APUE, POSIX), never anything about the rationale for the size. – Michael Burr Jan 16 '16 at 18:11
  • please post the related code, including this overlength address. I would expect the code to be calling: `getaddrinfo()` and that function does not care about the length of the domain address. – user3629249 Jan 16 '16 at 21:35
  • I think this length may vary slightly between Linux systems, and I also think there's a difference of 1 between RedHat 7 and Solaris 11. This is based on a bug in a Perl program which manifested itself at slightly different path lengths on those OSes. So it's not absolutely fixed at 108. – Bjorn Munch Jan 17 '16 at 00:15
  • @user3629249 You would normally not use getaddrinfo() for a Unix socket (AF_UNIX type sockets) – nos Jan 17 '16 at 16:31
  • @nos, the `getaddrinfo()` replaces both the `gethostbyname()` and `getservbyname()` functions and uses pointers to the 'path' names, so is not bothered by ipv4 ipv6 differences and doesn't care about the length of the 'path' string. – user3629249 Jan 17 '16 at 16:41
  • @user3629249 We are not talking about ipv4/ipv6 here, but unix sockets, a form of local interprocess communication on unix systems, unrelated to network communication - https://en.wikipedia.org/wiki/Unix_domain_socket – nos Jan 17 '16 at 16:52

2 Answers

6

It was to match the space available in a handy kernel data structure.

EDIT:

Quoting "The Design and Implementation of the 4.4BSD Operating System" by McKusick et al. (page 369):

The memory management facilities revolve around a data structure called an mbuf. Mbufs, or memory buffers, are 128 bytes long, with 100 or 108 bytes of this space reserved for data storage.
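To make the arithmetic in that quote concrete, here is a rough model of how 100 and 108 can fall out of a 128-byte mbuf. This is a sketch under stated assumptions, not a copy of the kernel header: the struct and field names are approximations from memory of the 4.4BSD layout, and pointers are modeled as 32-bit integers because that is what the numbers in the quote imply.

```c
#include <assert.h>
#include <stdint.h>

#define MSIZE 128  /* total mbuf size, as stated in the quote */

/* Approximate model of the per-mbuf header, assuming 32-bit pointers.
 * Field names are illustrative, not authoritative. */
struct m_hdr_model {
    uint32_t mh_next;     /* pointer to next mbuf in the chain */
    uint32_t mh_nextpkt;  /* pointer to next packet in the queue */
    int32_t  mh_len;      /* amount of data in this mbuf */
    uint32_t mh_data;     /* pointer to the start of the data */
    int16_t  mh_type;     /* type of data held */
    int16_t  mh_flags;    /* flags */
};

/* Approximate model of the extra header on the first mbuf of a packet. */
struct pkthdr_model {
    int32_t  len;         /* total packet length */
    uint32_t rcvif;       /* pointer to the receiving interface */
};

enum {
    MLEN_MODEL  = MSIZE - sizeof(struct m_hdr_model),        /* plain mbuf */
    MHLEN_MODEL = MLEN_MODEL - sizeof(struct pkthdr_model)   /* packet header mbuf */
};

static_assert(MLEN_MODEL == 108, "128 - 20-byte header leaves 108 data bytes");
static_assert(MHLEN_MODEL == 100, "a further 8-byte packet header leaves 100");
```

Under these assumptions a 20-byte header leaves 108 bytes of data space, which would be consistent with a socket address being sized to fit inside a single mbuf.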

John Hascall
5

If you cannot find it, sometimes that just means there is nothing to find, although it can also mean that you simply couldn't find it. Still, I would like to share what I have found so far, and my guess is that the number is arbitrary.

My guess is supported by these two statements from the GNU C Library manual:

char sun_path[108]

This is the file name to use. Incomplete: Why is 108 a magic number? RMS suggests making this a zero-length array and tweaking the example following to use alloca to allocate an appropriate amount of storage based on the length of the filename.

(Here RMS presumably stands for Richard M. Stallman, which is another guess.)

Data Type: struct sockaddr
...

char sa_data[14]

This is the actual socket address data, which is format-dependent. Its length also depends on the format, and may well be more than 14. The length 14 of sa_data is essentially arbitrary.
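The zero-length-array idea attributed to RMS in the first statement can be sketched roughly as follows. This is an illustration only: `struct var_addr` and `var_addr_new` are hypothetical names, not a real socket address type or API, and the sketch uses a C99 flexible array member with malloc instead of a zero-length array with alloca. As far as I know it would not lift the limit in practice, since the kernel side still expects a struct sockaddr_un of fixed maximum size.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical address type whose path storage is sized per instance. */
struct var_addr {
    unsigned short family;
    char path[];            /* C99 flexible array member, no fixed 108 */
};

/* Hypothetical constructor: allocates exactly enough room for the path.
 * Returns NULL on allocation failure; the caller frees the result. */
static struct var_addr *var_addr_new(unsigned short family, const char *path)
{
    assert(path != NULL);

    size_t len = strlen(path) + 1;               /* include the NUL */
    struct var_addr *a = malloc(sizeof(*a) + len);
    if (a == NULL)
        return NULL;

    a->family = family;
    memcpy(a->path, path, len);
    return a;
}
```

The trade-off is that such an object can no longer be declared as a plain local variable or embedded in another struct, which may be one reason fixed-size arrays like sun_path[108] and sa_data[14] were kept.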

PS: I don't know why, but this kind of question makes me really curious.

terence hill
  • It's interesting that RMS (yes, very likely Richard Stallman) suggested it being a zero-length array. Modern Windows kernels have structures that end with single-byte arrays that are expected to be allocated as (sizeof(thedata)+sizeof(thestruct)) bytes. – sjcaged Sep 13 '22 at 08:34