Don't understand why a code snippet from APUE unlinks a file attached to the client unix domain socket

Question

The book defines 3 customized functions:

int serv_listen(const char *name);
//Returns: file descriptor to listen on if OK, negative value on error

int serv_accept(int listenfd, uid_t *uidptr);
//Returns: new file descriptor if OK, negative value on error

int cli_conn(const char *name);
//Returns: file descriptor if OK, negative value on error

The serv_accept function (Figure 17.9) is used by a server to wait for a client’s connect request to arrive. When one arrives, the system automatically creates a new UNIX domain socket, connects it to the client’s socket, and returns the new socket to the server. Additionally, the effective user ID of the client is stored in the memory to which uidptr points.

serv_accept function code and description:

#include "apue.h"
#include <sys/socket.h>
#include <sys/un.h>
#include <time.h>
#include <errno.h>

#define STALE   30  /* client's name can't be older than this (sec) */

/*
 * Wait for a client connection to arrive, and accept it.
 * We also obtain the client's user ID from the pathname
 * that it must bind before calling us.
 * Returns new fd if all OK, <0 on error
 */
int
serv_accept(int listenfd, uid_t *uidptr)
{
    int                 clifd, err, rval;
    socklen_t           len;
    time_t              staletime;
    struct sockaddr_un  un;
    struct stat         statbuf;
    char                *name;

    /* allocate enough space for longest name plus terminating null */
    if ((name = malloc(sizeof(un.sun_path) + 1)) == NULL)
        return(-1);
    len = sizeof(un);
    if ((clifd = accept(listenfd, (struct sockaddr *)&un, &len)) < 0) {
        free(name);
        return(-2);     /* often errno=EINTR, if signal caught */
    }

    /* obtain the client's uid from its calling address */
    len -= offsetof(struct sockaddr_un, sun_path); /* len of pathname */
    memcpy(name, un.sun_path, len);
    name[len] = 0;          /* null terminate */
    if (stat(name, &statbuf) < 0) {
        rval = -3;
        goto errout;
    }

#ifdef  S_ISSOCK    /* not defined for SVR4 */
    if (S_ISSOCK(statbuf.st_mode) == 0) {
        rval = -4;      /* not a socket */
        goto errout;
    }
#endif

    if ((statbuf.st_mode & (S_IRWXG | S_IRWXO)) ||
        (statbuf.st_mode & S_IRWXU) != S_IRWXU) {
          rval = -5;    /* is not rwx------ */
          goto errout;
    }

    staletime = time(NULL) - STALE;
    if (statbuf.st_atime < staletime ||
        statbuf.st_ctime < staletime ||
        statbuf.st_mtime < staletime) {
          rval = -6;    /* i-node is too old */
          goto errout;
    }

    if (uidptr != NULL)
        *uidptr = statbuf.st_uid;   /* return uid of caller */
    unlink(name);       /* we're done with pathname now */
    free(name);
    return(clifd);

errout:
    err = errno;
    close(clifd);
    free(name);
    errno = err;
    return(rval);
}

... Then we call stat to verify that the pathname is indeed a socket and that the permissions allow only user-read, user-write, and user-execute. We also verify that the three times associated with the socket are no older than 30 seconds.

If all these checks are OK, we assume that the identity of the client (its effective user ID) is the owner of the socket.

Why does the server code unlink(name) the file attached to the client's socket?

The code for the other two functions is available at this link:

https://wandbox.org/permlink/jq5BajJYLgoh4yO6

rici · Answer 1 · 2019-12-24T06:03:19.000

Why does the server code unlink(name) the file attached to the client's socket?

It's more accurate to say that the server is deleting the filepath attached to the client's socket. Or more colloquially, the client's socket's name.

Recall that unlink() does not delete named objects which are currently open in some process; the client's socket is presumably still open in the client, so unlink(name) doesn't yet delete the socket. Rather, it ensures that the socket will be deleted when it is no longer being used by a running process.

What it does do immediately is free up the name, so that the name can be reused with a different socket.
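A minimal sketch of this behaviour, assuming a POSIX system with AF_UNIX sockets. The helpers bind_unix and demo_unlink_frees_name, and whatever path is passed to them, are made up for illustration and are not part of the book's code.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: create an AF_UNIX stream socket bound to path. */
static int bind_unix(const char *path)
{
    struct sockaddr_un un;
    int fd, err;

    if (strlen(path) >= sizeof(un.sun_path)) {
        errno = ENAMETOOLONG;
        return -1;
    }
    if ((fd = socket(AF_UNIX, SOCK_STREAM, 0)) < 0)
        return -1;
    memset(&un, 0, sizeof(un));
    un.sun_family = AF_UNIX;
    strcpy(un.sun_path, path);
    if (bind(fd, (struct sockaddr *)&un, sizeof(un)) < 0) {
        err = errno;
        close(fd);
        errno = err;
        return -1;
    }
    return fd;
}

/* Returns 0 if every step behaves as described above, -1 otherwise. */
static int demo_unlink_frees_name(const char *path)
{
    struct stat sb;
    int first, second, rc = -1;

    unlink(path);                       /* start from a clean slate */
    if ((first = bind_unix(path)) < 0)
        return -1;

    /* while the name exists, no other socket can bind to it */
    if ((second = bind_unix(path)) >= 0) {
        close(second);
        goto out;
    }
    if (errno != EADDRINUSE)
        goto out;

    /* unlink removes the name; the open socket itself survives */
    if (unlink(path) < 0)
        goto out;
    if (stat(path, &sb) == 0 || errno != ENOENT)
        goto out;
    if (fstat(first, &sb) < 0)
        goto out;

    /* the freed name can now be bound by a different socket */
    if ((second = bind_unix(path)) < 0)
        goto out;
    close(second);
    rc = 0;
out:
    close(first);
    unlink(path);
    return rc;
}
```

Calling demo_unlink_frees_name with some scratch path should return 0 on a system where the description above holds.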

So why do that? Mostly so that the filesystem doesn't fill up with zombie socket names. It doesn't help the current client reuse the name (for example, to connect to a different service), because the client unlinks the name anyway before attempting to use it. But the zombie name could be a problem for a different future client process with a different uid which happens to be assigned the same pid. That future process might not have sufficient permissions to unlink the name, in which case it will end up not being able to use this IPC mechanism (at least with this library).

Ok, so why is it unlinked by the server? The server makes use of the filepath for the stat call, and the client has no way of knowing when that happens. Since it's basically a good idea to unlink the name as soon as possible, it's better in this case for the server to unlink the name; it knows when it no longer needs the name.

Of course, the code as presented is not perfect. There are execution paths which will result in some names not being unlinked (for example, if the server process crashes at a bad time). But these should be rare. Experience shows that clients crash much more often than servers.

Thank you rici. With your help I came up with my own answer. Check if I get it right :P. — Rick, Dec 24 '19 at 10:20
You are right about the "cats" and "dogs". But one more question about *"But the zombie name could be a problem for a different future client process with **a different uid which happens to be assigned the same pid**."* Even if the server `unlink`s the current `filepath` as soon as possible, how could a future client get the same `pid` as long as the current client hasn't exited? — Rick, Dec 25 '19 at 02:19
Maybe the code just tries its best to ensure that the `filepath` is released when the client disconnects: double protection from both the server side and the client side. Like you said, when a client crashes, only the server can `unlink` the `filepath`. — Rick, Dec 25 '19 at 04:29

Rick · Answer 2 · 2019-12-24T10:17:30.010

I just realized that a Unix domain socket can be either named or unnamed. A server's Unix domain socket needs to be named because the client needs a way to find it. Internet sockets accomplish this with port numbers. See "Is there a file for each socket?".

Then I think I understand the behaviour of unlink(name) in serv_accept.

First, look at the code of the client connect function, cli_conn:

#include "apue.h"
#include <sys/socket.h>
#include <sys/un.h>
#include <errno.h>

#define CLI_PATH    "/var/tmp/"
#define CLI_PERM    S_IRWXU         /* rwx for user only */

/*
 * Create a client endpoint and connect to a server.
 * Returns fd if all OK, <0 on error.
 */
int
cli_conn(const char *name)
{
    int                 fd, len, err, rval;
    struct sockaddr_un  un, sun;
    int                 do_unlink = 0;

    if (strlen(name) >= sizeof(un.sun_path)) {
        errno = ENAMETOOLONG;
        return(-1);
    }

    /* create a UNIX domain stream socket */
    if ((fd = socket(AF_UNIX, SOCK_STREAM, 0)) < 0)
        return(-1);

    /* fill socket address structure with our address */
    memset(&un, 0, sizeof(un));
    un.sun_family = AF_UNIX;
    sprintf(un.sun_path, "%s%05ld", CLI_PATH, (long)getpid());
    printf("file is %s\n", un.sun_path);
    len = offsetof(struct sockaddr_un, sun_path) + strlen(un.sun_path);

    unlink(un.sun_path);        /* in case it already exists */
    if (bind(fd, (struct sockaddr *)&un, len) < 0) {
        rval = -2;
        goto errout;
    }
    if (chmod(un.sun_path, CLI_PERM) < 0) {
        rval = -3;
        do_unlink = 1;
        goto errout;
    }

    /* fill socket address structure with server's address */
    memset(&sun, 0, sizeof(sun));
    sun.sun_family = AF_UNIX;
    strcpy(sun.sun_path, name);
    len = offsetof(struct sockaddr_un, sun_path) + strlen(name);
    if (connect(fd, (struct sockaddr *)&sun, len) < 0) {
        rval = -4;
        do_unlink = 1;
        goto errout;
    }
    return(fd);

errout:
    err = errno;
    close(fd);
    if (do_unlink)
        unlink(un.sun_path);
    errno = err;
    return(rval);
}

And the book writes:

We call socket to create the client’s end of a UNIX domain socket. We then fill in a sockaddr_un structure with a client-specific name. We don’t let the system choose a default address for us, because the server would be unable to distinguish one client from another (if we don’t explicitly bind a name to a UNIX domain socket, the kernel implicitly binds an address to it on our behalf and no file is created in the file system to represent the socket). Instead, we bind our own address — a step we usually don’t take when developing a client program that uses sockets.
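To see what the server gets when the client skips the bind step, here is a small sketch. It assumes Linux-like behaviour, where accept() reports an address without a pathname for an unbound peer; details may differ on other systems. The function name demo_unbound_peer_is_nameless and its path argument are hypothetical.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Returns 1 if the accepted peer has no pathname, 0 if it has one,
 * and -1 on error.
 */
static int demo_unbound_peer_is_nameless(const char *srv_path)
{
    struct sockaddr_un srv, peer;
    socklen_t len = sizeof(peer);
    int lfd = -1, cfd = -1, afd = -1, rc = -1;

    if (strlen(srv_path) >= sizeof(srv.sun_path))
        return -1;
    memset(&srv, 0, sizeof(srv));
    srv.sun_family = AF_UNIX;
    strcpy(srv.sun_path, srv_path);
    unlink(srv_path);

    /* server side: bind and listen */
    if ((lfd = socket(AF_UNIX, SOCK_STREAM, 0)) < 0 ||
        bind(lfd, (struct sockaddr *)&srv, sizeof(srv)) < 0 ||
        listen(lfd, 1) < 0)
        goto out;

    /* client side: connect WITHOUT binding a name first */
    if ((cfd = socket(AF_UNIX, SOCK_STREAM, 0)) < 0 ||
        connect(cfd, (struct sockaddr *)&srv, sizeof(srv)) < 0)
        goto out;

    /* zero the buffer so an untouched sun_path reads as empty */
    memset(&peer, 0, sizeof(peer));
    if ((afd = accept(lfd, (struct sockaddr *)&peer, &len)) < 0)
        goto out;

    /* no pathname: nothing for the server to stat */
    rc = (len <= offsetof(struct sockaddr_un, sun_path) ||
          peer.sun_path[0] == '\0');
out:
    if (afd >= 0) close(afd);
    if (cfd >= 0) close(cfd);
    if (lfd >= 0) close(lfd);
    unlink(srv_path);
    return rc;
}
```

With no pathname to inspect, a server written like serv_accept has nothing to call stat on, which matches the book's reason for binding a client-specific name.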

So I think it's all about enabling the server to recognize a group of similar clients.

For example:

attach the filepath /tmp/cat to a client -> the client connects to the server -> the server checks the filepath and knows that it is a "cat" client -> the server releases that filepath -> another "cat" client can be created.

This way, the server can distinguish, for example, a "cat" client from a "dog" client by their different underlying filepaths, /tmp/cat and /tmp/dog.

Did I get it right?

If you look closely, the client creates the name with `sprintf(un.sun_path, "%s%05ld", CLI_PATH, (long)getpid());`, which will be something like `/var/tmp/02016`. The 02016 is the client's PID, which will be unique amongst all running processes. So there are no cats and dogs. The client doesn't even take the server's socket name into account when it creates this name. But, and this is important, the name cannot collide with any client running in a different process. Maybe my answer wasn't explicit enough on this point. — rici, Dec 24 '19 at 21:24
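The name construction can be isolated into a small function. This is a sketch only: client_path is a hypothetical helper, it takes the PID as a parameter so the result is predictable, and it uses snprintf rather than the book's sprintf so that an undersized buffer is reported instead of overrun.

```c
#include <stdio.h>
#include <sys/types.h>

#define CLI_PATH    "/var/tmp/"     /* same prefix as in cli_conn */

/*
 * Hypothetical helper: write the per-process client name for pid into buf.
 * Returns 0 on success, -1 if the name does not fit in size bytes.
 */
static int client_path(char *buf, size_t size, pid_t pid)
{
    int n = snprintf(buf, size, "%s%05ld", CLI_PATH, (long)pid);

    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

For PID 2016 this yields /var/tmp/02016; a client would pass getpid() as the last argument.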
