
In XNU we have the vnode_t entity, which represents the file globally.

Each process can access the file (assuming it has the right permissions) by creating a new file descriptor and setting the vnode as its fg_data:

fp->f_fglob->fg_data = vp;

The vnode contains a vector of basic actions for all relevant operations, and this vector is set according to the file's file system; e.g. the HFS+ driver implements such a vector and sets up its vnodes accordingly:

int     (**v_op)(void *);       /* vnode operations vector */

This is a vector of function pointers covering all the operations that may be performed on the vnode.
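
For reference, a file system driver publishes these operations as a descriptor table that the VFS layer turns into the v_op vector. A condensed sketch modelled on the HFS+ pattern (abbreviated, not the exact XNU source):

/* Condensed sketch of how a file system publishes its vnode operations
 * (modelled on the HFS+ pattern; abbreviated, not the exact source). */
int (**hfs_vnodeop_p)(void *);              /* filled in by the VFS layer */

struct vnodeopv_entry_desc hfs_vnodeop_entries[] = {
    { &vnop_default_desc, (VOPFUNC)vn_default_error },
    { &vnop_lookup_desc,  (VOPFUNC)hfs_vnop_lookup },
    { &vnop_read_desc,    (VOPFUNC)hfs_vnop_read },
    { &vnop_write_desc,   (VOPFUNC)hfs_vnop_write },
    /* ... one entry per supported operation ... */
    { NULL, (VOPFUNC)NULL }
};

struct vnodeopv_desc hfs_vnodeop_opv_desc = {
    &hfs_vnodeop_p, hfs_vnodeop_entries
};

Every vnode belonging to that file system then gets its v_op pointing at the vector built from this table.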

In addition, we have the fileops struct that is part of the file descriptor's fileglob, which describes a minimal subset of these operations.

Here is a typical definition:

const struct fileops vnops = {
    .fo_type = DTYPE_VNODE,
    .fo_read = vn_read,
    .fo_write = vn_write,
    .fo_ioctl = vn_ioctl,
    .fo_select = vn_select,
    .fo_close = vn_closefile,
    .fo_kqfilter = vn_kqfilt_add,
    .fo_drain = NULL,
};

and we set it here:

fp->f_fglob->fg_ops = &vnops;

I saw that when reading a regular file on a local file system (HFS+), the call goes through the file descriptor's vector and not the vnode's ...

 * frame #0: 0xffffff801313c67c kernel`vn_read(fp=0xffffff801f004d98, uio=0xffffff807240be70, flags=0, ctx=0xffffff807240bf10) at vfs_vnops.c:978 [opt]
frame #1: 0xffffff801339cc1a kernel`dofileread [inlined] fo_read(fp=0xffffff801f004d98, uio=0xffffff807240be70, flags=0, ctx=0xffffff807240bf10) at kern_descrip.c:5832 [opt]
frame #2: 0xffffff801339cbff kernel`dofileread(ctx=0xffffff807240bf10, fp=0xffffff801f004d98, bufp=140222138463456, nbyte=282, offset=<unavailable>, flags=<unavailable>, retval=<unavailable>) at sys_generic.c:365 [opt]
frame #3: 0xffffff801339c983 kernel`read_nocancel(p=0xffffff801a597658, uap=0xffffff801a553cc0, retval=<unavailable>) at sys_generic.c:215 [opt]
frame #4: 0xffffff8013425695 kernel`unix_syscall64(state=<unavailable>) at systemcalls.c:376 [opt]
frame #5: 0xffffff8012e9dd46 kernel`hndl_unix_scall64 + 22

My question is: why is this duality needed, and in which cases does an operation go through the file descriptor's vector (fg_ops) versus the vnode's vector (vp->v_op)?

thanks


1 Answer


[…] in which cases the operation works through the file_descriptor vector (fg_ops) and which cases the operation works through the vnode vector (vp->v_op).

I'm going to start by answering this second part of the question first: if you trace through your call stack further, and look inside the vn_read function, you'll find that it contains this line:

    error = VNOP_READ(vp, uio, ioflag, ctx);

The VNOP_READ function (kpi_vfs.c) in turn has this:

_err = (*vp->v_op[vnop_read_desc.vdesc_offset])(&a);

So the answer to your question is that for your typical file, both tables are used for dispatching operations.
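
To make the first dispatch step concrete, the fileops hop is just a thin indirect call through the table stored in the fileglob. Simplified from kern_descrip.c (the real function carries a bit more bookkeeping; f_ops is shorthand for f_fglob->fg_ops):

/* Simplified sketch of the fileops-level dispatch (cf. kern_descrip.c).
 * For a vnode-backed descriptor the indirect call lands in vn_read,
 * which then performs the per-file-system dispatch via VNOP_READ. */
int
fo_read(struct fileproc *fp, struct uio *uio, int flags, vfs_context_t ctx)
{
    return (*fp->f_ops->fo_read)(fp, uio, flags, ctx);
}

So read(2) goes syscall -> dofileread -> fo_read (fileops table) -> vn_read -> VNOP_READ (v_op table) -> the file system's read implementation, e.g. hfs_vnop_read for HFS+.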

With that out of the way,

My question is why does this duality needed […]

Not everything to which a process can hold a file descriptor is also represented in the file system. For example, pipes don't necessarily have to be named. A vnode doesn't make any sense in that context. So in sys_pipe.c, you'll see a different fileops table:

static const struct fileops pipeops = {
    .fo_type = DTYPE_PIPE,
    .fo_read = pipe_read,
    .fo_write = pipe_write,
    .fo_ioctl = pipe_ioctl,
    .fo_select = pipe_select,
    .fo_close = pipe_close,
    .fo_kqfilter = pipe_kqfilter,
    .fo_drain = pipe_drain,
};

Similar deal for sockets.
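
For example, sockets get their own table in sys_socket.c; from memory it looks roughly like this, so treat the exact soo_* names as approximate:

/* Rough sketch of the socket fileops table (cf. sys_socket.c); the soo_*
 * functions forward file-level operations to the socket layer, with no
 * vnode involved at all. */
const struct fileops socketops = {
    .fo_type = DTYPE_SOCKET,
    .fo_read = soo_read,
    .fo_write = soo_write,
    .fo_ioctl = soo_ioctl,
    .fo_select = soo_select,
    .fo_close = soo_close,
    .fo_kqfilter = soo_kqfilter,
    .fo_drain = soo_drain,
};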

File descriptors track the state of a process's view of a file or other object that allows file-like operations: things like the current position in the file. Different processes can have the same file open, and each must have its own read/write position, so vnode:fileglob is a 1:many relationship.
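
You can see that per-open state from userspace: opening the same file twice yields two descriptors whose offsets advance independently, even though both resolve to the same vnode. A small demo (any readable path will do; /etc/hosts is just an example):

/* Two descriptors on the same file keep independent read offsets,
 * because the offset lives in the per-open fileglob, not in the vnode. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char a[4] = {0}, b[4] = {0};
    int fd1 = open("/etc/hosts", O_RDONLY);   /* first open: its own fileglob */
    int fd2 = open("/etc/hosts", O_RDONLY);   /* second, independent open     */

    read(fd1, a, 3);    /* advances fd1's offset only */
    read(fd2, b, 3);    /* fd2 still started at 0     */

    printf("fd1 offset: %lld, fd2 offset: %lld\n",
           (long long)lseek(fd1, 0, SEEK_CUR),
           (long long)lseek(fd2, 0, SEEK_CUR));

    close(fd1);
    close(fd2);
    return 0;
}

Descriptors created with dup(), by contrast, share a single fileglob and therefore share the offset.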

Meanwhile, using vnode objects to track things other than objects within a file system doesn't make any sense either. Additionally, the v_op table is file system specific, whereas vn_read/VNOP_READ contain code that applies to any file that's represented in a file system.

So in summary they're really just different layers in the I/O stack.

pmdj
  • Hi and thanks. Just one more thing: following your response, I did some more research and found that VNOP_READ calls `hfs_vnop_read`, which is part of the HFS+ driver and itself calls `cluster_read`, which actually does all the low-level reading (from cache or from disk). So far so good. However, I couldn't find the equivalent for `mmap`: that path calls VNOP_MMAP, which calls `hfs_vnop_mmap`, but this function returns ENOTSUP "because we want the cluster layer to actually do all the real work" (quoted from their comment)... –  Jun 09 '17 at 12:13
  • This leaves me with the question of who is in charge of reading the file in the mmap case. Of course it's the cluster layer, but who calls `cluster_read` (or something similar) in the mmap call path? –  Jun 09 '17 at 12:15