
With scatter/gather I/O (i.e. readv and writev), Linux reads into multiple buffers and writes from multiple buffers.

Say I have a vector of 3 buffers: I can use readv, OR I can use a single buffer whose size is the combined size of the 3 buffers and call read.
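A minimal C sketch of the two alternatives, assuming a record made of three fixed-size pieces (the sizes `A_LEN`, `B_LEN`, `C_LEN` and the function names are hypothetical). `readv` fills the three separate buffers in one call; the single-buffer version also makes one call but then needs a `memcpy` per piece.

```c
#include <string.h>     /* memcpy */
#include <sys/types.h>  /* ssize_t */
#include <sys/uio.h>    /* struct iovec, readv */
#include <unistd.h>     /* read */

/* Hypothetical sizes of the three pieces of one record. */
#define A_LEN 4
#define B_LEN 8
#define C_LEN 4

/* Scatter: one readv call fills three separately allocated buffers in order. */
ssize_t read3_scatter(int fd, char *a, char *b, char *c)
{
    struct iovec iov[3] = {
        { .iov_base = a, .iov_len = A_LEN },
        { .iov_base = b, .iov_len = B_LEN },
        { .iov_base = c, .iov_len = C_LEN },
    };
    return readv(fd, iov, 3);
}

/* Single buffer: one read into a combined buffer, then copy each piece out. */
ssize_t read3_single(int fd, char *a, char *b, char *c)
{
    char tmp[A_LEN + B_LEN + C_LEN];
    ssize_t n = read(fd, tmp, sizeof tmp);
    if (n != (ssize_t)sizeof tmp)
        return n;   /* error or short read: the caller's buffers are left untouched */
    memcpy(a, tmp, A_LEN);
    memcpy(b, tmp + A_LEN, B_LEN);
    memcpy(c, tmp + A_LEN + B_LEN, C_LEN);
    return n;
}
```

Both versions make one system call; the difference is the temporary buffer and the extra copies in the second one.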

So I am confused: in which cases should scatter/gather be used, and when should a single large buffer be used?

ArjunShankar
Jimm
  • Another reason you might want to consider scatter/gather I/O is when file descriptors share an entry in the open file table (and therefore share the same file offset). This can lead to race conditions if the two file descriptors are used by different processes or threads. Descriptors that share an entry like this are created by fork() or by the dup*() system calls. – sudeepdino008 Apr 23 '22 at 15:11
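To illustrate the comment above: descriptors created with dup() (or inherited across fork()) share one open file description, and therefore one file offset. The sketch below uses hypothetical helper names. It shows the offset moving for both descriptors, and it writes a header and body with a single writev so the shared offset is advanced once per record rather than once per piece. This does not remove every race (records from different writers can still arrive in any order); it only keeps each record in one call.

```c
#include <stdlib.h>     /* mkstemp */
#include <string.h>     /* strlen */
#include <sys/types.h>  /* off_t, ssize_t */
#include <sys/uio.h>    /* struct iovec, writev */
#include <unistd.h>     /* dup, write, lseek, close, unlink */

/* Opens an unnamed scratch file for the demo (hypothetical helper). */
int open_scratch_file(void)
{
    char path[] = "/tmp/shared-offset-XXXXXX";
    int fd = mkstemp(path);
    if (fd != -1)
        unlink(path);   /* the file stays usable until fd is closed */
    return fd;
}

/* Writes len bytes through fd, then reports the offset seen through a dup of fd. */
off_t offset_seen_by_dup(int fd, const char *buf, size_t len)
{
    int fd2 = dup(fd);  /* fd2 shares fd's open file description */
    if (fd2 == -1)
        return -1;
    off_t off = -1;
    if (write(fd, buf, len) == (ssize_t)len)
        off = lseek(fd2, 0, SEEK_CUR);  /* moved as well: the offset is shared */
    close(fd2);
    return off;
}

/* Writes header and body as one record with a single writev call. */
ssize_t write_record(int fd, const char *hdr, const char *body)
{
    struct iovec iov[2] = {
        { .iov_base = (void *)hdr,  .iov_len = strlen(hdr) },
        { .iov_base = (void *)body, .iov_len = strlen(body) },
    };
    return writev(fd, iov, 2);
}
```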



The main conveniences offered by readv and writev are:

  1. It allows working with non-contiguous blocks of data, i.e. the buffers need not be part of one array but can be allocated separately.
  2. The I/O is 'atomic' (in the same sense that a single write is), i.e. if you do a writev, all the elements in the vector are written in one contiguous operation, and writes done by other processes will not occur in between them.
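The 'atomic' in point 2 applies to a single call: writev can still transfer fewer bytes than requested, for instance when a signal interrupts it (as one of the comments below also notes). A common pattern, sketched here with a hypothetical helper name, is to retry until every iovec has been written. Once a retry is needed, the record is no longer written by one call, so the atomicity described above no longer holds for it.

```c
#include <errno.h>      /* errno, EINTR */
#include <sys/types.h>  /* ssize_t */
#include <sys/uio.h>    /* struct iovec, writev */
#include <unistd.h>     /* pipe, read: handy for trying it out */

/* Calls writev until all iovcnt buffers are fully written.
   Entries of the iov array may be modified. Returns 0 on success, -1 on error. */
int writev_all(int fd, struct iovec *iov, int iovcnt)
{
    while (iovcnt > 0) {
        ssize_t n = writev(fd, iov, iovcnt);
        if (n == -1) {
            if (errno == EINTR)
                continue;   /* interrupted before anything was written: retry */
            return -1;
        }
        /* Drop the buffers that were written completely... */
        while (iovcnt > 0 && (size_t)n >= iov->iov_len) {
            n -= (ssize_t)iov->iov_len;
            iov++;
            iovcnt--;
        }
        /* ...and move past the written part of the next one. */
        if (iovcnt > 0) {
            iov->iov_base = (char *)iov->iov_base + n;
            iov->iov_len -= (size_t)n;
        }
    }
    return 0;
}
```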

For example, say your data is naturally segmented and comes from different sources:

struct foo *my_foo;
struct bar *my_bar;
struct baz *my_baz;

my_foo = get_my_foo();
my_bar = get_my_bar();
my_baz = get_my_baz();

Now, the three 'buffers' are not one big contiguous block, but you want to write them contiguously into a file for whatever reason (for example, they are fields in the header of some file format).

If you use write, you have to choose between:

  1. Copying them over into one block of memory using, say, memcpy (overhead), followed by a single write call. Then the write will be atomic.
  2. Making three separate calls to write (overhead). Also, write calls from other processes can intersperse between these writes (not atomic).
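For comparison, the two write-based options above might look like this (a sketch; the function names are hypothetical). The first copies the pieces into a temporary buffer and makes one call; the second makes three calls, between which another writer's data can land.

```c
#include <stdlib.h>     /* malloc, free */
#include <string.h>     /* memcpy */
#include <sys/types.h>  /* ssize_t */
#include <unistd.h>     /* write */

/* Option 1: gather by hand into a temporary buffer, then one write call. */
ssize_t write_copied(int fd, const void *a, size_t alen,
                     const void *b, size_t blen,
                     const void *c, size_t clen)
{
    size_t total = alen + blen + clen;
    char *tmp = malloc(total);
    if (tmp == NULL)
        return -1;
    memcpy(tmp, a, alen);
    memcpy(tmp + alen, b, blen);
    memcpy(tmp + alen + blen, c, clen);
    ssize_t n = write(fd, tmp, total);
    free(tmp);
    return n;
}

/* Option 2: three write calls; another writer's data can land in between. */
ssize_t write_separately(int fd, const void *a, size_t alen,
                         const void *b, size_t blen,
                         const void *c, size_t clen)
{
    const void *bufs[3] = { a, b, c };
    size_t lens[3] = { alen, blen, clen };
    ssize_t total = 0;

    for (int i = 0; i < 3; i++) {
        ssize_t n = write(fd, bufs[i], lens[i]);
        if (n == -1)
            return -1;
        total += n;
        if ((size_t)n < lens[i])
            break;  /* short write: stop instead of leaving a gap */
    }
    return total;
}
```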

If you use writev instead, it's all good:

  1. You make exactly one system call, and no memcpy to make a single buffer from the three.
  2. Also, the three buffers are written atomically, as one block write, i.e. if other processes also write, their writes will not land in between the three buffers.

So you would do something like:

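#include <sys/uio.h>    /* struct iovec, writev */

struct iovec iov[3];
ssize_t bytes_written;

iov[0].iov_base = my_foo;
iov[0].iov_len = sizeof (struct foo);
iov[1].iov_base = my_bar;
iov[1].iov_len = sizeof (struct bar);
iov[2].iov_base = my_baz;
iov[2].iov_len = sizeof (struct baz);

/* One system call. Returns the total bytes written (possibly fewer than
   the sum of the iov_len values on a short write), or -1 on error. */
bytes_written = writev (fd, iov, 3);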
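A self-contained version of the snippet above, assuming concrete (hypothetical) definitions for struct foo, struct bar and struct baz, plus a readv counterpart that reads the same bytes back into three separately allocated structs:

```c
#include <sys/types.h>  /* ssize_t */
#include <sys/uio.h>    /* struct iovec, readv, writev */
#include <unistd.h>     /* pipe, close: handy for trying it out */

/* Hypothetical header fields; the answer leaves these types abstract. */
struct foo { unsigned int magic; };
struct bar { unsigned short version, flags; };
struct baz { unsigned int length; };

/* Gather: the three separately stored fields go out in one writev call. */
ssize_t write_header(int fd, struct foo *my_foo, struct bar *my_bar,
                     struct baz *my_baz)
{
    struct iovec iov[3];

    iov[0].iov_base = my_foo;
    iov[0].iov_len = sizeof (struct foo);
    iov[1].iov_base = my_bar;
    iov[1].iov_len = sizeof (struct bar);
    iov[2].iov_base = my_baz;
    iov[2].iov_len = sizeof (struct baz);

    return writev(fd, iov, 3);
}

/* Scatter: the same bytes come back into three separate structs with readv. */
ssize_t read_header(int fd, struct foo *my_foo, struct bar *my_bar,
                    struct baz *my_baz)
{
    struct iovec iov[3] = {
        { .iov_base = my_foo, .iov_len = sizeof (struct foo) },
        { .iov_base = my_bar, .iov_len = sizeof (struct bar) },
        { .iov_base = my_baz, .iov_len = sizeof (struct baz) },
    };

    return readv(fd, iov, 3);
}
```

Writing raw structs like this ties the on-disk layout to the machine's byte order and struct padding; a portable file format would usually serialize each field explicitly.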

Sources:

  1. http://pubs.opengroup.org/onlinepubs/009604499/functions/writev.html
  2. http://linux.die.net/man/2/readv
ArjunShankar
  • In the `Linux System Programming` book, they say: `readv or writev can experience any of the errors of the read() and write() system calls, and will, upon receiving such errors, set the same errno codes.` So will readv return `EINTR`? Or what happens to a signal that arrives in the middle of the atomic read done by readv? Will it be ignored or queued? – nmxprime Apr 15 '14 at 04:04
  • @nmxprime if a signal arrives during `readv()` or `writev()`, these syscalls may return fewer bytes than requested, or, if no data had been transferred yet, fail with `EINTR` or be restarted (depending on SA_RESTART). – socketpair May 20 '16 at 20:34
  • Can I write the non-contiguous buffer to the file in a non-contiguous fashion, rather than as one whole block? – Shubham Pendharkar Aug 22 '16 at 05:42
  • @Shubham What is a "non-contiguous buffer"? Or did you mean "non-contiguous *buffers*"? And what do you mean by writing them in a file in a "non-contiguous fashion"? To be clear: the dictionary definition of "non-contiguous" is: "*in close proximity without actually touching; near.*". – ArjunShankar Aug 22 '16 at 10:02
  • What's the difference between using writev and send with the `MSG_MORE` flag? Is writev more beneficial because it needs fewer system calls and gives an atomic write? – bgura Aug 28 '17 at 16:04
  • @ArjunShankar writev() internally invokes write() itself. In that case, how can writev() be more efficient than invoking write() on individual blocks of data one after another? – bvb May 30 '18 at 18:40
  • @bvb: it's not clear to me what you're asking, but if you're asking why writev is more efficient than write: it's because writev requires only a single switch into kernel mode. Since each switch takes about a microsecond, a writev with a vector size of 1,000 might cost only 1 microsecond of switching overhead, where 1,000 writes would cost 1 millisecond. – Jay Sullivan Jan 02 '20 at 18:27