I have a generic question about the Linux kernel's handling of file I/O. My understanding so far is that, in the ideal case, after process A reads a file the data is loaded into the page cache, and if process B reads the same page before it is reclaimed, it does not need to hit the disk again.

My question is about how block device I/O works. Process A's read request is queued before the I/O actually happens. If process B's request (a `struct bio`) is to be inserted into the `request_queue` before A's request is executed, the elevator will consider whether to merge B's bio into an existing request. If A and B try to read the same file offset, i.e. the same device block, they are literally the same I/O (or the two requests are not identical but overlap for some blocks), yet I have not seen this case handled in the kernel code. The only relevant thing I found is a test on whether a bio can be glued contiguously to an existing request.

kernel 2.6.11

inline int elv_try_merge(struct request *__rq, struct bio *bio)
{
    int ret = ELEVATOR_NO_MERGE;

    /*
     * we can merge and sequence is ok, check if it's possible
     */
    if (elv_rq_merge_ok(__rq, bio)) {
        if (__rq->sector + __rq->nr_sectors == bio->bi_sector)
            ret = ELEVATOR_BACK_MERGE;
        else if (__rq->sector - bio_sectors(bio) == bio->bi_sector)
            ret = ELEVATOR_FRONT_MERGE;
    }

    return ret;
}
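To make the gap concrete, here is a minimal userspace sketch of the same contiguity test. The `toy_*` types and names are hypothetical stand-ins, not kernel structures, and the `elv_rq_merge_ok()` precondition is assumed to have passed.

```c
/* Hypothetical stand-ins for struct request and struct bio. */
struct toy_rq  { unsigned long long sector; unsigned int nr_sectors; };
struct toy_bio { unsigned long long sector; unsigned int nr_sectors; };

enum toy_merge { TOY_NO_MERGE, TOY_FRONT_MERGE, TOY_BACK_MERGE };

/*
 * Same two tests as elv_try_merge() in 2.6.11: the bio must start exactly
 * where the request ends (back merge) or end exactly where the request
 * starts (front merge). The front test is written as an addition here to
 * avoid unsigned underflow; the condition is otherwise the same.
 */
static enum toy_merge toy_try_merge(const struct toy_rq *rq,
                                    const struct toy_bio *bio)
{
    if (rq->sector + rq->nr_sectors == bio->sector)
        return TOY_BACK_MERGE;
    if (bio->sector + bio->nr_sectors == rq->sector)
        return TOY_FRONT_MERGE;
    return TOY_NO_MERGE;
}
```

For a request covering 8 sectors starting at sector 100, an 8-sector bio at 108 is a back merge and one at 92 is a front merge, but an identical bio at 100 or an overlapping bio at 104 matches neither test and gets `TOY_NO_MERGE`.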

kernel 5.3.5

enum elv_merge elv_merge(struct request_queue *q, struct request **req,
        struct bio *bio)
{
    struct elevator_queue *e = q->elevator;
    struct request *__rq;
    ...
    /*
     * See if our hash lookup can find a potential backmerge.
     */
    __rq = elv_rqhash_find(q, bio->bi_iter.bi_sector);
    ...
}

struct request *elv_rqhash_find(struct request_queue *q, sector_t offset)
{
    struct elevator_queue *e = q->elevator;
    struct hlist_node *next;
    struct request *rq;

    hash_for_each_possible_safe(e->hash, rq, next, hash, offset) {
        ...
        if (rq_hash_key(rq) == offset)
            return rq;
    }

    return NULL;
}

#define rq_hash_key(rq)     (blk_rq_pos(rq) + blk_rq_sectors(rq))
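
As far as I can tell, the hash lookup has the same limitation, because the key is the request's end sector. A rough userspace sketch, using a linear scan in place of the real hash buckets and hypothetical `toy_*` names:

```c
#include <stddef.h>

/* Hypothetical stand-in for struct request: start sector and length. */
struct toy_req { unsigned long long pos; unsigned int sectors; };

/* Mirrors rq_hash_key(): the key is the sector just past the request's end. */
static unsigned long long toy_hash_key(const struct toy_req *rq)
{
    return rq->pos + rq->sectors;
}

/* Linear scan standing in for the bucket walk in elv_rqhash_find(). */
static const struct toy_req *toy_rqhash_find(const struct toy_req *rqs,
                                             size_t n,
                                             unsigned long long offset)
{
    for (size_t i = 0; i < n; i++)
        if (toy_hash_key(&rqs[i]) == offset)
            return &rqs[i];
    return NULL;
}
```

Looking up a bio's start sector only finds a request that ends exactly there, i.e. a back-merge candidate. A bio that starts at the same sector as a queued request, or in the middle of it, finds nothing.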

Does that mean the kernel will just do two I/Os? Or (very likely) did I miss something?

thanks!

QnA
  • You're looking at pretty old code - what exact release? – Michael Foukarakis Oct 09 '19 at 09:49
  • thanks, @michael. Original code was taken from kernel 2.6.11, I added the latest kernel 5.3.5 snippet. The logic looks the same, unless I missed something in the call chain, which is quite possible. – QnA Oct 09 '19 at 16:22
  • I'd assume you're looking in the wrong place entirely. E.g. process requests to read data from file; VFS/page cache determines if that data is cached, being fetched or needs fetching (and requests the data from file system if it needs fetching); file system does some stuff and asks storage device for blocks; storage device uses elevator algorithm (without caring about "merge-able" requests that were already avoided several layers higher up). – Brendan Oct 09 '19 at 17:02
  • thanks, @Brendan. I had the same thought, but wasn't able to find in which 'upper' layer these overlapping requests are merged/handled. Chasing down `def_blk_fops.read_iter()`, I only found logic that skips a page I/O when `PG_uptodate` is set, but that bit is only set *after* the I/O request is done, and it does not scan the `request_queue` for existing overlapping requests waiting in line. Therefore I started looking for the answer in `request_queue` related code. – QnA Oct 09 '19 at 17:32
  • Try to follow the code in `generic_file_buffered_read()` in "mm/filemap.c". Pages are only read from the block device when the page has been exclusively locked by the current task. Other tasks reading the same page will end up waiting on the page's wait queue (if blocking), or return `-EAGAIN` (if non-blocking). The readahead stuff complicates things a bit (OK, a lot...). – Ian Abbott Oct 09 '19 at 18:23
  • thanks, @IanAbbott. `lock_page()` does solve the problem, if readahead does not complicate the picture. But I can settle for this:) – QnA Oct 10 '19 at 14:34
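
The page-lock behaviour described in the comments can be modelled with a small single-threaded state machine. This is only a sketch under simplifying assumptions: `toy_page` and its helpers are hypothetical, readahead is ignored, and the real code in "mm/filemap.c" uses page flags and wait queues rather than plain fields.

```c
#include <stdbool.h>

/* Hypothetical toy model of one page-cache page; not a kernel structure. */
struct toy_page {
    bool locked;        /* stands in for PG_locked */
    bool uptodate;      /* stands in for PG_uptodate */
    int  io_submitted;  /* block reads issued for this page */
    int  waiters;       /* readers waiting for the page lock */
};

enum toy_result { TOY_HIT, TOY_SUBMITTED_IO, TOY_WAITING };

/* One blocking reader asks for the page. */
static enum toy_result toy_read_page(struct toy_page *p)
{
    if (p->uptodate)
        return TOY_HIT;           /* served from the cache, no I/O */
    if (p->locked) {
        p->waiters++;             /* someone else owns the read; just wait */
        return TOY_WAITING;
    }
    p->locked = true;             /* first reader takes the lock ... */
    p->io_submitted++;            /* ... and is the only one to issue I/O */
    return TOY_SUBMITTED_IO;
}

/* I/O completion: mark the page valid, unlock it, wake the waiters. */
static void toy_io_complete(struct toy_page *p)
{
    p->uptodate = true;
    p->locked = false;
    p->waiters = 0;
}
```

In this model the second reader never reaches the block layer: it waits on the locked page, so only one bio for that page is ever queued and the elevator has no duplicate to merge.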
