5

How can an application avoid the page cache in the kernel and read or write data directly from/to disk? How is this set up in the kernel?

user6481589
  • It's unclear what you're asking. Do you want to disable page caching just for one particular application in user-space, or for all user-space processes? Do you have to do this in the kernel itself, or does it not matter how it's done? Provide more details, please. – Sam Protsenko Jun 18 '16 at 13:19
  • I just want to disable page caching for GlusterFS. Using FUSE, I mounted the GlusterFS client (nodeA) at /mnt/glusterfs in direct-io-mode (which FUSE supports). When I opened a file on GlusterFS, the file was not cached on the GlusterFS client (nodeA) because of direct-io-mode, but the file was cached on the GlusterFS server (nodeB). In other words, on the server (nodeB) I don't want the file to be cached. So the question is: how to disable page caching for the GlusterFS server (nodeB)? – user6481589 Jun 19 '16 at 01:01
  • Opening a file, the request is sent like this: open (user-space, nodeA) -> sys_open (kernel, nodeA) -> fuse_open (kernel, nodeA) -> client_open (user-space, nodeA) -> server_open (user-space, nodeB) -> sys_open (kernel, nodeB) -> ext4_open (kernel, nodeB). – user6481589 Jun 19 '16 at 01:05

2 Answers

5

You will need the application to open the file with O_DIRECT. From the man page: http://man7.org/linux/man-pages/man2/open.2.html

With this you are telling the kernel not to go through the page cache when doing I/O on the file.

O_DIRECT (since Linux 2.4.10) Try to minimize cache effects of the I/O to and from this file. In general this will degrade performance, but it is useful in special situations, such as when applications do their own caching. File I/O is done directly to/from user-space buffers. The O_DIRECT flag on its own makes an effort to transfer data synchronously, but does not give the guarantees of the O_SYNC flag that data and necessary metadata are transferred. To guarantee synchronous I/O, O_SYNC must be used in addition to O_DIRECT. See NOTES below for further discussion.

          A semantically similar (but deprecated) interface for block
          devices is described in raw(8).
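For illustration, a minimal sketch of a direct, synchronous write. The file name is just an example (the file must live on a filesystem that supports O_DIRECT; tmpfs does not), and note that O_DIRECT typically requires the user buffer, file offset and transfer size to be aligned to the device's logical block size, otherwise the call fails with EINVAL:

#define _GNU_SOURCE            /* for O_DIRECT on Linux */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* 4096 bytes satisfies the alignment rules of most devices */
    const size_t align = 4096;
    void *buf;
    int fd;

    if (posix_memalign(&buf, align, align)) {
        perror("posix_memalign");
        return 1;
    }
    memset(buf, 'x', align);

    /* O_SYNC in addition to O_DIRECT, as the man page advises,
       to also get synchronous-I/O guarantees for metadata */
    fd = open("testfile", O_WRONLY | O_CREAT | O_DIRECT | O_SYNC, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    if (write(fd, buf, align) != (ssize_t)align)
        perror("write");

    close(fd);
    free(buf);
    return 0;
}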
  • Opening a file, the request is sent like this: open (user-space, nodeA) -> sys_open (kernel, nodeA) -> fuse_open (kernel, nodeA) -> client_open (user-space, nodeA) --network--> server_open (user-space, nodeB) -> sys_open (kernel, nodeB) -> ext4_open (kernel, nodeB). Before the network they belong to the GlusterFS client, and after it to the GlusterFS server. If I add the O_DIRECT flag in client_open(), will the server then open the file in direct-io-mode? – user6481589 Jun 19 '16 at 01:07
  • Hmm, so IIUC you want to open a file across a Gluster FUSE-mounted network file system, you want the server to open the file without using the server's page cache, and you want a way for the client to specify this to the server. If this understanding is correct, I would say you need to read the GlusterFS protocol spec to see whether it passes this kind of attribute over to the server or not. – Chaitanya Lala Jun 19 '16 at 03:27
0

UPDATE

The write caches in these specs are not related to the page cache. The cache here actually refers to the RAM/NVRAM integrated into disk controllers; such memory should not be confused with the page cache!

AFAIK, these specs only provide a write cache enable/disable switch for SATA and NVMe devices.

SATA

Refer to the SATA 3.0 spec:

SET FEATURES (Write Cache Enable/Disable): The write cache enable/disable setting established by the SET FEATURES command with subcommand code of 02h or 82h.

In the Linux kernel, the HDIO_SET_WCACHE ioctl can control it:

static DEFINE_MUTEX(ide_disk_ioctl_mutex);
static const struct ide_ioctl_devset ide_disk_ioctl_settings[] = {
    { HDIO_GET_ADDRESS,   HDIO_SET_ADDRESS,   &ide_devset_address   },
    { HDIO_GET_MULTCOUNT, HDIO_SET_MULTCOUNT, &ide_devset_multcount },
    { HDIO_GET_NOWERR,    HDIO_SET_NOWERR,    &ide_devset_nowerr    },
    { HDIO_GET_WCACHE,    HDIO_SET_WCACHE,    &ide_devset_wcache    },
    { HDIO_GET_ACOUSTIC,  HDIO_SET_ACOUSTIC,  &ide_devset_acoustic  },
    { 0 }
};

int ide_disk_ioctl(ide_drive_t *drive, struct block_device *bdev, fmode_t mode,
           unsigned int cmd, unsigned long arg)
{
    int err;

    mutex_lock(&ide_disk_ioctl_mutex);
    err = ide_setting_ioctl(drive, bdev, cmd, arg, ide_disk_ioctl_settings);
    if (err != -EOPNOTSUPP)
        goto out;

    err = generic_ide_ioctl(drive, bdev, cmd, arg);
out:
    mutex_unlock(&ide_disk_ioctl_mutex);
    return err;
}
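From user space you can flip the same switch with an ioctl on the block device node. A minimal sketch, assuming /dev/sda as the target disk (must be run as root, and assumes the driver implements this ioctl):

#include <fcntl.h>
#include <linux/hdreg.h>   /* HDIO_SET_WCACHE */
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/sda", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* HDIO_SET_WCACHE takes the new value directly as the
       ioctl argument: 0 = disable write cache, 1 = enable */
    if (ioctl(fd, HDIO_SET_WCACHE, 0UL)) {
        perror("HDIO_SET_WCACHE");
        close(fd);
        return 1;
    }

    close(fd);
    return 0;
}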

And you can also use hdparm -W0/1 /dev/sdx to disable/enable the write cache conveniently; hdparm invokes HDIO_SET_WCACHE internally:

    /* from hdparm: if the HDIO_SET_WCACHE ioctl is not supported,
       fall back to issuing the ATA SET FEATURES command directly
       (subcommand 0x02 enables the write cache, 0x82 disables it) */
    if (!wcache)
        err = flush_wcache(fd);
    if (ioctl(fd, HDIO_SET_WCACHE, wcache)) {
        __u8 setcache[4] = {ATA_OP_SETFEATURES,0,0,0};
        setcache[2] = wcache ? 0x02 : 0x82;
        if (do_drive_cmd(fd, setcache, 0)) {
            err = errno;
            perror(" HDIO_DRIVE_CMD(setcache) failed");
        }
    }

NVMe

Kernel source (the handlers behind the queue's write_cache sysfs attribute, in block/blk-sysfs.c):

static ssize_t queue_wc_show(struct request_queue *q, char *page)
{
    if (test_bit(QUEUE_FLAG_WC, &q->queue_flags))
        return sprintf(page, "write back\n");

    return sprintf(page, "write through\n");
}

static ssize_t queue_wc_store(struct request_queue *q, const char *page,
                  size_t count)
{
    int set = -1;

    if (!strncmp(page, "write back", 10))
        set = 1;
    else if (!strncmp(page, "write through", 13) ||
         !strncmp(page, "none", 4))
        set = 0;

    if (set == -1)
        return -EINVAL;

    if (set)
        blk_queue_flag_set(QUEUE_FLAG_WC, q);
    else
        blk_queue_flag_clear(QUEUE_FLAG_WC, q);

    return count;
}
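Since these are the show/store handlers of the write_cache sysfs attribute, from user space the switch is just a write to /sys/block/<dev>/queue/write_cache. A minimal sketch, with nvme0n1 as an example device (requires root):

#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/sys/block/nvme0n1/queue/write_cache", "w");
    if (!f) {
        perror("fopen");
        return 1;
    }

    /* queue_wc_store() above accepts "write back",
       "write through" or "none" */
    if (fputs("write through", f) == EOF)
        perror("fputs");

    fclose(f);
    return 0;
}

From a shell, echo "write through" > /sys/block/nvme0n1/queue/write_cache does the same.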

NVMe spec:

(image of the relevant NVMe spec section omitted)

Chen Li