I need to understand which files consume the IOPS of my hard disk. Just using "strace" will not solve my problem: I want to know which files are really written to disk, not just to the page cache. I tried to use SystemTap, but I cannot figure out how to find which files (file names or inodes) consume my IOPS. Are there any tools that will solve my problem?

BHYCHIK

2 Answers

Yeah, you can definitely use SystemTap for tracing that. When an upper layer (usually the VFS subsystem) wants to issue an I/O operation, it calls the submit_bio and generic_make_request functions. Note that these don't necessarily correspond to a single physical I/O operation each; for example, writes to adjacent sectors can be merged by the I/O scheduler.
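
Before wiring in the file-name lookup, a quick way to convince yourself of this is a minimal sketch (not part of the original setup, just an illustration) that counts block I/O requests per process and device. For buffered writes it will mostly show writeback kernel threads rather than the real writers, which is exactly the problem the script below solves:

// Minimal sketch: count block I/O requests per process name and device.
// For buffered writes, execname() usually reports a writeback kernel
// thread (e.g. kworker), not the process that called write().
global reqs;

probe ioblock.request {
    reqs[execname(), devname] <<< 1;
}

// Print and reset the counters every 5 seconds
probe timer.s(5) {
    printf("%-16s %-10s %s\n", "COMM", "DEV", "REQS");
    foreach ([comm, dev] in reqs) {
        printf("%-16s %-10s %d\n", comm, dev, @count(reqs[comm, dev]));
    }
    delete reqs;
}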

The trick is how to determine the file path name in generic_make_request. It is quite simple for reads, as this function is called in the same context as the read() call. Writes, however, are usually asynchronous: write() simply updates the page cache entry and marks it dirty, while submit_bio gets called later by one of the writeback kernel threads, which have no information about the original calling process.

Writes can still be traced back by looking at the page references in the bio structure: each page has a mapping field pointing to a struct address_space. The struct file that corresponds to an open file contains f_mapping, which points to the same address_space instance, and its f_path points to the dentry holding the file's name (the full path can be assembled with task_dentry_path).

So we need two probes: one to capture attempts to read or write a file and save the path and address_space into an associative array, and a second to capture generic_make_request calls (this is done via the ioblock.request probe).

Here is an example script which counts IOPS:

// maps struct address_space to path name
global paths;

// IOPS per file
global iops;

// Capture attempts to read and write by VFS
probe kernel.function("vfs_read"),
      kernel.function("vfs_write") {
    mapping = $file->f_mapping;

    // Assemble full path name for running task (task_current())
    // from open file "$file" of type "struct file"
    path = task_dentry_path(task_current(), $file->f_path->dentry,
                            $file->f_path->mnt);

    paths[mapping] = path;
}

// Attach to generic_make_request()
probe ioblock.request {
    for (i = 0; i < $bio->bi_vcnt ; i++) {
        // Each BIO request may have more than one page
        // to write
        page = $bio->bi_io_vec[i]->bv_page;
        mapping = @cast(page, "struct page")->mapping;

        iops[paths[mapping], rw] <<< 1;
    }
}

// Once per second drain iops statistics
probe timer.s(1) {
    println(ctime());
    foreach([path+, rw] in iops) {
        printf("%3d %s %s\n", @count(iops[path, rw]), 
                              bio_rw_str(rw), path);
    }
    delete iops
}
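
To try it, save the script to a file (the name below is just an example) and run it with stap; you will typically need root privileges and the kernel debuginfo packages installed, since the script probes kernel functions:

stap file_iops.stp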

This example script works for XFS, but it needs to be updated to support AIO and volume managers (including btrfs). I'm also not sure how it will handle metadata reads and writes, but it is a good start ;)

If you want to know more about SystemTap, you can check out my book: http://myaut.github.io/dtrace-stap-book/kernel/async.html

myaut

Maybe iotop gives you a hint about which processes are doing I/O; from that, you can get an idea about the related files.

iotop --only

The --only option is used to show only the processes or threads actually doing I/O, instead of showing all processes or threads.
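
If you prefer to log this over time rather than watch interactively, --only can be combined with iotop's batch mode (the iteration count below is only an example):

iotop --only --batch --iter=30

By default this prints one snapshot per second, so you can redirect the output to a file and then correlate the busy PIDs with their open files via lsof or /proc/<pid>/fd.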