
Can someone point me to a fairly complete reference on the Linux I/O system, primarily how all the buffers and caches are handled and flushed?

My understanding so far is that there are

  • Application buffers (including fread/fwrite buffers allocated by libc)
  • VFS buffers that read and write act on
  • Pages that are mmapped (same as VFS buffers?)
  • Filesystem-specific buffers (same as VFS buffers? Filesystem supplies some policy at least, e.g. XFS is more aggressive about write caching)
  • The disk driver probably has some buffers, before translating to SCSI/ATA commands and passing to the...
  • Disk controller can have volatile, battery-backed, or no cache. What are the mechanisms for flushing volatile caches? How do barriers factor in?
  • The disk itself can have some cache, with the same flushing issues as the controller.
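A minimal sketch of the first two layers in that list, assuming Python's buffered file object as a stand-in for libc's fread/fwrite buffer (the function name `write_through_layers` is hypothetical). It only shows the flushes a process can request itself; whether `fsync` also empties a volatile controller or disk cache depends on the filesystem, mount options, and hardware, which is exactly the part I'm unsure about.

```python
import os


def write_through_layers(path, data):
    """Write bytes, then flush each layer the process can reach."""
    with open(path, "wb") as f:
        # 1. Data lands in the user-space buffer of the file object
        #    (the analogue of libc's stdio buffer).
        f.write(data)
        # 2. flush() hands the buffer to the kernel via write(2);
        #    the data now sits in the kernel's cache, not yet on disk.
        f.flush()
        # 3. fsync() asks the kernel to write this file's dirty data
        #    to the storage device before returning.
        os.fsync(f.fileno())
```

Calling `write_through_layers("/tmp/example.bin", b"payload")` returns only after step 3 completes; what the controller and disk do with the data after that is outside the process's view.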

Obviously this is a fairly confused account, but hopefully it shows the sort of information I'm looking for. I've found Linux internals documentation pretty sparse; maybe there's a good book that covers all this? Discussion of where buffers are copied vs. transferred would be nice, too.

Ben Jencks

1 Answer


Wow. Either you are after very particular pieces and parts to solve a particular problem, or you don't realize you just signed away the next week of your life to research. Getting through all of the kernel internals for your topics is going to be painful, so here's what you need to do: ask one simple question and start tracing. Given a simple, unconfused question to focus on, there are many developers on the Linux Kernel Mailing List who are experts at explaining why the internals behave the way they do in your situation. It may take a couple of rounds, but they can help.

Another method you can use, given a question with a single purpose, is to trace a questionable activity into the kernel and learn about the individual pieces it touches rather than trying to understand it all at once. Fortunately for you, the kernel's built-in tracer, ftrace, or SystemTap (stap) will start you on your adventure. Many kernel developers want more people to ask important questions about their kernel, and these tools help them do it. Linux Weekly News has run several articles on ftrace lately: Tracing: no shortage of options (Jul 2008), A Look at ftrace (Mar 2009), Debugging the kernel using ftrace - Part 1 (Dec 2009) and Part 2 (Dec 2009), Secrets of the Ftrace function (Jan 2010), and finally the good old documentation that ships with the kernel (2008).
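As a rough sketch of what a first ftrace session looks like, the Python below drives the tracefs control files directly. Assumptions: tracefs is mounted at /sys/kernel/tracing (older kernels expose it under /sys/kernel/debug/tracing), the script runs as root, and the `vfs_*` filter is just an illustrative choice. The function name `trace_functions` is hypothetical.

```python
import os

# Assumed mount point; adjust for your system.
TRACEFS = "/sys/kernel/tracing"


def _write(tracefs, name, value):
    """Write a value to one tracefs control file."""
    with open(os.path.join(tracefs, name), "w") as f:
        f.write(value)


def trace_functions(workload, func_glob="vfs_*", tracefs=TRACEFS):
    """Run workload() under the 'function' tracer and return the trace text."""
    _write(tracefs, "tracing_on", "0")               # pause while configuring
    _write(tracefs, "set_ftrace_filter", func_glob)  # limit traced functions
    _write(tracefs, "current_tracer", "function")
    _write(tracefs, "tracing_on", "1")
    try:
        workload()                                   # the activity under study
    finally:
        _write(tracefs, "tracing_on", "0")
    with open(os.path.join(tracefs, "trace")) as f:
        text = f.read()
    _write(tracefs, "current_tracer", "nop")         # restore the default tracer
    return text
```

For example, `trace_functions(lambda: open("/etc/hostname").read())` would return the filtered kernel functions hit while that one file is read. The trace buffer is system-wide, so other processes' calls show up too.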

By using a tracing utility, you learn how the kernel does buffering and a host of other things unique to your kernel, hardware (controller, chipset, CPU, disk technology), filesystem, and I/O scheduler. Every distribution is different in this regard. If you have complex storage devices (cluster FS, SAN with an enterprise array, SSD), then be prepared to get your hands dirty learning about their quirks too. A word of warning about cluster filesystems: they frequently involve a userspace component that can cause lots of unexpected delays. Most of us attribute those delays to the kernel, but the real cause is often more complex than that.

By far the best text I've been able to find so far is "Linux kernel design patterns", written by Neil Brown in 2009. Neil hits a lot of the topics you brought up and many more.

One thing I know for sure is that this landscape is continually changing, especially in the scheduling arena. Just try to understand what is going on in your particular corner and count your blessings that you don't have to code to one of those components.

zerolagtime
  • Alternatively, someone could approach LWN (http://lwn.net/op/AuthorGuide.lwn) and write this up as a story, "for minimal pay," but also recognition by their peers. – zerolagtime Dec 01 '10 at 20:42
  • Thanks for the detailed answer. I'm actually not working on a particular problem, I just keep seeing various half-explanations (sometimes contradictory) whenever I configure a database or filesystem (or anything else that cares about consistency), and I want to have a complete picture of what's going on. I'll start with the design patterns articles, and then look at some ftraces. Sounds like I'll be reading some kernel code, too. – Ben Jencks Dec 03 '10 at 03:30