
I have a program that is used to exercise several disk units in a RAID configuration. One process synchronously (O_SYNC) writes random data to a file using write(). It then puts the name of the directory into a shared-memory queue, where a second process waits for the queue to have entries and reads the data back into memory using read().

The problem that I can't seem to overcome is that when the second process attempts to read the data back into memory, none of the disk units show read accesses. The program has code to check whether or not the data read back in is equal to the data that was written to disk, and the data always matches.

My question is, how can I make the OS (IBM i) not buffer the data when it is written to disk so that the read() system call accesses the data on the disk rather than in cache? I am doing simple throughput calculations and the read() operations are always 10+ times faster than the write operations.

I have tried using the O_DIRECT flag, but cannot seem to get the data to write to the file. It could have to do with setting up correctly aligned buffers. I have also tried the posix_fadvise(fd, offset, len, POSIX_FADV_DONTNEED) system call.

I have read through this similar question but haven't found a solution. I can provide code if it would be helpful.

burtmacklin16
  • You can turn off userland file caching, for example in C with setvbuf(). Kernel-mode caching may require resetting kernel parameters, or running code that creates massive chunks of memory so that the kernel flushes all of its caches. By (IBM i) do you mean IBM Itanium AIX? Or what? PS: direct I/O does not always imply that there is no caching going on. Systems can write directly, then save the data in a cache. OS designers are big on using memory to bypass I/O bottlenecks. – jim mcnamara May 03 '13 at 19:26
  • @jimmcnamara - IBM i is OS/400 from the old days. It has a specific environment called PASE for i for running UNIX-like software. – burtmacklin16 May 03 '13 at 19:28

2 Answers


...exercise several disk units in a raid configuration... How? IBM i doesn't allow a program access to the hardware. How are you directing I/O to any specific physical disks?

ANSWER: The write/read operations are done in parallel against IFS so the stream file manager is selecting which disks to target. By having enough threads reading/writing, the busyness of SYSBASE or an IASP can be driven up.

...none of the disk units show read accesses. None of them? Unless you are running the sole job on a system in restricted state, there is going to be read activity on the disks from other tasks. Is the system divided into multiple LPARs? Multiple ASPs? I'm suggesting that you may be monitoring disks that this program isn't writing to, because IBM i handles physical I/O, not programs.

ANSWER I guess none of them is a slight exaggeration - I know which disks belong to SYSBASE and those disks are not being targeted with many read requests. I was just trying to generalize for an audience not familiar w/IBM i. In the picture below, you will see that the write reqs are driving the % busyness up, but the read reqs are not even though they are targeting the same files.

(screenshot: disk % busy climbs with the write requests but not with the read requests)

...how can I make the OS (IBM i) not buffer the data when it is written to disk... Use a memory-starved main storage pool to maximise paging, write immense blocks of data so as to guarantee that the system and disk controller caches overflow, and use a busy machine so that other tasks are demanding disk I/O as well.

Buck Calabro
  • OK, so you're not trying to target specific disks as much as hit an entire iASP? Run many, many copies of this process. These machines are designed from the ground up to handle a lot of I/O, so you're going to really have to work to get it to break a sweat. Use large, random blocks of data. – Buck Calabro May 03 '13 at 20:14
  • Yes that is correct - I'm targeting an entire iASP or SYSBASE. As you can see in the picture, the writes are in fact driving the busyness adequately, but the read reqs are not. I am thinking that IBM i is caching the write operations so the reads are done right out of cache rather than from disk. – burtmacklin16 May 03 '13 at 20:19
  • 1
    That's a lot of writes! :-) In order to outsmart the caching, you need to overwhelm it. If you write say, the single letter 'A' to a file, that will clearly fit in the cache even if you run a hundred threads. Use much larger blocks of data in more threads and overwhelm the caches. – Buck Calabro May 03 '13 at 20:37
  • 1
    Also, bear in mind that there are three separate buffers/caches. First is any buffers used by an individual program. Second is basic memory. The system will do what it can to use as much memory as it can spare. Entire database files will often exist in physical memory if accessed often enough and nothing else is demanding memory pages. But also the disk controllers have large caches. There is little publicized info on how controller cache relates to individual disk I/Os. The OS itself is unaware of any "disks" and only knows about separate disk pools. – user2338816 Apr 06 '14 at 14:28

My thought is that if you write ENOUGH data, then there simply won't be enough memory to cache it, and thus SOME data must be written to disk.

You can also, if you want to make sure that small writes to your file work, try writing ANOTHER large file (either from the same process or a different one - for example, you could start a process like dd if=/dev/zero of=myfile.dat bs=4k count=some_large_number) to force other data to fill the cache.

Another "trick" may be to "chew up" some (more like most) of the RAM in the system - just allocate a large lump of memory, then write to some small part of it at a time - for example, an array of integers, where you write to every 256th entry of the array in a loop, moving one step forward each time - that way, you walk through ALL of the memory quickly, and since you are writing continuously to all of it, the memory will have to be resident. [I used this technique to simulate a "busy" virtual machine when running VM tests.]

The other option is of course to nobble the caching system itself in OS/filesystem driver, but I would be very worried about doing that - it will almost certainly slow the system down to a slow crawl, and unless there is an existing option to disable it, you may find it hard to do accurately/correctly/reliably.

Mats Petersson