
Yesterday, as root, I ran

dd if=/dev/zero of=/var/lib/libvirt/images/dat.img bs=1G count=1000

and for many hours it made the server completely unresponsive.

The RAID controller is an HP P410 with 4 disks in RAID 1+0.

Is the problem that I used `bs=1G`, or does CentOS perhaps need a kernel parameter so that heavy I/O can't grind the host to a halt?

Question

Can anyone explain why this happens?

PS: Next time I will create a sparse file, but right now I would like to understand how to prevent heavy I/O from maxing out the host.

Sandra
  • Writing one TB of data can take some time. Is the /var filesystem the only one using the RAID controller? – Gerard H. Pille Sep 26 '18 at 10:04
  • No, all 4 disks are in one RAID 1+0 array, which is then split into two partitions. `/var/lib/libvirt/images` is its own partition. – Sandra Sep 26 '18 at 10:22
  • What's the other partition on the raid? – Gerard H. Pille Sep 26 '18 at 10:36
  • That is `/`..... – Sandra Sep 26 '18 at 10:41
  • Oops. Can you have a look at the first answer here: https://unix.stackexchange.com/questions/48138/how-to-throttle-per-process-i-o-to-a-max-limit. I'm under the impression it would allow you to target a specific filesystem via major:minor device number. – Gerard H. Pille Sep 26 '18 at 10:58
  • `ionice` seems like a really good solution for running processes. And I suppose if I want it by default for all processes, then I have to look into cgroups? – Sandra Sep 26 '18 at 11:25
  • It looks that way. I have zero experience with that, though; I've always needed to get things onto disk as fast as possible. I've seen your problem now and then when creating datafiles for Oracle databases. You can compare that to your dd: a large amount of nothing useful is being written. Do you really need this? – Gerard H. Pille Sep 26 '18 at 11:31
  • Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/83677/discussion-between-sandra-and-gerard-h-pille). – Sandra Sep 26 '18 at 11:40
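
For reference, minimal sketches of the two throttling approaches mentioned in the comments above (the device number and the limits are illustrative, not from the discussion):

```
# Run a single heavy writer in the "idle" I/O scheduling class, so it only
# gets disk time when no other process wants it:
ionice -c3 dd if=/dev/zero of=/var/lib/libvirt/images/dat.img bs=1M count=1000000

# Or throttle writes with the cgroup-v1 blkio controller. "8:16" is an
# illustrative major:minor device number - look yours up with lsblk.
mkdir /sys/fs/cgroup/blkio/slow-writers
echo "8:16 10485760" > /sys/fs/cgroup/blkio/slow-writers/blkio.throttle.write_bps_device   # 10 MiB/s
echo $$ > /sys/fs/cgroup/blkio/slow-writers/tasks   # move the current shell into the cgroup
```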

2 Answers


Short answer: the server became unresponsive because you filled almost all of its memory with dirty pages (i.e., data waiting to be flushed to disk).

Long answer: generally, writes do not push data to the backing device immediately; rather, they are cached in the pagecache. This is done for performance reasons: storage (especially HDDs) is quite slow compared to CPU/memory, so caching as much as possible significantly increases I/O speed. However, if you write too much too quickly, the pagecache will frantically (and with high priority) flush as much data as possible to the disks. This puts the calling process into "deep" (uninterruptible) sleep: you cannot interrupt it because it is not really running; rather, it is waiting to be woken up by the kernel. Moreover, since flushing dirty data is a costly, high-priority operation, the entire server becomes very slow.
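
If the goal is to keep any heavy writer from filling memory with dirty pages in the first place, one common knob (a sketch, not part of this answer; the values are purely illustrative) is to lower the kernel's dirty-page thresholds, so writeback starts earlier and writers are throttled sooner:

```
# Show the current thresholds (percent of reclaimable memory)
sysctl vm.dirty_background_ratio vm.dirty_ratio

# Start background writeback earlier and block writers sooner.
sysctl -w vm.dirty_background_ratio=5
sysctl -w vm.dirty_ratio=10

# Alternatively, cap by absolute size (setting these overrides the ratios):
sysctl -w vm.dirty_background_bytes=268435456   # 256 MiB
sysctl -w vm.dirty_bytes=1073741824             # 1 GiB
```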

That said, how can you create large image files without reducing the server to a crawl? You have several options:

  • launch your dd command with the oflag=direct option appended: this makes dd bypass the pagecache and write directly to the disks. I also suggest using a smaller buffer, e.g. 1 MB, with something like dd if=/dev/zero of=/var/lib/libvirt/images/dat.img bs=1M count=1000000 oflag=direct. Note that this command will still slow down the server somewhat during execution (after all, you are writing to the disks), but nowhere near your first try;
  • a better approach for creating a fully allocated file is the fallocate command, e.g. fallocate -l 1G /var/lib/libvirt/images/dat.img. Because it allocates space via metadata only, without writing any real data, this command returns almost immediately and causes no slowdown at all. Many modern filesystems support it, with the notable exception of ZFS;
  • finally, you can use a sparse file: a file with a nominal size but no real allocation. You can create one with truncate --size=1G /var/lib/libvirt/images/dat.img. This command returns immediately, causing almost no I/O at all; see the sketch below for how the last two approaches differ on disk.
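
To see the difference between the last two approaches (a quick sketch; `sparse.img` is an illustrative name): `ls -lh` reports the nominal size, while `du -h` reports the blocks actually allocated:

```
fallocate -l 1G /var/lib/libvirt/images/dat.img
ls -lh /var/lib/libvirt/images/dat.img      # 1.0G nominal size
du -h /var/lib/libvirt/images/dat.img       # ~1.0G actually reserved on disk

truncate --size=1G /var/lib/libvirt/images/sparse.img
ls -lh /var/lib/libvirt/images/sparse.img   # 1.0G nominal size
du -h /var/lib/libvirt/images/sparse.img    # 0 - no blocks allocated yet
```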
shodanshok

dd allocates a buffer of size `bs`; if that is a fair chunk of your memory, it will displace other data and can cause swapping.

Worse, the `bs` buffer is written out in a single operation, which may tie up your storage system for a while. Combined with the swapping above, this can seriously bog down the whole system.

So it's reasonable to use, for instance, bs=16M for your task: a buffer large enough for efficient I/O, yet granular enough not to tie anything up for too long.
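
Combining this with the oflag=direct suggestion from the other answer, a sketch for the 1 TB file from the question (the count is simply 1000 GiB / 16 MiB):

```
# 16 MiB buffer, bypassing the pagecache; 64000 * 16 MiB = 1000 GiB
dd if=/dev/zero of=/var/lib/libvirt/images/dat.img bs=16M count=64000 oflag=direct
```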

Zac67