3

I have encountered this extremely strange problem on two servers now, both running CentOS 5, both using ext4. One is an SSD, the other a regular hard drive; both are SATA without RAID.

The issue is the following: when I run rm -r on a directory with a large number of subdirectories (>1000), where each subdirectory has a large number of files (>1000), the disk where these directories reside locks up intermittently.

This can be seen through top. Usually, the rm command has a CPU usage of about 50-60%, but then it suddenly drops to zero for 10-15 seconds, comes back to 50-60% for 3-4 seconds, and drops to zero again. While rm is at 0% CPU, even simple commands like ls on the drive in question hang, and nothing shows up on screen until rm is running at 50-60% again.

When rm is sitting at 0% CPU, top also shows 0.0%wa.

As you can imagine, this constant hanging of the disk makes processing extremely slow. I am hesitant to blame it on a bad disk because I have now seen this behavior on two different systems.

Does anybody have any ideas?

EDIT: I also want to point out that when rm is running at 0.0% CPU, jbd2/sdc1-8 is still active on the disk in question.
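For reference, a rough way to see what the disk is doing during one of these stalls is to sample I/O statistics and process states from a second terminal; the commands below are only a sketch, and sdc is just the device behind the jbd2/sdc1-8 thread mentioned above.

    # Sample extended I/O stats once per second; watch the %util and await
    # columns of the device holding the directory being removed (sdc here).
    iostat -x 1

    # While rm sits at 0% CPU, list processes stuck in uninterruptible sleep ("D").
    ps -eo stat,pid,comm | awk '$1 ~ /^D/'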

user788171
  • Hi, can you run strace on the rm command, like this: strace -ffttTo /tmp/strace.out rm . Run this on both of the servers. It will show the syscall where time is being spent. Allow strace to run fully and don't interrupt it. Also, I think rm comes from procps, but please check which package you are getting the binary from and then provide the package version. – Soham Chakraborty Feb 24 '13 at 16:25
  • Can you post the smartctl results for both disks? – Tom O'Connor Feb 24 '13 at 16:45
  • I would also check whether the process (or which processes) is in uninterruptible sleep (look for D status in ps's output), although if you say there's 0% usage in wa, that's weird. But worth checking too. – golan Mar 01 '13 at 17:08

4 Answers

3

Not a solution but a workaround: you could start rm with ionice -c3. If you can reproduce this problem, you may trace it with strace -tt -o rm.strace rm ... and contact the ext4 developers.
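A minimal sketch of both suggestions, assuming the tree being deleted is /path/to/big/tree (adjust the path to your case; ionice -c3 only has an effect with the CFQ I/O scheduler):

    # Run the deletion in the idle I/O class so it only touches the disk
    # when nothing else is waiting for it:
    ionice -c3 rm -r /path/to/big/tree

    # Capture a timestamped syscall trace to pass on to the ext4 developers:
    strace -tt -o rm.strace rm -r /path/to/big/tree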

Hauke Laging
  • Hi Hauke, can you elaborate in more detail on exactly how this is done? I suspect this is an ext4 bug as well. – user788171 Feb 22 '13 at 19:07
  • I assume that the problem can be seen in the strace output, e.g. by longer times between the file deletions. But I can't tell you what information the developers need, ask them. In theory this might even be a bug in rm, not related to ext4. What kind of CPU usage does top show, userspace load or system load? If you prepend your rm call with strace -tt -o rm.strace then strace creates a text file with timestamps and all kernel calls by rm. Just give it a try. – Hauke Laging Feb 22 '13 at 21:33
1

Firstly,

On the SSD filesystem you will want to enable the discard option, e.g.

 # mount -t ext4 -o discard /dev/ssd_dev /mnt/storage/location

You can read up on it here (RedHat SSD Tuning)

Lastly, you might want to review your block sizes, as hard drive and SSD block sizes tend to differ. But if you don't want to reinstall the system, then I think a remount with the discard option should do the trick.
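If you don't want to unmount, a remount along these lines should be enough, provided both the kernel and the SSD support TRIM; the mount point and device below are the example names used above:

    mount -o remount,discard /mnt/storage/location

    # Or make it permanent in /etc/fstab:
    # /dev/ssd_dev  /mnt/storage/location  ext4  defaults,discard  0  2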

Updated: The slow rm can be attributed to the filesystem write barrier, as explained here.

Cheers, Danie

Danie
  • Some people suggest not to use the **discard** mount option; they suggest using a cron job to do the trimming. [link](https://wiki.archlinux.org/index.php/Solid_State_Drives#Apply_TRIM_via_cron) – Andrea de Palo Mar 01 '13 at 10:14
  • If you update your answer with a bit of explanation of why `discard` will help in this case (in your own words), you will probably get more upvotes on your answer. – Nils Mar 02 '13 at 20:40
1

Deleting millions of files results in millions of transactions. That is going to quickly fill up the journal. The stalls you are seeing are caused by the journal being flushed.

Using a larger journal should allow more transactions to be batched up before flushing, so you should see fewer stalls like this.

The default journal size is normally 128 MB. You can use tune2fs -J size=512 on a cleanly unmounted filesystem to quadruple the journal size.
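A possible sequence, assuming the filesystem sits on /dev/sdc1 and is mounted at /mnt/storage/location (both names are only examples). Note that some tune2fs versions refuse to resize an existing journal; in that case it has to be removed and re-created at the larger size:

    umount /mnt/storage/location

    # Grow the journal from the 128 MB default to 512 MB.
    tune2fs -J size=512 /dev/sdc1

    # If tune2fs complains that the filesystem already has a journal,
    # remove it and re-create it at the larger size instead:
    #   tune2fs -O ^has_journal /dev/sdc1
    #   tune2fs -J size=512 -O has_journal /dev/sdc1

    e2fsck -f /dev/sdc1      # optional sanity check before remounting
    mount /dev/sdc1 /mnt/storage/location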

psusi
  • Could also use find to batch up the file deletions, and do rm -rf afterwards: `find . -type f -print0|xargs -n50 -0 rm -f`. That might split it up a bit. – lsd Mar 01 '13 at 17:52
  • @lsd, that's just going to slow things down. – psusi Mar 01 '13 at 20:24
  • BTW - Journal on SSD - is it a good idea at all to use a journal on an SSD? Can you add a good link about how the journal is being used in your answer? – Nils Mar 02 '13 at 20:43
  • @Nils, yes, placing the journal on an SSD will help too, and if you make it large enough, enabling the `data=journal` mount option can speed things up too since synchronous writes can be completed as soon as the data hits the ssd, and then it can be migrated to the slow hd in the background. – psusi Mar 03 '13 at 03:19
  • But a journal is something with a large number of write-requests. A SSD is not good at that - it is better suited for a large number of read-requests. So either put that journal onto reliable ha-RAM or a fast set of RAID instead? – Nils Mar 04 '13 at 12:30
  • @Nils, a RAID won't help much since raids increase throughput but do not help with latency. Sure, the SSD will wear out eventually, but should still last for years. I don't know what you mean by ha-ram. – psusi Mar 04 '13 at 14:59
  • I thought about using a RAM-disk (like /dev/shm) for placing the journals. In that case such a RAM-disk needs to be battery-buffered and should be reliable RAM (ecc or mirrored or both). – Nils Mar 04 '13 at 16:27
  • @Nils, yes, in theory that would work nicely, but I'm not aware of any battery backed ram cards on the market. – psusi Mar 04 '13 at 19:01
-1

I have found that when using the recursive option to remove a large number of files, it's best to write a simple bash script using a for loop to remove files individually. Something similar to:

for f in /path/to/dir/*
do
   # if file, delete it
   [ -f "$f" ] && rm "$f"
done
tdk2fe
  • I'm not sure that this is a good solution. – mdpc Mar 01 '13 at 16:52
  • Always open to better ideas - can you elaborate on why it's not a good solution? – tdk2fe Mar 01 '13 at 18:47
  • Actually this will never work on a VERY large number of files, since the for-expansion will not fit into the limited allowed space. `find /path/to/dir -type f -delete` should do the same - without the expansion problem. – Nils Mar 02 '13 at 20:38
  • I guess I haven't run into a large enough scenario to encounter that - but thanks for the clarification! – tdk2fe Mar 04 '13 at 13:54