Why do md array rebuilds start?

Question

I recently configured mdadm to report events, and yesterday I received a number of rebuild events. Why was the rebuild started? I don't see anything relevant in the journal. I'm also concerned about the "mismatches found: 128 (on raid level 1)" part. What does it mean? I'm running Ubuntu 20.04.1 LTS.

As for research effort, I checked the man page. It says:

RebuildStarted
An md array started reconstruction (e.g. recovery, resync, reshape, check, repair). (syslog priority: Warning)

Okay, it can mean a number of things, so... which one was it in my case? And what caused it? I tried googling for the reason and couldn't find anything. Even now, knowing the reason, I can't find any info.

Hover your mouse over the downvote button, read the reasons, think about what it could be. — Gerald Schneider, Jul 04 '22 at 15:44
@GeraldSchneider Unclear? I doubt that. Not useful? I think it's useful because I don't see the answer in Google. And it looks like something one should run into, sooner or later, on a server with a software RAID. Do you think I did enough research, now that I showed it? P.S. It's kind of sad that a useful question that doesn't show research effort... I bet it'd receive less views, if any at all. — x-yuri, Jul 05 '22 at 00:06
True, it's sad when good questions don't show any research effort. And there was absolutely zero in this case. Now there is, so you got an upvote from me. If you see it that way, the downvote was actually a good thing because it pushed you into improving your question. — Gerald Schneider, Jul 05 '22 at 04:47

Nikita Kipriyanov · Accepted Answer · 2022-07-04T12:27:31.717

2

This is not a rebuild, but a check.

      [>....................]  check =  0.0% (2816/33520640) finish=197.2min speed=2816K/sec
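The word before the `=` sign in that progress line names the operation, which is how you can tell the cases lumped under RebuildStarted apart. A minimal sketch of pulling it out of a `/proc/mdstat` progress line follows; the function name `parse_progress` and the regular expression are my own illustration, not part of mdadm:

```python
import re

# Matches the progress line md prints in /proc/mdstat while a sync
# operation is running, e.g. "[>....]  check =  0.0% (2816/33520640) ...".
PROGRESS_RE = re.compile(
    r"\]\s+(?P<op>[a-z]+)\s*=\s*(?P<pct>[\d.]+)%\s+\((?P<done>\d+)/(?P<total>\d+)\)"
)


def parse_progress(line):
    """Return the operation name and progress from one mdstat line, or None."""
    match = PROGRESS_RE.search(line)
    if match is None:
        return None
    return {
        "operation": match.group("op"),
        "percent": float(match.group("pct")),
        "done": int(match.group("done")),
        "total": int(match.group("total")),
    }


# The line quoted above, used as sample input.
sample = (
    "      [>....................]  check =  0.0% (2816/33520640) "
    "finish=197.2min speed=2816K/sec"
)
print(parse_progress(sample))
```

In practice you would feed it the lines of `/proc/mdstat` and look at the `operation` field of the first non-`None` result.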

Debian (and I suppose, Ubuntu as a derivative) installs a cron job which checks all arrays once a month.

This is scrubbing, an essential part of monitoring and maintaining a RAID. It ensures that various BER-induced errors don't corrupt your data (or, at least, that you learn about the corruption sooner and can take steps to mitigate it). It also lets you detect failing devices early. Which is A Good Thing™.

MD reads both drives and verifies that they contain the same data. Or, for more complex RAID levels, it reads all drives and checks whether the parity syndromes match. If something is wrong, it will try to correct it and will also warn you. For instance, mismatches are these unexpected discrepancies between the two media. If you see them often, that is a reason to check your storage thoroughly. You may need to reseat some cables or SSDs in their slots, or even replace them.
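To see what the kernel recorded after a scrub, you can read the array's sysfs files `md/sync_action` and `md/mismatch_cnt`. As far as I understand md(4), the counter is in 512-byte sectors and md adds the size of the whole I/O unit it was checking, so a value of 128 may just mean that a single 64 KiB chunk differed. The sketch below assumes the array is called `md0`; the function name `scrub_status` and the `sysfs_root` parameter are hypothetical and exist only to keep the example self-contained:

```python
from pathlib import Path

SECTOR_BYTES = 512  # mismatch_cnt is counted in 512-byte sectors


def scrub_status(array="md0", sysfs_root="/sys/block"):
    """Read the current sync action and mismatch counter of one md array."""
    md_dir = Path(sysfs_root) / array / "md"
    action = (md_dir / "sync_action").read_text().strip()
    mismatches = int((md_dir / "mismatch_cnt").read_text().strip())
    return {
        "sync_action": action,                          # e.g. "idle" or "check"
        "mismatch_sectors": mismatches,                 # raw counter value
        "mismatch_bytes": mismatches * SECTOR_BYTES,    # same value in bytes
    }


if __name__ == "__main__":
    try:
        print(scrub_status("md0"))  # "md0" is an assumed array name
    except OSError as exc:
        print(f"could not read md0 status: {exc}")
```

A counter that stays at zero after each monthly check is what you want to see; one that keeps growing is the case worth investigating.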

By the way, HW RAIDs and large SAN systems also implement these consistency checks under the hood.

edited Jul 04 '22 at 12:27

answered Jul 04 '22 at 12:10

Nikita Kipriyanov


"This is not a rebuild, but a check." A misleading event name then... By the way, I've found the description in the [man pages](https://man.archlinux.org/man/md.4.en#SCRUBBING_AND_MISMATCHES). And relevant records in the journal (initially failed to correctly convert time from the local to the server time zone). The monitoring these days is handled by [`systemd` timers](https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git/tree/systemd?h=mdadm-4.2), at least on Debian/Ubuntu. – x-yuri Jul 05 '22 at 02:10

Ok, in the latest versions they do it using systemd timers. In the Unix world, any kind of job that starts periodically according to some schedule is often colloquially called a "cron job", even if the scheduler is not cron. – Nikita Kipriyanov Jul 05 '22 at 03:37
I can say that it started with once in around two months. These days it's once a month (apparently on every scrubbing). One of the rebuilds was in the middle of a month (a manual start?..). But the RAID seems to be doing okay. It should probably be noted here, that I don't think that many people watch out for these events. – x-yuri Mar 09 '23 at 10:45
