Context:
OS: Red Hat 8.x
File systems: ext4, XFS
Storage types: SSD, HDD

Corruption: here this means any event that leaves written data unretrievable as it was written, e.g. corruption at the disk device level.

The Linux read call signature is `ssize_t read(int fd, void buf[.count], size_t count);`
Say the file referred to by fd has both corrupted and intact segments. Suppose a read request spans one or more corrupted segments: the segments are A(OK)--B(corrupted)--C(OK)--D(corrupted)--E(OK), fd's file position is set before the beginning of A, and "count" is large enough to cover all segments A through E. Then:

  1. Is it possible for read's return value to be greater than zero (and for buf to contain data)?
    If so,
    1.1. What would buf contain? Would it contain any data from the corrupted segments B and D? What could read's return value be?

    1.2. How probable is this? What factors could increase the probability, e.g. a reboot?

  2. Would the file size returned by fstat count any bytes from the corrupted segments?

Purpose: I am trying to decide (under the OS and file system conditions given above) whether I need to add an application-level checksum alongside the written (binary) data and, when reading the same file, validate that checksum whenever read returns success (i.e. return value > 0) before treating the data as valid.
I am NOT worried about an intruder modifying the written data. I am only worried about things that can happen through system activity, e.g. a machine reboot.
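
To make the idea concrete, here is a minimal sketch of such an application-level checksum. The record layout (a length field and a CRC-32 followed by the payload) and the names `write_record` and `read_record` are hypothetical and chosen only for illustration. The header is stored in native byte order, so this assumes the file is read back on the same architecture.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320). */
static uint32_t crc32_ieee(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= p[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Write exactly len bytes, retrying on short writes and EINTR. */
static int write_all(int fd, const void *buf, size_t len)
{
    const unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes; an early end-of-file counts as failure. */
static int read_all(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;          /* e.g. EIO from an unreadable block */
        }
        if (n == 0) return -1;  /* file shorter than the record claims */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical record layout: [uint32 length][uint32 crc][payload]. */
int write_record(int fd, const void *payload, uint32_t len)
{
    assert(payload != NULL || len == 0);
    uint32_t header[2] = { len, crc32_ieee(payload, len) };
    if (write_all(fd, header, sizeof header) != 0) return -1;
    return write_all(fd, payload, len);
}

/* Returns the payload length, -1 on I/O failure, or -2 when the data
 * was read "successfully" but does not match what was written. */
ssize_t read_record(int fd, void *payload, uint32_t cap)
{
    uint32_t header[2];
    if (read_all(fd, header, sizeof header) != 0) return -1;
    if (header[0] > cap) return -2;  /* length field itself looks wrong */
    if (read_all(fd, payload, header[0]) != 0) return -1;
    if (crc32_ieee(payload, header[0]) != header[1]) return -2;
    return (ssize_t)header[0];
}
```

The checksum only covers the case where read succeeds yet returns bytes that differ from what was written. An unreadable block would already surface as an error from read itself.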

aKumara
  • What do you consider to be a corrupted segment of the file? Or what would be a segment of a file? For most functions, a file is just a sequence of bytes. – Gerhardh Jun 26 '23 at 11:06
  • If the corrupted segments cannot be read, you would likely get a short read for A and then an I/O error for B. You’d have to seek past B to get the data in C, and similarly seek past D to read E. But that assumes a specific meaning for “corrupted”. If the corruption is simply erroneous data according to some format the kernel is unaware of, the kernel will return the data “as is”. Yes; stat would count the bytes in the corrupted segments. With modern disk systems, corruption is vanishingly rare as a problem. Not non-existent, but extremely unusual. – Jonathan Leffler Jun 26 '23 at 11:25
  • @Gerhardh, Linux files are written in multiple blocks/segments (even though, from the application's perspective, a file appears as a single contiguous sequence). That block/segment is what I am referring to here. Corruption is anything that prevents saved data from being retrieved as it was written. – aKumara Jun 26 '23 at 11:31
  • @JonathanLeffler, could you please clarify "some format the kernel is unaware of"? Assume the application is writing binary data. – aKumara Jun 26 '23 at 11:44
  • The kernel has very limited understanding of file formats. It knows about executables, but that’s pretty much it. Everything else is just a “bag of bytes” and the kernel will be wholly unaware of incorrect byte sequences in the file. – Jonathan Leffler Jun 26 '23 at 11:48
  • @JonathanLeffler, so if I were to summarize your first comment: for my question #1, it is possible for the read call to return a value greater than zero (since you said it is "likely", rather than definite, that read returns an error). For #2, if the data written by the application is just binary data (NOT an executable, NOT a .so file, etc.), then fstat will count the bytes from the corrupted segments. – aKumara Jun 26 '23 at 11:56
  • The C functions don't have a concept of corrupted data as you define it. For `fread`, data is just data, no matter whether it makes any sense for your program or whether it is the same as what you wrote before. – Gerhardh Jun 26 '23 at 11:58
  • @Gerhardh, hope my question is clear to you. – aKumara Jun 26 '23 at 12:01
  • Of course, if the corrupted sectors on the disk happen to contain filesystem metadata, it's possible that depending on the nature of the corruption the filesystem will go off the deep end and the system crashes, or maybe the filesystem detects that metadata checksums don't match, or something. – janneb Jun 26 '23 at 12:28
  • The only time you'll run into problems with a disk are when it is in the terminal stages of its life. Then it might be possible that block A is OK, block B is unreadable, block C is OK, block D is unreadable, block E is OK. But the system may be taking steps to disable the drive by then. The drives themselves do 'bad block mapping' to avoid bad blocks; eventually, they can run out of spare blocks. I haven't studied disk driver software, but it might be that if you try to read 5 blocks starting at A, the driver says "I read one block, A"; then the next call to read B fails. _[…continued 1…]_ – Jonathan Leffler Jun 26 '23 at 14:42
  • _[…continuation 1…]_ If you seek past block B to block C, the same sequence might occur with block C being a short read and block D failing. But this is a pretty outré example. If, in contrast, the problem is erroneous data written to the disk, then the reads will work fine, and the erroneous data will be returned as normal data. I forgot about file systems when I said "the kernel only understands executables" — the file system metadata has structure and the kernel might recognize corruption there. But you don't normally end up reading file system metadata. _[…continued 2…]_ – Jonathan Leffler Jun 26 '23 at 14:46
  • _[…continuation 2…]_ It isn't really clear to me what scenario you are worried about. Are you working at the level of a disk driver or the file system layer in the kernel, or in the hardware inside a disk driver, or at the application level outside the kernel? The latter is most likely — and there you barely have to worry about hardware defects and randomly unreadable blocks (unless the disk drive is in a state of terminal decay — in which case you need to go to backups and buy a new drive). If the kernel reports that a block is unreadable, you notify your user that the system is broken. – Jonathan Leffler Jun 26 '23 at 14:56
  • There are no "corrupted segments" at the filesystem layer. There are read errors at the block device layer. From the userspace point of view, take a look at how `dd` handles partial reads when run with `conv=noerror,sync`. – dimich Jun 26 '23 at 19:17
  • @aKumara: you're really asking the wrong question. Linux has multiple different filesystem types (EXT4 and XFS among them), each of which supports multiple kinds of physical devices (SSD and HDD among them). If there's *corruption*, it's handled *BELOW* the "read()" level. As Gerhardh said: `"The C functions don't have a concept of corrupted data as you define it. For fread data is just data."` – paulsm4 Jun 26 '23 at 22:45
  • @paulsm4, I am simply asking whether read (the C function) can detect errors when there are errors on the disk. "Corruption is handled BELOW the read() level" is TRUE, but if the layers BELOW cannot correct the errors, does read indicate that? That is my question. – aKumara Jun 27 '23 at 04:48
  • @JonathanLeffler, I am working at application layer. I added purpose of my question for clarity. – aKumara Jun 27 '23 at 06:36

1 Answer

If A can be read, the kernel will return the length of A, and that portion of the read will be successful. This would be known as a short read. Once that happens, if you make another call to read and B cannot be read, you will get an EIO error. That could be a problem with a network file system, a bad block, a file system error, or anything else that prevents the data from being read.

Once the call to read B fails, it will continue to fail because the file offset is not advanced beyond that. If you use pread to read an unaffected portion, or if you lseek to an unaffected portion, you'll be able to continue to read until you hit an affected portion.
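
As a rough illustration of that pattern, the sketch below reads with `pread`, keeps whatever can be read, and steps over a range when it hits `EIO`. The function name `salvage_read` and the 4096-byte skip size are assumptions made for this example, not anything the kernel mandates. Whether a short read for A actually precedes the error for B depends on the device and driver.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SKIP_BYTES 4096  /* assumed skip granularity, roughly one block */

/* Reads up to len bytes starting at file offset 0. Ranges that fail
 * with EIO are zero-filled in buf and counted in *bad_bytes. Returns
 * the number of bytes covered (good plus skipped), or -1 on any error
 * other than EIO. */
ssize_t salvage_read(int fd, unsigned char *buf, size_t len, size_t *bad_bytes)
{
    assert(buf != NULL && bad_bytes != NULL);
    size_t off = 0;
    *bad_bytes = 0;

    while (off < len) {
        ssize_t n = pread(fd, buf + off, len - off, (off_t)off);
        if (n > 0) {            /* possibly a short read: keep going */
            off += (size_t)n;
            continue;
        }
        if (n == 0) break;      /* end of file */
        if (errno == EINTR) continue;
        if (errno != EIO) return -1;

        /* Unreadable range: note it and move past it, because retrying
         * at the same offset would keep failing. */
        size_t skip = len - off < SKIP_BYTES ? len - off : SKIP_BYTES;
        fprintf(stderr, "unreadable range at offset %zu (%zu bytes)\n",
                off, skip);
        memset(buf + off, 0, skip);
        *bad_bytes += skip;
        off += skip;
    }
    return (ssize_t)off;
}
```

This is similar in spirit to the `dd conv=noerror,sync` approach mentioned in the comments: the caller gets the intact data plus a count of bytes that could not be trusted.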

This is generally the standard Unix behaviour, and would be expected of any POSIX system. The error code on failure might differ in some cases on some systems (for example, the OS might automatically remount the file system read only and return some other error code in that case), but generally one reads all the data that can be validly read, and then if further progress is not possible, one gets an error.

bk2204
  • Thanks. So for my question "1. Is there a possibility of read's return value to be larger than ZERO ?", your answer would be "highly unlikely", right? Actually, I am trying to decide whether I should add an application-level checksum when writing to a file (as part of the written data) and, when reading, validate that checksum at application level if the read call returns success. From your answer I guess this is NOT (or is unlikely to be) required? – aKumara Jun 27 '23 at 05:36
  • The first time, read's return value will be larger than zero because you're reading A successfully. Afterwards, it will be -1, because reading B will fail. – bk2204 Jun 27 '23 at 21:40