11

I am looking for a definitive specification describing the expected arguments and behavior of ioctl 0x1268 (BLKSSZGET).

This number is declared in many places (none of which contain a definitive reference source), such as linux/fs.h, but I can find no specification for it.

Surely, somebody at some point in the past decided that 0x1268 would get the physical sector size of a device and documented that somewhere. Where does this information come from and where can I find it?

Edit: I am not asking what BLKSSZGET does in general, nor am I asking what header it is defined in. I am looking for a definitive, standardized source that states what argument types it should take and what its behavior should be for any driver that implements it.

Specifically, I am asking because there appears to be a bug in blkdiscard in util-linux 2.23 (and 2.24) where the sector size is queried in to a uint64_t, but the high 32-bits are untouched since BLKSSZGET appears to expect a 32-bit integer, and this leads to an incorrect sector size, incorrect alignment calculations, and failures in blkdiscard when it should succeed. So before I submit a patch, I need to determine, with absolute certainty, if the problem is that blkdiscard should be using a 32-bit integer, or if the driver implementation in my kernel should be using a 64-bit integer.

Edit 2: Since we're on the topic, the proposed patch presuming blkdiscard is incorrect is:

--- sys-utils/blkdiscard.c-2.23 2013-11-01 18:28:19.270004947 -0400
+++ sys-utils/blkdiscard.c  2013-11-01 18:29:07.334002382 -0400
@@ -71,7 +71,8 @@
 {
    char *path;
    int c, fd, verbose = 0, secure = 0;
-   uint64_t end, blksize, secsize, range[2];
+   uint64_t end, blksize, range[2];
+   uint32_t secsize;
    struct stat sb;

    static const struct option longopts[] = {
@@ -146,8 +147,8 @@
        err(EXIT_FAILURE, _("%s: BLKSSZGET ioctl failed"), path);

    /* align range to the sector size */
-   range[0] = (range[0] + secsize - 1) & ~(secsize - 1);
-   range[1] &= ~(secsize - 1);
+   range[0] = (range[0] + (uint64_t)secsize - 1) & ~((uint64_t)secsize - 1);
+   range[1] &= ~((uint64_t)secsize - 1);

    /* is the range end behind the end of the device ?*/
    end = range[0] + range[1];

Applied to e.g. https://www.kernel.org/pub/linux/utils/util-linux/v2.23/.

Roman C
  • 49,761
  • 33
  • 66
  • 176
Jason C
  • 38,729
  • 14
  • 126
  • 182
  • Thank you, but I already know what BLKSSZGET does. I'm looking for a definitive specification of its behavior. In particular, does it take a pointer to a 32- or 64-bit integer as a parameter and, more importantly, *how do you know*? – Jason C Nov 01 '13 at 22:35
  • Fair enough, I know 0 about it, thought that might help; will keep digging. – slm Nov 01 '13 at 22:40
  • Given few probably know about this could you elaborate your Q a bit adding some context? – slm Nov 01 '13 at 22:41
  • @slm I just updated it. – Jason C Nov 01 '13 at 22:41
  • Well according to `block/ioctl.c` it seems to be using `int` for `BLKSSZGET`. Whatever that means. And it comes from `bdev_logical_block_size` which is defined as `static inline unsigned short bdev_logical_block_size(struct block_device *bdev)` in `include/linux/blkdev.h`. – frostschutz Nov 01 '13 at 23:32
  • Thanks; I saw *block/ioctl.c* as well, but was unable to determine if that was the reference implementation, or if the data types there were standardized elsewhere (for example, `blkdiscard` could be correct while my own kernel's `ioctl.c` could be where the mistake is). The size of an `int` varies between platforms, and even varies between compilers on the same platform. It's very vague... there's *got* to be a specification *somewhere*... maybe. – Jason C Nov 02 '13 at 00:02
  • Yes; when issuing `blkdiscard --offset 7583301632 --length 512` to a device (SD card, sector size 512, 8GB), the `BLKDISCARD` ioctl call failed (failed with certain other values too). I added a `printf` to `blkdiscard` to show the detected sector and total size, ran `blkdiscard --offset 0 --length 512`, and it printed `secsize=13213774770874614272 blksize=7822376960`. Initializing `secsize` to 0 yielded a correct result (for the reasons you described in your answer) as well as changing it to a 32-bit int. My kernel source also shows it's expecting a 32-bit int. – Jason C Nov 02 '13 at 00:36
  • I asked on the kernel list, and after some back-and-forth, the question was answered. See my answer below for details. Also, I still believe this belongs on unix.se. :) – Jason C Nov 03 '13 at 20:40

1 Answers1

12

The answer to "where is this specified?" does seem to be the kernel source.

I asked the question on the kernel mailing list here: https://lkml.org/lkml/2013/11/1/620

In response, Theodore Ts'o wrote (note: he mistakenly identified sys-utils/blkdiscard.c in his list but it's inconsequential):

BLKSSZGET returns an int.  If you look at the sources of util-linux
v2.23, you'll see it passes an int to BLKSSZGET in 

    sys-utils/blkdiscard.c
    lib/blkdev.c

E2fsprogs also expects BLKSSZGET to return an int, and if you look at
the kernel sources, it very clearly returns an int.

The one place it doesn't is in sys-utils/blkdiscard.c, where as you
have noted, it is passing in a uint64 to BLKSSZGET.  This looks like
it's a bug in sys-util/blkdiscard.c.

He then went on to submit a patch¹ to blkdiscard at util-linux:

--- a/sys-utils/blkdiscard.c
+++ b/sys-utils/blkdiscard.c
@@ -70,8 +70,8 @@ static void __attribute__((__noreturn__)) usage(FILE *out)
 int main(int argc, char **argv)
 {
        char *path;
-       int c, fd, verbose = 0, secure = 0;
-       uint64_t end, blksize, secsize, range[2];
+       int c, fd, verbose = 0, secure = 0, secsize;
+       uint64_t end, blksize, range[2];
        struct stat sb;

        static const struct option longopts[] = {

I had been hesitant to mention the blkdiscard tool in both my mailing list post and the original version of this SO question specifically for this reason: I know what's in my kernel's source, it's already easy enough to modify blkdiscard to agree with the source, and this ended up distracting from the real question of "where is this documented?".

So, as for the specifics, somebody more official than me has also stated that the BLKSSZGET ioctl takes an int, but the general question regarding documentation remained. I then followed up with https://lkml.org/lkml/2013/11/3/125 and received another reply from Theodore Ts'o (wiki for credibility) answering the question. He wrote:

> There was a bigger question hidden behind the context there that I'm
> still wondering about: Are these ioctl interfaces specified and
> documented somewhere? From what I've seen, and from your response, the
> implication is that the kernel source *is* the specification, and not
> document exists that the kernel is expected to comply with; is this
> the case?

The kernel source is the specification.  Some of these ioctl are
documented as part of the linux man pages, for which the project home
page is here:

     https://www.kernel.org/doc/man-pages/

However, these document existing practice; if there is a discrepancy
between what is in the kernel has implemented and the Linux man pages,
it is the Linux man pages which are buggy and which will be changed.
That is man pages are descriptive, not perscriptive.

I also asked about the use of "int" in general for public kernel APIs, his response is there although that is off-topic here.

Answer: So, there you have it, the final answer is: The ioctl interfaces are specified by the kernel source itself; there is no document that the kernel adheres to. There is documentation to describe the kernel's implementations of various ioctls, but if there is a mismatch, it is an error in the documentation, not in the kernel.

¹ With all the above in mind, I want to point out that an important difference in the patch Theodore Ts'o submitted, compared to mine, is the use of "int" rather than "uint32_t" -- BLKSSZGET, as per kernel source, does indeed expect an argument that is whatever size "int" is on the platform, not a forced 32-bit value.

Jason C
  • 38,729
  • 14
  • 126
  • 182
  • 1
    It might be worth noting that the variable returned by the `BLKSSZGET` ioctl is actually an [unsigned int](https://elixir.bootlin.com/linux/v5.8/source/block/ioctl.c#L525), as returned by `bdev_logical_block_size(bdev)`. – uncleremus Aug 06 '20 at 08:45
  • @uncleremus I'm just now seeing your comment. That is definitely worth noting. Not sure how much has changed since 2013 but it could be worth a quick code review of the current blkdiscard. Waaaay down on my todo list though, haha. – Jason C Mar 09 '22 at 15:16