I am looking at code that copies a sparse file (to another sparse file). It uses DeviceIoControl(... FSCTL_QUERY_ALLOCATED_RANGES ...) to get a list of ranges containing actual data.

Is it guaranteed that the result contains ranges that:

  • don't intersect?

  • are ordered by FileOffset field?

  • aren't empty?

  • have FileOffset + Length > FileOffset (i.e. no wraparound wrt uint64_t)?
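The four properties above can be checked directly on a returned array. A minimal sketch, assuming a plain `Range` struct (hypothetical, standing in for `FILE_ALLOCATED_RANGE_BUFFER` with both fields already converted to unsigned 64-bit values) and a hypothetical helper named `ranges_well_formed`:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for FILE_ALLOCATED_RANGE_BUFFER, with both fields
// already converted to unsigned 64-bit values.
struct Range {
    std::uint64_t offset;
    std::uint64_t length;
};

// Returns true only if every range is non-empty, does not wrap around,
// and starts at or after the end of the previous one (which implies the
// ranges are ordered by offset and do not intersect).
inline bool ranges_well_formed(const std::vector<Range>& ranges)
{
    std::uint64_t prev_end = 0;
    for (std::size_t i = 0; i < ranges.size(); ++i) {
        const Range& r = ranges[i];
        if (r.length == 0)
            return false;                      // empty range
        if (r.offset > UINT64_MAX - r.length)
            return false;                      // offset + length would wrap
        if (i > 0 && r.offset < prev_end)
            return false;                      // out of order or overlapping
        prev_end = r.offset + r.length;
    }
    return true;
}
```

Adjacent ranges (one starting exactly where the previous one ends) pass this check, since they do not intersect; whether they should be merged is a separate decision.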

Edit:

I've implemented validation in case the OS doesn't give any of these guarantees:

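#define NOMINMAX       // keep <windows.h> from defining a max() macro that breaks numeric_limits<>::max()
#include <windows.h>
#include <winioctl.h>  // FILE_ALLOCATED_RANGE_BUFFER
#include <algorithm>   // std::sort
#include <limits>      // std::numeric_limits
#include <stdexcept>   // std::logic_error
#include <utility>     // std::move

// version of std::remove_if that permits coalescing: p(l, r) receives the last kept element as l
// (or r itself if nothing has been kept yet) and returns true if r should be dropped
template<class ForwardIt, class BinaryPredicate>
ForwardIt coalesce_neighbors(ForwardIt first, ForwardIt last, BinaryPredicate p)
{
    ForwardIt prev = last;                              // last kept element; 'last' means "none yet"
    for(ForwardIt i = first; i != last; ++i)
        if (!p(prev == last ? *i : *prev, *i))          // compare against the previously kept element, not the write slot
        {
            if (first != i)
                *first = std::move(*i);
            prev = first;
            ++first;
        }
    return first;
}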


// for given range set do: sort by offset, check for validity, discard empty ones, coalesce intersecting/adjacent ones
FILE_ALLOCATED_RANGE_BUFFER* sanitize_ranges_(FILE_ALLOCATED_RANGE_BUFFER* p, FILE_ALLOCATED_RANGE_BUFFER* p_end)
{
    auto ui = [](LARGE_INTEGER const& v){ return static_cast<ULONGLONG>(v.QuadPart); };

    std::sort(p, p_end, [=](auto& l, auto& r){ return ui(l.FileOffset) < ui(r.FileOffset); });  // sort ranges by offset

    return coalesce_neighbors(p, p_end, [=](auto& l, auto& r){
        if (std::numeric_limits<ULONGLONG>::max() - ui(r.FileOffset) < ui(r.Length))            // no wraparounds allowed
            throw std::logic_error("invalid range (wraparound)");

        if (ui(r.Length) == 0) return true;                                                     // discard empty ranges

        if (&l != &r && ui(l.FileOffset) + ui(l.Length) >= ui(r.FileOffset))                    // 'l.offset <= r.offset' is guaranteed due to sorting
        {
            if (ui(r.FileOffset) + ui(r.Length) > ui(l.FileOffset) + ui(l.Length))              // extend l only if r ends past it (r may lie entirely inside l)
                l.Length.QuadPart = ui(r.FileOffset) + ui(r.Length) - ui(l.FileOffset);         // coalesce intersecting/adjacent ranges
            return true;                                                                        // ... and discard neighbor we ate
        }

        return false;
    });
}
C.M.
  • The [doc](https://learn.microsoft.com/en-us/windows/win32/api/winioctl/ni-winioctl-fsctl_query_allocated_ranges#remarks) says that allocated ranges are subject to the rule that a memory mapped remote (network) file and an open handle to the file are not necessarily coherent. If you memory mapped a sparse network file and wrote nonzero data to previously unallocated regions of the file, disk space would be allocated for the new data. However, a call to `FSCTL_QUERY_ALLOCATED_RANGES` thereafter would not necessarily return a correct list of allocated regions. – Strive Sun Feb 03 '21 at 09:55
  • @StriveSun-MSFT That doc talks about coherency of what you see (or do) through a mapped view and a file handle -- it says nothing about the order of ranges in the returned array (or the other guarantees mentioned in the post). Am I misreading it? – C.M. Feb 03 '21 at 17:21
  • Have you verified the range of returned non-zero data using `FSCTL_QUERY_ALLOCATED_RANGES`? I found through testing that in a sparse file, it will only return a relatively large range, which contains non-zero data. Not sure if the result I got is the same as you, or if I missed something. I only tested a single sparse file. – Strive Sun Feb 05 '21 at 09:49

1 Answer

Well, it is not much, but the usage of FSCTL_QUERY_ALLOCATED_RANGES in this file from a Microsoft repository seems to indicate that the answer to your second question is yes: the results are indeed ordered by FileOffset.

The query is made for the whole file, but if ERROR_MORE_DATA is returned, the query is issued again, starting from the end of the last returned range. Resuming that way only works if no later call can return a range that starts before that point, which is why the code implies ordering.
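The resume-on-`ERROR_MORE_DATA` pattern can be sketched without touching the Windows API. This is not the code from that repository: `Range`, `QueryResult`, `QueryFn`, `query_all_ranges` and `make_fake_query` are all hypothetical names, `more_data` models `DeviceIoControl` failing with `ERROR_MORE_DATA` while still returning a partial list, and the fake query exists only to exercise the loop.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

struct Range { std::uint64_t offset; std::uint64_t length; };

// Hypothetical result of one query: a partial list plus a flag that
// stands in for ERROR_MORE_DATA.
struct QueryResult {
    std::vector<Range> ranges;
    bool more_data;
};

// Hypothetical query callback: allocated ranges inside [start, start + length).
using QueryFn = std::function<QueryResult(std::uint64_t start, std::uint64_t length)>;

// Collects all ranges, resuming after the last returned range whenever the
// previous call reported more data. This is only correct if each call
// returns its ranges ordered by offset.
inline std::vector<Range> query_all_ranges(const QueryFn& query, std::uint64_t file_size)
{
    std::vector<Range> all;
    std::uint64_t start = 0;
    while (start < file_size) {
        QueryResult res = query(start, file_size - start);
        all.insert(all.end(), res.ranges.begin(), res.ranges.end());
        if (!res.more_data || res.ranges.empty())
            break;                                   // done, or no progress possible
        const Range& last = res.ranges.back();
        start = last.offset + last.length;           // resume after the last range
    }
    return all;
}

// Fake query for experiments: serves 'allocated' (assumed sorted and
// non-overlapping) in chunks of at most 'max_per_call' ranges.
inline QueryFn make_fake_query(std::vector<Range> allocated, std::size_t max_per_call)
{
    return [allocated, max_per_call](std::uint64_t start, std::uint64_t length) {
        QueryResult res{{}, false};
        const std::uint64_t end = start + length;
        for (const Range& r : allocated) {
            const std::uint64_t r_end = r.offset + r.length;
            if (r_end <= start || r.offset >= end)
                continue;                            // outside the queried window
            if (res.ranges.size() == max_per_call) {
                res.more_data = true;                // output buffer "full"
                break;
            }
            const std::uint64_t lo = std::max(r.offset, start);
            const std::uint64_t hi = std::min(r_end, end);
            res.ranges.push_back(Range{lo, hi - lo});
        }
        return res;
    };
}
```

If a call could return its ranges out of order, `start` would jump past ranges that were never reported, so a caller written this way silently depends on the ordering in question.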

lvella
  • Hmm... looks like the code expects ranges to be ordered, but the guarantee isn't mentioned (or enforced). Also, it looks like it can return adjacent ranges. I suspect the OS doesn't give any of the mentioned guarantees -- it simply hands over what the underlying driver returns (potentially over multiple requests). – C.M. Jun 25 '23 at 20:52