Alright, I think I figured out the issue with the help of WinDbg and a thorough look at wow64.dll using IDA.
NB: the wow64.dll I have has the same build number, but differs slightly in data only (checksum, security directory entry, pieces from version resources). The code is identical, which was to be expected, given deterministic builds and how they affect the PE timestamp.
There's an internal function called whNtQueryObject_SpecialQueryCase (according to the PDBs), which covers the ObjectTypesInformation class queries.
For the above wow64.dll I used the following points of interest in WinDbg, from a 32 bit program which calls NtQueryObject(NULL, ObjectTypesInformation, ...) (the program itself is irrelevant, though):
0:000> .load wow64exts
0:000> bp wow64!whNtQueryObject_SpecialQueryCase+B0E0
0:000> bp wow64!whNtQueryObject_SpecialQueryCase+B14E
0:000> bp wow64!whNtQueryObject_SpecialQueryCase+B1A7
0:000> bp wow64!whNtQueryObject_SpecialQueryCase+B24A
0:000> bp wow64!whNtQueryObject_SpecialQueryCase+B252
Explanation of the above points of interest:
- +B0E0: computing the length required for the 64 bit query, based on the length passed for the 32 bit one
- +B14E: call to NtQueryObject()
- +B1A7: loop body for copying 64 to 32 bit buffer contents, after a successful NtQueryObject() call
- +B24A: computing the written length by subtracting the base buffer address from the current (last + 1) entry
- +B252: downsizing the returned (64 bit) required length to 32 bit
The logic of this function in regards to just ObjectTypesInformation is roughly as follows:
Common steps
- Take the ObjectInformationLength argument (from the 32 bit query!) and size it up to fit the 64 bit info
- Align the resulting size up to the next 16 byte boundary
- If necessary, allocate the resulting amount from PEB::ProcessHeap and store the pointer in TLS slot 3; this buffer serves as scratch space
- Call NtQueryObject(), passing the buffer and length from the previous steps
The length passed to NtQueryObject() is the one from step 1, not the one aligned to a 16 byte boundary. There seems to be some sort of header to this scratch space, so perhaps that's where the 16 byte alignment comes from?
Case 1: buffer size too small (here: 4), just querying required length
The up-sized length in this case equals 4, which is too small, and consequently NtQueryObject() returns STATUS_INFO_LENGTH_MISMATCH. The required size is reported as 8968.
- Down-size from the 64 bit required length to 32 bit and end up 16 bytes too short
- Return the status from NtQueryObject() and the down-sized required length from the previous step
Case 2: buffer size supposedly (!) sufficient
- Copy OBJECT_TYPES_INFORMATION::NumberOfTypes from the queried buffer to the 32 bit one
- Step to the first entry (OBJECT_TYPE_INFORMATION) of the source (64 bit) and target (32 bit) buffer, 8 and 4 byte aligned respectively
- For each entry up to OBJECT_TYPES_INFORMATION::NumberOfTypes:
  - Copy UNICODE_STRING::Length and UNICODE_STRING::MaximumLength for the TypeName member
  - memcpy() UNICODE_STRING::Length bytes from the source to the target UNICODE_STRING::Buffer (target entry + sizeof(OBJECT_TYPE_INFORMATION32))
  - Add a terminating zero (WCHAR) past the memcpy'd string
  - Copy the individual members past TypeName from the 64 to the 32 bit struct
  - Compute the pointer to the next entry by aligning UNICODE_STRING::MaximumLength up to an 8 byte boundary (i.e. the ULONG_PTR alignment mentioned in the other answer) + sizeof(OBJECT_TYPE_INFORMATION64) (already 8 byte aligned!)
  - The next target entry (32 bit) gets 4 byte aligned instead
- At the end, compute the required (32 bit) length by subtracting the base address of the buffer passed by the WOW64 program (32 bit) to NtQueryObject() from the value we arrived at for the "next" entry (i.e. one past the last)
  - In my debugged scenario these were: 0x008ce050 - 0x008cbfe8 = 0x00002068 (= 8296), which is 16 bytes larger than the buffer length we were told during case 1 (8280)!
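To make the stepping in case 2 a bit more tangible, here is a minimal sketch that walks a list of entries the way the copy loop is described above and compares the result with the static down-sizing from case 1. It is not the code from wow64.dll. The struct sizes (96 and 104 bytes) are taken from the static_asserts in the example program further down; the names walk_entries, WalkedLengths, static_down_size and kSampleMaxLengths are made up for illustration, the sample MaximumLength values are invented, and the 32 bit stride (struct size plus MaximumLength aligned up to 4) is my reading of the steps above.

```cpp
#include <cstddef>

// Struct sizes as asserted in the example program (assumed, not read from headers)
constexpr std::size_t kEntry32 = 96;
constexpr std::size_t kEntry64 = 104;

constexpr std::size_t align_up(std::size_t value, std::size_t alignment)
{
    return (value + (alignment - 1)) & ~(alignment - 1);
}

struct WalkedLengths
{
    std::size_t len64; // offset one past the last 64 bit entry
    std::size_t len32; // offset one past the last 32 bit entry
};

// Walk 'count' entries, given only the MaximumLength of each TypeName.
constexpr WalkedLengths walk_entries(const unsigned short* max_lengths, std::size_t count)
{
    std::size_t off64 = 8; // first entry after NumberOfTypes, 8 byte aligned
    std::size_t off32 = 4; // first entry after NumberOfTypes, 4 byte aligned
    for (std::size_t i = 0; i < count; ++i)
    {
        off64 += kEntry64 + align_up(max_lengths[i], 8);
        off32 += kEntry32 + align_up(max_lengths[i], 4);
    }
    return WalkedLengths{ off64, off32 };
}

// Same formula as down_size_to_32bit() in the example program
constexpr std::size_t static_down_size(std::size_t len64)
{
    return len64 - (kEntry64 - kEntry32) * ((len64 - 4) / kEntry64);
}

// Invented sample: eight types, each with a 32 byte name buffer
constexpr unsigned short kSampleMaxLengths[8] = { 32, 32, 32, 32, 32, 32, 32, 32 };
constexpr WalkedLengths kWalked = walk_entries(kSampleMaxLengths, 8);
```

With this invented sample the walk needs more 32 bit bytes than static_down_size() reports for the corresponding 64 bit length, because the static formula divides the whole length (names included) by the entry size and so subtracts the delta for more entries than actually exist. The real numbers depend on the actual type names, so this only shows the direction of the mismatch, not the exact 16 bytes.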
The issue
That crucial last step differs between merely querying and actually getting the buffer filled. There is no further bounds checking in that loop I described for case 2.
And this means it will just overrun the passed buffer and return a written length bigger than the buffer length passed to it.
Possible solutions and workarounds
I'll have to approach this mathematically after some sleep. The workaround is obviously to top up the required length returned from case 1 in order to avoid the buffer overrun. The easiest method is to take my up_size_from_32bit() from the example below and apply it to the returned required size. This way you are allocating enough for the 64 bit buffer, while querying the 32 bit one. This should never overrun during the copy loop.
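As a rough sketch of that workaround, the helper below pads a required length reported to a 32 bit caller. It uses the same formula as up_size_from_32bit(), only with fixed-width types so it compiles anywhere. The name padded_query_length and the guard for lengths below 4 are my additions for illustration, and the constants 96 and 8 are the struct size and size delta assumed in the example program.

```cpp
#include <cstdint>

// Size of the 32 bit struct and per-entry 64/32 delta (assumed, see example program)
constexpr std::uint32_t kEntrySize32 = 96;
constexpr std::uint32_t kEntryDelta = 8;

// Pad the required length reported by the query-only call so that the
// buffer is as large as the 64 bit one would be.
constexpr std::uint32_t padded_query_length(std::uint32_t reported)
{
    // Guard against unsigned underflow for tiny values (hypothetical addition)
    return reported < 4
        ? reported
        : reported + kEntryDelta * ((reported - 4) / kEntrySize32);
}
```

A 32 bit caller would then query once to get the reported length, allocate padded_query_length(reported) bytes, and pass that size as ObjectInformationLength on the second call. For the values from my debugging session, 8280 gets padded to 8968, which is comfortably above the 8296 bytes the copy loop actually wrote.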
However, the fix in wow64.dll is a little more involved, I guess. While adding bounds checking to the loop would help avert the overrun, it would mean that the caller would have to query for the required size twice, because the first time around it lies to us.
So for a proper fix the query-only case (1) would have to allocate that internal buffer after querying the required length for 64 bit, get it filled, and then walk the entries (just like the copy loop), stepping past the last entry to compute the required length the same way it is now done after the copy loop.
Example program demonstrating the "static" computation by wow64.dll
Build for x64, just the way wow64.dll was!
#define WIN32_LEAN_AND_MEAN
#include <Windows.h>
#include <cstdio>

// Placeholder structs: only their sizes matter here (96 vs. 104 bytes)
typedef struct
{
    ULONG JustPretending[24];
} OBJECT_TYPE_INFORMATION32;

typedef struct
{
    ULONG JustPretending[26];
} OBJECT_TYPE_INFORMATION64;

// Per-entry size difference between the 64 and the 32 bit struct
constexpr ULONG size_delta_3264 = sizeof(OBJECT_TYPE_INFORMATION64) - sizeof(OBJECT_TYPE_INFORMATION32);

// 64 bit length -> 32 bit length (the 4 accounts for NumberOfTypes)
constexpr ULONG down_size_to_32bit(ULONG len)
{
    return len - size_delta_3264 * ((len - 4) / sizeof(OBJECT_TYPE_INFORMATION64));
}

// 32 bit length -> 64 bit length
constexpr ULONG up_size_from_32bit(ULONG len)
{
    return len + size_delta_3264 * ((len - 4) / sizeof(OBJECT_TYPE_INFORMATION32));
}

// Trying to mimic the wdm.h macro
constexpr size_t align_up_by(size_t address, size_t alignment)
{
    return (address + (alignment - 1)) & ~(alignment - 1);
}

// Lengths observed while debugging: 32 bit (case 1) and 64 bit required size
constexpr auto u32 = 8280UL;
constexpr auto u64 = 8968UL;
constexpr auto from_64 = down_size_to_32bit(u64);
constexpr auto from_32 = up_size_from_32bit(u32);
constexpr auto from_32_16_byte_aligned = (ULONG)align_up_by(from_32, 16);

int wmain()
{
    wprintf(L"32 to 64 bit: %u -> %u -(16-byte-align)-> %u\n", u32, from_32, from_32_16_byte_aligned);
    wprintf(L"64 to 32 bit: %u -> %u\n", u64, from_64);
    return 0;
}

static_assert(sizeof(OBJECT_TYPE_INFORMATION32) == 96, "Size for 32 bit struct does not match.");
static_assert(sizeof(OBJECT_TYPE_INFORMATION64) == 104, "Size for 64 bit struct does not match.");
static_assert(u32 == from_64, "Must match (from 64 to 32 bit)");
static_assert(u64 == from_32, "Must match (from 32 to 64 bit)");
static_assert(from_32_16_byte_aligned % 16 == 0, "16 byte alignment failed");
static_assert(from_32_16_byte_aligned > from_32, "We're aligning up");
This does not mimic the computation that happens in case 2, though.