TL;DR

vkAcquireNextImageKHR throws std::out_of_range when certain queue families are used. Is this expected behavior? How can I debug it?

Detailed description

The Vulkan program I use is based on vulkan-tutorial.com. I discovered that my VkPhysicalDevice has three queue families, each flagged with VK_QUEUE_GRAPHICS_BIT and present support:

uint32_t queueFamilyCount;
vkGetPhysicalDeviceQueueFamilyProperties(device, &queueFamilyCount, nullptr);
std::vector<VkQueueFamilyProperties> queueFamilies(queueFamilyCount);
vkGetPhysicalDeviceQueueFamilyProperties(device, &queueFamilyCount, queueFamilies.data());

std::vector<uint32_t> graphicsQueueFamilyIndices;
std::vector<uint32_t> presentQueueFamilyIndices;
uint32_t i = 0; // unsigned, to match the index type Vulkan expects
for (const auto& queueFamily : queueFamilies)
{
  if (queueFamily.queueFlags & VK_QUEUE_GRAPHICS_BIT)
  {
    graphicsQueueFamilyIndices.push_back(i);
  }

  VkBool32 presentSupport = false;
  vkGetPhysicalDeviceSurfaceSupportKHR(
      device,
      i,
      surface,
      &presentSupport
    );
  if (presentSupport)
  {
    presentQueueFamilyIndices.push_back(i);
  }

  ++i;
}

// graphicsQueueFamilyIndices = {0, 1, 2}
// presentQueueFamilyIndices = {0, 1, 2}

These are later used when creating the logical device, the swapchain (the queue families all have present capability) and the command pool. Later the program calls

vkAcquireNextImageKHR(device, swapchain, UINT64_MAX, semaphore, VK_NULL_HANDLE, &imageIndex);

But using any of the following combinations of present and graphics queue family indices causes this API call to throw an uncaught std::out_of_range: (1, 1), (1, 2), (2, 1), (2, 2).

lldb output is as follows:

2019-12-01 11:36:35.599882+0100 main[22130:167876] flock failed to lock maps file: errno = 35
2019-12-01 11:36:35.600165+0100 main[22130:167876] flock failed to lock maps file: errno = 35
libc++abi.dylib: terminating with uncaught exception of type std::out_of_range: Index out of range
Process 22130 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
    frame #0: 0x00007fff675c949a libsystem_kernel.dylib`__pthread_kill + 10
libsystem_kernel.dylib`__pthread_kill:
->  0x7fff675c949a <+10>: jae    0x7fff675c94a4            ; <+20>
    0x7fff675c949c <+12>: movq   %rax, %rdi
    0x7fff675c949f <+15>: jmp    0x7fff675c33b7            ; cerror_nocancel
    0x7fff675c94a4 <+20>: retq
Target 0: (main) stopped.

The same error occurs when using an index that doesn't even refer to a queue family, like 123. I'm using the VK_LAYER_KHRONOS_validation layer, which doesn't report any complaint.

Questions

(1) Is this the expected behavior when passing the wrong queue family index to Vulkan?

(2) Are there validation layers that are capable of catching this error and making it more verbose?

(3) Why do these choices of queue families cause this error?

Details

Using queue family indices (1, 1) for the graphics and present queue families during logical device creation, while using index 0 for everything else, is already enough to make vkAcquireNextImageKHR raise the error. As expected, VK_LAYER_KHRONOS_validation raises the following warning upon command pool creation:

Validation layer: vkCreateCommandPool: pCreateInfo->queueFamilyIndex (= 0) is not one of the queue families given via VkDeviceQueueCreateInfo structures when the device was created. The Vulkan spec states: pCreateInfo::queueFamilyIndex must be the index of a queue family available in the logical device device. (https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VUID-vkCreateCommandPool-queueFamilyIndex-01937)
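The condition behind this warning can also be checked on the application side before the index is passed on. The following is only a minimal sketch: `wasFamilyRequested` is a hypothetical helper, not part of Vulkan or the validation layers, and it assumes the application keeps a list of the queue family indices it passed via VkDeviceQueueCreateInfo.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a Vulkan API): returns true if `family` is one of
// the queue family indices that were requested when the logical device was
// created, i.e. one that appeared in a VkDeviceQueueCreateInfo.
bool wasFamilyRequested(const std::vector<uint32_t>& requestedFamilies,
                        uint32_t family)
{
  return std::find(requestedFamilies.begin(), requestedFamilies.end(), family)
         != requestedFamilies.end();
}
```

With the setup described here (device created with family 1 only), `wasFamilyRequested({1}, 0)` is false, which corresponds to the situation the warning complains about.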

I'm using MoltenVK (from the Vulkan SDK, version 1.1.126.0) on macOS Catalina 10.15.1.

Workarounds

  • Using version 1.1.121.1 of the SDK prevents the throw from occurring.

  • Creating a device queue family with index 0 alongside any other device queues one might require prevents the throw from occurring.
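The second workaround could be sketched roughly as follows. `queueFamiliesToCreate` is a hypothetical helper name; it only computes which queue family indices to request and assumes that one queue per family is enough.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical helper (not part of Vulkan or MoltenVK): returns the unique,
// sorted queue family indices to request at logical device creation.
// Family 0 is always included, which is the workaround described above.
std::vector<uint32_t> queueFamiliesToCreate(uint32_t graphicsFamily,
                                            uint32_t presentFamily)
{
  // std::set removes duplicates, e.g. when graphics and present share a family.
  std::set<uint32_t> unique = {0u, graphicsFamily, presentFamily};
  return std::vector<uint32_t>(unique.begin(), unique.end());
}
```

Each returned index would then get its own VkDeviceQueueCreateInfo (with `queueFamilyIndex` set to that index) in `VkDeviceCreateInfo::pQueueCreateInfos`, even if the queue from family 0 is never used afterwards.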

Issue on GitHub

This has now been raised as an issue on GitHub [here].

mkl
  • "*But using any other than 0*" Where are you putting these values? – Nicol Bolas Dec 02 '19 at 14:50
  • Do you mean in what API calls do I use these queue indices? That would be (1) `vkGetDeviceQueue`, once for graphics, once for present; (2) in `createInfo.pQueueFamilyIndices`, where `createInfo` is passed to `vkCreateSwapchainKHR` as `pCreateInfo` parameter; (3) in `poolInfo.queueFamilyIndex`, where `poolInfo` is later passed to `vkCreateCommandPool` as `pCreateInfo` parameter. – mkl Dec 02 '19 at 22:19
  • And which one is the one that provokes the crash? – Nicol Bolas Dec 03 '19 at 02:27
  • I've gone through the permutations. First of all, I've given slightly wrong information: The choice of the present queue family index does matter. Sorry about that! Second of all (see "Details" in the questions), it seems like the logical device creation is what provokes `vkAcquireNextImage` to throw. – mkl Dec 03 '19 at 11:09
  • Vulkan is not even a C++ API. It cannot throw `std::` anything. Your backtrace looks weird; I need to see function names and source of the exception. I am not sure what "com.apple" means, is this MoltenVK? – krOoze Dec 03 '19 at 12:58
  • Yes, I'm using MoltenVK (Vulkan SDK 1.1.126.0). – mkl Dec 03 '19 at 13:38
  • And, yes, the backtrace is incredibly unhelpful, but rebuilding the SDK with debug symbols enabled is out of the question right now, I'm afraid. – mkl Dec 03 '19 at 13:46
  • @mkl: "*Using queue family indices (1, 1) for graphics and present queue families during logical device creation while using index 0 for everything*" That doesn't make sense. You shouldn't just pick random queue indices for stuff; you should use the queue that is *appropriate* for the task. – Nicol Bolas Dec 03 '19 at 14:35
  • I know that picking random queue indices will most likely not result in a working program (hence the message from the validation layer). When I used (1, 1) only for some calls, I was trying to isolate which call involving the queue indices provoked the crash, in order to answer your question from earlier! I would say that I singled out the creation of the logical device, but I'm not sure if I'm interpreting this correctly. What should I have done instead? – mkl Dec 03 '19 at 15:08
  • Could be related to https://github.com/KhronosGroup/MoltenVK/issues/779. – krOoze Dec 05 '19 at 01:01
  • I browsed the source a bit, and MoltenVK seems to always use queue family 0 implicitly for acquire. What happens if you use a fence and no semaphore? What happens if you create a queue family 0 at `vkCreateDevice` alongside the others (but otherwise do not use it)? – krOoze Dec 05 '19 at 01:52
  • Thank you! Creating a queue family with index `0` solves the problem, and so does using a fence instead of a semaphore in `vkAcquireNextImageKHR`. I guess then my issue is related to [this](http://github.com/KhronosGroup/MoltenVK/issues/779), although I wonder why they didn't get an `std::out_of_range`. I know very little about Vulkan, but this seems like a bug, right? Should I raise this as an issue on GitHub? – mkl Dec 05 '19 at 11:34
  • @mkl Yea in that case it would be a bug. I am just gonna link your Issue in the answer. – krOoze Dec 08 '19 at 18:32

1 Answer

That seems to be a bug in MoltenVK. Inspection of the MoltenVK source indicates that it always implicitly uses queue 0 of queue family 0 for vkAcquireNextImageKHR. The fact that you have no problems if you create that queue explicitly, or if you use just a fence, tells me MoltenVK probably forgets to initialize that implicit queue properly for itself.

The GitHub Issue is filed at KhronosGroup/MoltenVK#791.
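To illustrate how an implicit lookup of queue 0 in family 0 can end in a C++ exception even though Vulkan itself is a C API, here is a toy model. It is not MoltenVK's code: `ToyDevice`, its container layout, and `implicitQueueLookupThrows` are all hypothetical, and the only detail taken from this discussion is that the queue lookup defaults both of its parameters to 0.

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>
#include <vector>

// Toy stand-in for a device that only stores the queues requested at
// creation time. All names are hypothetical, not MoltenVK's.
struct ToyDevice
{
  // Queue family index -> queue handles (modeled as plain ints).
  std::map<uint32_t, std::vector<int>> queuesByFamily;

  // Both parameters default to 0, so a call without arguments asks for
  // queue 0 of family 0.
  int getQueue(uint32_t family = 0, uint32_t index = 0) const
  {
    // .at() throws std::out_of_range if the family (or the queue) is missing.
    return queuesByFamily.at(family).at(index);
  }
};

// Returns true if the implicit lookup (family 0, queue 0) would throw.
bool implicitQueueLookupThrows(const ToyDevice& device)
{
  try
  {
    device.getQueue();
    return false;
  }
  catch (const std::out_of_range&)
  {
    return true;
  }
}
```

In this model, a device created with family 1 only makes the implicit lookup throw, while adding a queue in family 0 makes it succeed, which matches the observed workaround. Whether MoltenVK's real containers fail in the same way is an assumption.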

krOoze
  • I've looked at the source of MoltenVK, but I couldn't find what you're referencing. After `MVKSwapchain::acquireNextImageKHR`, I've lost track. What part of the source code are you referring to? – mkl Dec 08 '19 at 22:26
  • https://github.com/KhronosGroup/MoltenVK/blob/f8520ba3e70409cd661794975d5f6dc86266423d/MoltenVK/MoltenVK/GPUObjects/MVKSwapchain.mm#L174-L176 and `getQueue()` is at https://github.com/KhronosGroup/MoltenVK/blob/f8520ba3e70409cd661794975d5f6dc86266423d/MoltenVK/MoltenVK/GPUObjects/MVKDevice.h#L404 where you can see that the parameters are defaulted to `0`. – krOoze Dec 09 '19 at 00:03
  • The issue is resolved in [PR #799](https://github.com/KhronosGroup/MoltenVK/pull/799). – mkl Dec 19 '19 at 19:48