
I'm trying to adapt this tutorial to use my own neural net and images. I can do that on my CPU, but what I cannot do either with the unchanged tutorial, or my adaptation of it, is use my GPU. According to system information, I have an "NVIDIA Quadro P2200", not that I need to specify this anywhere as far as I can tell. Instead, it seems all I need do is replace:

LearningModelDeviceKind deviceKind = LearningModelDeviceKind::Default;

with:

LearningModelDeviceKind deviceKind = LearningModelDeviceKind::DirectX;
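
For context, the surrounding setup follows the tutorial and looks roughly like this (a sketch; the model path is a placeholder):

#include <winrt/Windows.AI.MachineLearning.h>
using namespace winrt::Windows::AI::MachineLearning;

// Load the ONNX model and create an evaluation session on the chosen device.
LearningModel model = LearningModel::LoadFromFilePath(L"model.onnx");   // placeholder path
LearningModelDeviceKind deviceKind = LearningModelDeviceKind::DirectX;  // GPU via DirectX
LearningModelDevice device(deviceKind);
LearningModelSession session(model, device);
LearningModelBinding binding(session);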

When I do this, I get an exception in:

auto results = session.Evaluate(binding, L"RunId");

After constructing the second parameter, this drops into:

template <typename D> WINRT_IMPL_AUTO(Windows::AI::MachineLearning::LearningModelEvaluationResult) consume_Windows_AI_MachineLearning_ILearningModelSession<D>::Evaluate(Windows::AI::MachineLearning::LearningModelBinding const& bindings, param::hstring const& correlationId) const
{
    void* result{};
    check_hresult(WINRT_IMPL_SHIM(Windows::AI::MachineLearning::ILearningModelSession)->Evaluate(*(void**)(&bindings), *(void**)(&correlationId), &result));
    return Windows::AI::MachineLearning::LearningModelEvaluationResult{ result, take_ownership_from_abi };
}

A winrt::hresult_error is thrown immediately upon stepping into the check_hresult(...) line. I think this means bindings is somehow invalid... but (a) I'm not sure about that and (b) I have no idea what to do to make it valid. Help?
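
For reference, this is the sort of handler I would expect to surface the failure's HRESULT and message (a sketch; "RunId" is just the correlation string from the tutorial, and OutputDebugStringW comes from <windows.h>):

try
{
    auto results = session.Evaluate(binding, L"RunId");
}
catch (winrt::hresult_error const& e)
{
    // e.code() is the raw HRESULT; e.message() is the human-readable description.
    OutputDebugStringW(e.message().c_str());
}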

EDIT: I can now get the MS sample working, but not my adaptation. When I view the MS sample .onnx file using Netron, the input and output nodes have reasonable names, and the tensor sizes reported are also reasonable. On the model I am trying to use, the input and output nodes both have ":0" as the last part of their name, and the tensor sizes have one "unknown" size, e.g. the input size is reported as "unk_123 x 3 x 224 x 224". Does either of these create an incompatibility? The network is supplied to me, so I'd like to understand whether either requires a change before asking for one...
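
To be concrete about what I am asking: the binding I have in mind would use the ":0"-suffixed names exactly as Netron reports them and pin the unknown batch dimension to 1, roughly like this (a sketch; the tensor name is a placeholder):

// Pin the unknown (unk_123) batch dimension to 1.
std::vector<int64_t> inputShape{ 1, 3, 224, 224 };
std::vector<float> inputData(1 * 3 * 224 * 224);    // filled with the preprocessed image
TensorFloat inputTensor = TensorFloat::CreateFromArray(inputShape, inputData);
binding.Bind(L"input:0", inputTensor);               // ":0"-suffixed name as shown by Netron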

omatai
  • You could try to break it up into multiple lines, which will make it easier to find the bug. –  Nov 09 '20 at 21:23
  • I have since managed to find a slip-up in the implementation of the original example, and got GPU working for it. But... the overheads of using the GPU outweigh the benefit of its speed! :-( This has caused me to lose all interest in finding the source of the error in my adaptation - it is a network that relies on conversion from a TensorFlow Keras 2.3 model, and that conversion seems to be at least new, if not still experimental. I might just wait a few weeks then try again, and in the meantime explore paths where the GPU makes a real difference. – omatai Nov 11 '20 at 01:59
  • can you include the debugger output as well? There are frequently helpful messages in the debug output when getting exceptions like this. – Brian Martin Jan 06 '21 at 22:44
  • It is trying to throw some kind of hresult, but I can't catch it, and I can't get the debugger to access the memory and tell me what the value is in either debug or release mode. But as noted... the indications are not good, and I am pursuing DirectML now instead of WinML. – omatai Jan 08 '21 at 02:25

1 Answer


It all works as intended. Having tripped up several times while adapting Windows ML code to my requirements, I offer this strong advice:

  • double-check everything - use the debugger to prove that variables contain what you think they do at every step of the setup.

For example, in response to the EDIT section: the culprit was copied/pasted/edited code that changed the output shape from 1 x 1000 x 1 x 1 (as pasted) to 1 x 10 x 1 x 1 (as edited), when it needed to be 1 x 10. That was detected by following my own advice above :-)
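
To make that concrete, the slip was in the shape used when creating the output tensor, along these lines (the tensor name here is a placeholder; the shapes are the point):

// Wrong: carried over from the pasted sample and only partially edited.
// binding.Bind(L"output:0", TensorFloat::Create({ 1, 10, 1, 1 }));

// Right for this model:
binding.Bind(L"output:0", TensorFloat::Create({ 1, 10 }));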

I can confirm that setting deviceKind = LearningModelDeviceKind::DirectX is what invokes the GPU, but that you may not get any noticeable speed improvement from doing so.
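
If you want to measure what the GPU buys you for your own model, timing a few evaluations is enough. A sketch (remember that the first DirectX evaluation includes warm-up costs such as shader compilation, so discard it):

#include <chrono>

auto t0 = std::chrono::steady_clock::now();
auto results = session.Evaluate(binding, L"RunId");
auto elapsedMs = std::chrono::duration_cast<std::chrono::milliseconds>(
                     std::chrono::steady_clock::now() - t0).count();
// Compare elapsedMs for LearningModelDeviceKind::Default vs ::DirectX.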

omatai