The ugly part is going to be receiving the events. Coding directly against the COM interfaces of MSHTML in C++ to attach logic to an HTML GUI gets messy if you do it "raw". You'll probably want a thin-ish layer of library code sitting between your application logic and MSHTML, to hide the COM-related plumbing.
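As a rough idea of what that layer could look like, here is a minimal sketch in portable C++. `HtmlEventRouter`, `on` and `dispatch` are hypothetical names of my own, not part of MSHTML. The COM side is deliberately left out: the assumption is that you write one event sink object (typically an `IDispatch` implementation that MSHTML calls into) and have it forward each event to `dispatch`, so the rest of the application only ever sees ordinary C++ callbacks.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical thin layer: application code registers plain C++ callbacks,
// and only the COM event sink (not shown) ever talks to MSHTML.
class HtmlEventRouter {
public:
    using Handler = std::function<void()>;

    // Register a callback for an (element id, event name) pair,
    // e.g. ("okButton", "onclick"). A later registration replaces an earlier one.
    void on(const std::string& elementId, const std::string& eventName, Handler handler) {
        assert(handler && "handler must be callable");
        handlers_[std::make_pair(elementId, eventName)] = std::move(handler);
    }

    // Intended to be called by the COM sink when MSHTML reports an event.
    // Returns true if a handler was registered and ran, false otherwise.
    bool dispatch(const std::string& elementId, const std::string& eventName) const {
        auto it = handlers_.find(std::make_pair(elementId, eventName));
        if (it == handlers_.end()) {
            return false;
        }
        it->second();
        return true;
    }

private:
    std::map<std::pair<std::string, std::string>, Handler> handlers_;
};
```

Application code would then do something like `router.on("okButton", "onclick", [] { /* save settings */ });` and never touch a COM type. Keying on element id plus event name is just one possible design; it assumes the elements you care about have ids in the HTML.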
Ultimately this is a reinvention of things like Firefox's XUL - see http://en.wikipedia.org/wiki/XUL. You may find that better suited to this kind of use. You'd be hosting the Gecko engine instead of MSHTML.
Alternatively you could use WPF, which is a very similar idea. Given that you're on Windows (you're evidently happy with a dependency on MSHTML), you could write the GUI in C# and bind it to the C++ code by exposing that code through C++/CLI.