4

With the new std::async in c++11, I thought I might go about trying to implement an async version of OutputDebugString to free me of some of the performance downs that result from my usual heavy printing of every little detail through the usual OutputDebugString function.

So here is my original sync OutputDebugString implementation(which works):

static void OutputDebugStringN(const char *format, ...)
{
    char outstring[256];
    memset(outstring, 0, sizeof(outstring));

    try
    {
        va_list args = {0};
        va_start(args, format); //args = (va_list) (&format+1);

        vsprintf_s(outstring, format, args);

        va_end(args);

        OutputDebugString(outstring);
    }
    catch (...) //most likely reference val arg error (va_list doesn't support ref args)
    {
        OutputDebugString("[OutputDebugStringN] Something went wrong\n");
    }
}

and the following my very naiive attempt at an async version(which doesn't work):

static void OutputDebugStringN(const char *format, ...)
{
    auto future = std::async([]{
        char outstring[256];
        memset(outstring, 0, sizeof(outstring));
        try
        {
            va_list args = {0};
            va_start(args, format); //args = (va_list) (&format+1);

            vsprintf_s(outstring, format, args);

            va_end(args);

            OutputDebugString(outstring);
        }
        catch (...) //most likely reference val arg error (va_list doesn't support ref args)
        {
            OutputDebugString("[OutputDebugStringN] Something went wrong\n");
        }
    }); 
}

And since the above doesn't work I'm now at the point where I'm starting to think that async calling OutputDebugStringN might be better than trying to launch an async job inside the function itself as so (which works, but cumbersome):

auto dstring = std::async([]{ OutputDebugStringN("[NovelScript::ParseTokens] searched bookmark: \"%s\" does not exist\n", bookmark.c_str());} );

So here are two questions I'd like to ask:

  1. How should I be going about implementing an async version of OutputDebugString?
  2. Should I even be trying to implement an async version of OutputDebugString?

Criticism on the above code and any other comments are also much welcome.

dk123
  • 18,684
  • 20
  • 70
  • 77
  • are you trying to use the std::async? my issue with it is of course it is only in the later c++, so it isn't always available. you can implement async just by building your own producer-consumer queue, and your own thread to actually send the messages out. your use of async above is not right. it wouldn't give you the behavior you want. – thang Jan 26 '13 at 11:28
  • @thang This is just a small personal project of mine so I don't really mind that it's only a part of c++11. I'm not really that familiar though about async implementations though(as you've already seen) - how would you go about implementing this if you were given the opportunity? – dk123 Jan 26 '13 at 11:32
  • i'll write a long reply.. needs more space than is allocated in the comment – thang Jan 26 '13 at 11:34
  • 2
    Change `outstring` to `char outstring[256] = { };` and you don't need that `memset` crap. ;-] – ildjarn Jan 26 '13 at 22:55
  • @ildjam I think I'll just change things to a vector instead :) – dk123 Jan 27 '13 at 01:33
  • @dk123 : Even better! :-] – ildjarn Jan 28 '13 at 22:26
  • You might want to try [DebugView++](https://github.com/CobaltFusion/DebugViewPP) instead of Sysinternal's [DebugView](https://learn.microsoft.com/en-us/sysinternals/downloads/debugview). See [this answer](https://stackoverflow.com/a/62516566/12932497) for a performance comparison. – Christian Stadelmann Mar 29 '23 at 09:36

3 Answers3

9

I think you should have a queue for your messages instead of starting a thread with each call to your function, that way your messages will be output clean and in the right order.

So your function e.g. OutputDebugStringN(const char *format, ... ) would create the message string and then queue the string that a seperate printout thread reads from. That thread would call OutputDebugString.

Here's an example - not complete though, no error handling and print_from_queue should be modified to run until some termination condition and be a bit more friendly to the CPU.

std::mutex g_m;
std::deque<std::string> que;
std::atomic<bool> endcond = false;

void queue(std::string msg)
{
  std::lock_guard<mutex> _(g_m);
  que.push_back(msg);
}

void print_from_queue()
{
  while ( !endcond )
  {
    if ( que.size() )
    {
      std::lock_guard<mutex> _(g_m);
      std::string msg = que.front();
      que.pop_front();
      OutputDebugStringA(msg.c_str());
    }
  }
}

int debugf( const char *format,... )
{
  std::vector<char> line(256);
  va_list args;
  va_start( args, format );
  int len = vsprintf_s( &line[0], line.size(), format, args );
  va_end( args );
  queue( &line[0] );
  return len;
}

int _tmain(int argc, _TCHAR* argv[])
{
  auto thr = std::async( std::launch::async, print_from_queue );
  debugf("message1");
  debugf("message2");
...
AndersK
  • 35,813
  • 6
  • 60
  • 86
  • 2
    +1 I've implemented this precise algorithm several times over for debug logging when a serialized log was detrimental to profiling performance. I highly recc it. – WhozCraig Jan 26 '13 at 11:38
  • Thank you for the reply; would it be possible for you to show me a possible implementation? My lack of knowledge on the usage of async prevents me from trying to do this myself. – dk123 Jan 26 '13 at 11:38
  • @dk123 do you know what a semaphore and a critical section are? if so, then all you need is a queue and a writer-thread watching the semaphore for arriving messages. or a mutex and an event, etc. there are several ways to do it. – WhozCraig Jan 26 '13 at 11:39
  • @WhozCraig I'd prefer an implementation that makes use of std::async if possible for personal study and reference. Any ideas..? – dk123 Jan 26 '13 at 11:41
  • 3
    @dk123 Honestly, it isn't exactly a stellar fit for this purpose. The ideal fit for this is a long-running consumer thread that pulls in input from one or more producers, serializing the data stream to OutputDebugString. – WhozCraig Jan 26 '13 at 11:47
  • @WhozCraig, claptrap : Thank you for the comments and implementation example. I'll look further into this direction with this as a starting point. Thank you! – dk123 Jan 27 '13 at 01:30
  • offhand does the std thread-lib offer any kind of a waitable condition variable, semaphore, or event construct? If not, tis too bad, as it seems the only that that would be missing to make it truly wicked, imho. – WhozCraig Jan 27 '13 at 01:35
  • New to threads but find your example intriguing to try. How would you make it CPU friendly though ? By using Sleep() ? If not how ? If yes, how many milliseconds seems reasonable ? – Peter Apr 10 '20 at 09:21
  • @Peter i think a condition_variable would maybe be better than sleep, it could replace the "endcond". the timeout would then make it more cpu friendly – AndersK Apr 10 '20 at 19:52
  • Thank you @AndersK . I already replaced `endcond` with a `bool` variable which is set during construction of my logger singleton, and which is reset during the singleton destruction, after which the destructor waits via `WaitForSingleObject()` before destroying the thread `handle` and `critical section` (I use an older compiler, don't have all the fance c++ stuff :). However that still means high cpu use in the thread while racing through the `if ( que.size() )` loop, so I use `else Sleep(100)` to put the thread to sleep if nothing is in the queue anymore. This settles the CPU .. – Peter Apr 11 '20 at 02:21
  • .. but I wonder if there is a better mechanism to trigger the thread to start working again if something is put in the queue, and/or if there is a better or ideal time in milliseconds to put the thread to sleep ? – Peter Apr 11 '20 at 02:24
  • @Peter have you looked at std::condition_variable ? by using one you can make the thread sleep until some condition, like x number of messages in the queue and then do the processing. see https://en.cppreference.com/w/cpp/thread/condition_variable – AndersK Apr 11 '20 at 07:14
  • @AndersK I'm afraid my ancient compiler (C++ Builder 2009) doesn't come with those c++ libraries. Perhaps `Sleep()` is the best I can do using older tech ? I was also wondering why checking the shared resource `que`, `if ( que.size() )` in your example, is safe before you `lock_guard<>()`. Is there no danger in that (just wondering) ? – Peter Apr 12 '20 at 01:12
  • @AndersK sorry to bother you again. Will leave this be but very interested in having an answer to my previous Q, namely if checking que.size() is safe outside the lock ? Thx – Peter Apr 16 '20 at 06:11
  • @Peter np, you are in fact correct, size() is not thread safe, you would need to wrap it to make it thread safe. i usually use concurrent_queue and there they even call it unsafe_size – AndersK Apr 17 '20 at 04:42
4

In my opinion, debugging should be synchronous, and not asynchronous. Would you appreciate if debugger gives you exception seconds after it occurred? Would you appreciate a log-file having stale data of your program status, after your program crashed (since you made file-writing asnyc)?

Anyhow, you went ahead and made the debug-output as asynchronous. What purpose does it solve, other than just dumping data to debug-window, you know it is not recent, it doesn't reflect your/user or program's actions, it is all stale! You absolutely cannot rely on it.

Ajay
  • 18,086
  • 12
  • 59
  • 105
  • 1
    I agree, hence my second question on whether or not such an implementation would be useful at all. However after some thought, I've somewhat come to the conclusion that time delay between the log and actual events wouldn't really matter if I logged all information regrading user actions, etc. in the first place (which I'd usually do anyway for future reference) - in such case all I'd need to do is implement a logging class that doesn't interfere with the actual program logic itself to avoid performance loss (hence the current question). Any thoughts on this? – dk123 Jan 26 '13 at 11:57
  • Yes, simple thought - `OutputDebugString` would most probably run faster than the process of "delaying" it. Also, you need to ensure that your program is running under a debugger and act accordingly. It would be wasteful to send async messages which wouldn't be processed for logical outcome. `IsDebuggerPresent` sounds familiar? Function pointers, dynamic polymorphism - things you need to "disable" async messaging! – Ajay Jan 26 '13 at 12:01
  • 1
    actually this depends on the application. if you're running a high speed video processing pipeline, you do want asynchronous debug messages because stalling to do OutputDebugString clogs up the pipeline, which can screw up code behavior somewhere else. and debugging doesn't have to be synchronous real time. sometimes you want it in order. sometimes you don't care. sometimes you just have a bunch of log files. there are many constraints, and it really depends on your application. – thang Jan 26 '13 at 12:02
  • Know a simple fact - `OutputDebugString` is implemented in kernel32.dll, which is almost seamless call (I didn't say it is kernel call). But any async message would definitely require a lot for "message-passing" between threads/task, few or more of CTOR and DTOR calls. Performance wise, ODS would win over async (excluding actual ODS call!). – Ajay Jan 26 '13 at 12:15
  • Actually just because it is in kernel32 doesn't mean it's fast. If someone is hooked up to receiving the messages, it actually stalls OutputDebugString. You can demonstrate this by measuring the time it takes to OutputDebugString with debugview running. Async messages don't require message passing. Simply adding an element to a queue is minimal work... you can measure this also for yourself. Please see this article for details: http://www.softwareverify.com/blog/?p=261. OutputDebugstring is fast if you're no one is sinking to its messages, but in that case it is useless. – thang Jan 26 '13 at 17:45
  • Alright. Use it. I don't recommend it, and I wont use such thing. – Ajay Jan 26 '13 at 18:19
  • Again, there's a time and place. In certain scenarios it is necessary. In others, it adds more problems than it solves... – thang Jan 26 '13 at 18:22
3

criticism of the code above and of the C++ async function: :p

The return value of std::async is an object of type std::future. The destructor of std::future created by std::async waits until the task has been executed. So when you do:

auto future = std::async(...

or

auto dstring = std::async([]{

it creates an object of type std::future, and when you leave scope of your OutputDebugStringN, it calls the destructor of std::future, which blocks until the task gets executed.

In my opinion, this is a flaw with C++. It's somewhat idiotic (hopefully this doesn't offend anyone :p), and it totally defeats the purpose of async. In order to get the behavior that most people expect (apparently, you expected it), you have to keep a list of std::future objects, and then spend effort (and processing time) figuring out the right time to destroy individual std::future objects in your list. This is the answer to question #1 in the OP. For #2, I think you should not use std::async for this purpose on a per debug message basis. I think it creates more problems than it solves.

I don't know that there is an elegant way around this. Maybe someone else can chime in.

As far as how I would go about implementing an asynchronous version of OutputDebugString, I would just create a producer-consumer queue of strings. There have been tons of questions asked about that, and you can google producer-consumer queue for details. The producer is your main thread that is emitting the message. The consumer is a thread (or several threads) that pick out elements from the queue and calling Window's OutputDebugString.

EDIT: in case I offended any async enthusiasts, I would like to add that std::async is useful for doing parallel computation much like using GPU. I suspect it was designed for exploitation of parallel hardware. For example:

      // merge sort
      {
           auto left =  std::async(.....);  // merge_sort left
           auto right =  std::async(.....);  // merge_sort right
      }

      merge

in this way, both left and right have to be sorted before we merge. if not, wait until they're both sorted, but it allows for the opportunity to process both left and right in parallel.

this should look very familiar if you've done CUDA or any GPGPU coding...

thang
  • 3,466
  • 1
  • 19
  • 31
  • 1
    Maybe note that only `std::futures` created from `std::async` block in their destructor. – Stephan Dollberg Jan 26 '13 at 11:54
  • @bamboon, good remark. I added this in case someone tries to extrapolate. – thang Jan 26 '13 at 11:55
  • 1
    The future getting destroyed really doesn't sound natural; thanks for the post - it's helped me clarify my understanding of all this. I chose claptrap's answer for the implementation example, but your answer best explains the concepts. Thanks! – dk123 Jan 27 '13 at 01:32