I want to be able to write something close to:
std::cout << "Hello" << my_world_string << ", " << std::setprecision(5) << my_double << '\n';
in CUDA device-side code, for debugging templated functions - and for this kind of line of code to result in a single, unbroken output line (i.e. the equivalent of a single CUDA printf() call - which typically doesn't get mangled with output from other threads).
Of course, that's not possible since there are no files or file descriptors in device-side code, nor is any of the std::ostream code usable in device-side code. Essentially, what we have to work with is CUDA's hardware+software hack enabling printf()s. But it is obviously possible to get something like:
stream << "Hello" << my_world_string << ", " << foo::setprecision(5) << my_double << '\n';
stream.flush();
or:
stream << "Hello" << my_world_string << ", " << foo::setprecision(5) << my_double << '\n';
printf("%s", stream.str());
My question is: What should I implement which would allow me to write code as close to the above as possible, minimizing effort / amount of code to write?
Notes:
- I used the identifier stream, but it doesn't have to be a stream. Nor does the code need to look just like I laid it out. The point is for me to be able to have printing code in a templated device function.
- All code will be written in C++11.
- Code may assume compilation is performed either with C++11 or a later version of the standard.
- I can use existing FOSS code, but only if its license is permissive, e.g. 3-clause BSD, CC-BY-SA, MIT - but not GPL.