I often need to convert BSTR strings to std::wstring. A NULL BSTR counts as an empty BSTR.

I used to do it like this:

#define CHECKNULLSTR(str) ((str) ? (str) : L"")
std::wstring wstr(CHECKNULLSTR(bstr));

This doesn't handle embedded '\0' characters, and it also has to count the characters before it can allocate enough memory, so I expected it to be slow. I thought of this optimization, which should handle every case, doesn't truncate, and doesn't need to count:

std::wstring wstr(bstr, bstr + ::SysStringLen(bstr));
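
The truncation difference between the two constructors can be shown without any Windows headers. This is a minimal sketch, not the tester itself: `kData`, `kLen`, `from_c_string` and `from_range` are hypothetical names, and the plain `wchar_t` array only stands in for a BSTR whose length would normally come from `::SysStringLen`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a BSTR payload: 7 characters with an embedded L'\0'.
// (A real BSTR would report this length through ::SysStringLen.)
static const wchar_t kData[] = L"abc\0def";
static const std::size_t kLen = sizeof(kData) / sizeof(kData[0]) - 1; // excludes the terminator

// C-string constructor: scans for the first L'\0' and stops there.
// A null pointer is mapped to an empty string, as CHECKNULLSTR does.
inline std::wstring from_c_string(const wchar_t* p) {
    return std::wstring(p ? p : L"");
}

// Iterator-pair constructor: copies exactly the range [p, p + len).
inline std::wstring from_range(const wchar_t* p, std::size_t len) {
    assert(p != nullptr || len == 0); // a null pointer must come with length 0
    return std::wstring(p, p + len);
}
```

With this input, `from_c_string(kData)` keeps only the part before the embedded null character, while `from_range(kData, kLen)` keeps all `kLen` characters.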

To test the impact of this change, I wrote the following tester. It shows that the optimization takes more than twice as long in most cases. The change is observable in both Debug and Release configurations, and I'm using VC++ 2013.

Hence my question: what is going on here? How can the "pair of pointers" iterator constructor be so much slower than the C-string constructor?

Complete tester

#include <windows.h>
#include <stdio.h>
#include <tchar.h>
#include <strsafe.h>
#include <iostream>
#include <string>   // std::wstring

#define CHECKNULLSTR(str) ((str) ? (str) : L"")

ULONGLONG bstrAllocTest(UINT iterations = 10000)
{
    ULONGLONG totallen = 0;
    ULONGLONG start, stop, elapsed1, elapsed2;    
    BSTR bstr = ::SysAllocString( // 15 * 50 = 750 chars
                     L"01234567890123456789012345678901234567890123456789" //  1
                     L"01234567890123456789012345678901234567890123456789" //  2
                     L"01234567890123456789012345678901234567890123456789" //  3
                     L"01234567890123456789012345678901234567890123456789" //  4
                     L"01234567890123456789012345678901234567890123456789" //  5
                     L"01234567890123456789012345678901234567890123456789" //  6
                     L"01234567890123456789012345678901234567890123456789" //  7
                     L"01234567890123456789012345678901234567890123456789" //  8
                     L"01234567890123456789012345678901234567890123456789" //  9
                     L"01234567890123456789012345678901234567890123456789" // 10
                     L"01234567890123456789012345678901234567890123456789" // 11
                     L"01234567890123456789012345678901234567890123456789" // 12
                     L"01234567890123456789012345678901234567890123456789" // 13
                     L"01234567890123456789012345678901234567890123456789" // 14
                     L"01234567890123456789012345678901234567890123456789" // 15
                                );

    start = ::GetTickCount64();
    for (UINT i = 1; i <= iterations; ++i)
    {
        std::wstring wstr(CHECKNULLSTR(bstr));
        size_t len;
        ::StringCchLengthW(wstr.c_str(), STRSAFE_MAX_CCH, &len);
        totallen += len;
    }
    stop = ::GetTickCount64();
    elapsed1 = stop - start;

    start = ::GetTickCount64();
    for (UINT i = 1; i <= iterations; ++i)
    {
        std::wstring wstr(bstr, bstr + ::SysStringLen(bstr));
        size_t len;
        ::StringCchLengthW(wstr.c_str(), STRSAFE_MAX_CCH, &len);
        totallen += len;
    }
    stop = ::GetTickCount64();
    elapsed2 = stop - start;

    wprintf_s(L"Iter:\t%u\n"
              L"Elapsed (CHECKNULLSTR):\t%10llu ms\n"
              L"Elapsed (Ptr iter pair):\t%10llu ms\n"
              L"Speed difference:\t%f %%\n",
              iterations,
              elapsed1,
              elapsed2,
              (static_cast<double>(elapsed2) / elapsed1 * 100));

    ::SysFreeString(bstr);
    return totallen;
}

int wmain(int argc, wchar_t* argv[])
{
    ULONGLONG dummylen = bstrAllocTest(100 * 1000);
    wprintf_s(L"\nTotal length:\t%llu", dummylen);
    getchar();
    return 0;
}

Output on my system

Iter:   100000
Elapsed (CHECKNULLSTR):        296 ms
Elapsed (Ptr it pair):         577 ms
Speed difference:       194.932432 %

Total length:   150000000
Felix Dombek

1 Answer

Interesting and a bit surprising indeed. The difference in performance for Visual C++ 2013 Update 4 is down to the way the two std::wstring constructors are implemented in its standard library. Generally speaking, the constructor taking a pair of iterators has to handle more cases, as those iterators are not necessarily pointers, and they can point to other data types than the string's character type (the character type just needs to be constructible from the type pointed to by the iterators). However, I was expecting the implementation to handle your case separately with optimized code.

std::wstring wstr(CHECKNULLSTR(bstr)); does scan the string for the terminating null character, then allocates, then copies the string data over in the fastest possible way using memcpy, which is implemented in assembly.

std::wstring wstr(bstr, bstr + ::SysStringLen(bstr)); indeed avoids the scan because of ::SysStringLen (which is very fast, just reads the stored length), then allocates, but then copies the string data over using the following loop:

for (; _First != _Last; ++_First)
   append((size_type)1, (_Elem)*_First);

VC12 decides not to inline the append call (understandably so, the body is pretty big), and all this, as you can imagine, carries quite a bit of overhead compared to a blazing memcpy.
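
The two copy strategies can be contrasted in a small portable sketch. The function names `copy_per_element` and `copy_bulk` are hypothetical and this is not the library's actual source; it only mirrors the shape of the loop quoted above next to a single bulk copy, assuming both receive a valid pointer range.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Strategy A: reserve once, then append one character per iteration,
// mirroring the shape of the loop quoted above (one call per character).
inline std::wstring copy_per_element(const wchar_t* first, const wchar_t* last) {
    assert(first <= last);
    std::wstring out;
    out.reserve(static_cast<std::size_t>(last - first));
    for (; first != last; ++first)
        out.append(1, *first);
    return out;
}

// Strategy B: hand the whole range to the string in one call, which lets the
// implementation use a bulk copy routine internally.
inline std::wstring copy_bulk(const wchar_t* first, const wchar_t* last) {
    assert(first <= last);
    std::wstring out;
    out.assign(first, static_cast<std::size_t>(last - first));
    return out;
}
```

Both functions produce the same string; only the amount of per-character work differs, which is where the overhead described above would come from.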


One solution is to use the std::basic_string constructor that takes a pointer and a count (also mentioned by Ben Voigt in his comment), like this:

std::wstring wstr(CHECKNULLSTR(bstr), ::SysStringLen(bstr));

I've just tested it, and it does bring the expected benefits on Visual C++ 2013: it sometimes takes just half the time of the first version, and about 75% of that time in the worst case (these are approximate measurements anyway).
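
A null-safe wrapper around the pointer-and-count constructor might look like the sketch below. `make_wstring` is a hypothetical helper name, and the `const wchar_t*` / `std::size_t` parameters stand in for the BSTR and the value returned by `::SysStringLen`, so that the sketch stays portable.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: `data` plays the role of the BSTR and `len` the value
// ::SysStringLen would return for it. A null pointer yields an empty string
// and is never passed on to the std::wstring constructor.
inline std::wstring make_wstring(const wchar_t* data, std::size_t len) {
    assert(data != nullptr || len == 0); // a null pointer must come with length 0
    if (data == nullptr)
        return std::wstring();
    return std::wstring(data, len);      // pointer-and-count constructor, no scan
}
```

On Windows the call site would presumably read `make_wstring(bstr, ::SysStringLen(bstr))`; the explicit null check plays the same role as `CHECKNULLSTR`.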


The standard library implementation in Visual C++ 2015 CTP6 has an optimized code path for the constructor taking an iterator pair when the iterators are actually pointers to the same character type as the string to be constructed, resulting in essentially the same code as the pointer-and-count variant above. So, on this version, it doesn't matter which of these two constructor variants you use for your case - they're both faster than the version taking only a pointer.

bogdan
  • Note that the input iterator version can also use input iterators such as istream_iterator, so neither preallocation nor `memcpy` are possible in the most general case. It would of course make sense to optimize for random access iterators, but the Standard doesn't require it. – Ben Voigt Apr 10 '15 at 16:35
  • @BenVoigt This is the version optimized for forward iterators, which makes a call to `_Distance(_First, _Last, _Count);` first, in order to preallocate. `_Distance`, of course, uses the special version for random access iterators that just subtracts `_First` from `_Last`. So this part is all good, otherwise it would be worse - the version for input iterators just uses the `append` loop with no preallocation, as you said, but it's not the one used in this case. – bogdan Apr 10 '15 at 16:40
  • But `memcpy` cannot be used for all forward iterators, or even for all random-access iterators. On the other hand, delegation to `std::copy` would benefit from the various optimizations implemented there. – Ben Voigt Apr 10 '15 at 16:46
  • Couldn't they add a specialization for `wchar_t*,wchar_t*` that does a `memcpy` again? – Mark Ransom Apr 10 '15 at 16:52
  • @BenVoigt Yes, that's true for `memcpy` in general, my previous comment was regarding the preallocation part. It looks like VC14 does implement an optimization for this case, I'm taking a look at it right now. – bogdan Apr 10 '15 at 16:53
  • @MarkRansom: Sure. And also for `std::vector::iterator` and `const_iterator` and `std::wstring::iterator`. And probably a dozen others. Where is the tradeoff between keeping the amount of code manageable, and accelerating these paths? After all, there already is a "pointer and count" overload, which people having pointers might be expected to use. – Ben Voigt Apr 10 '15 at 16:55
  • @MarkRansom That's what I was expecting to see as well, and that's why I was surprised by the difference in performance on VC12. Fortunately, as I mentioned in my last update, the standard library in VC14 does implement this optimization. – bogdan Apr 10 '15 at 17:35
  • Thanks for this analysis! Just one question, if the count is 0, could the ptr-and-count constructor still dereference the pointer? Is `CHECKNULLSTR` necessary in this case? – Felix Dombek Apr 10 '15 at 17:56
  • @FelixDombek I've just checked VC's implementation and, as far as I can tell, it doesn't dereference the pointer if the count is 0. However, the standard's documentation for this constructor says that the pointer "points to an array of at least n elements of charT". I'm not sure if that's supposed to allow null pointers (there are no zero length arrays in C++), but I wouldn't count on it; I'd rather play it safe. Anyway, if you want to remove that check, do some proper testing first, don't just believe me. – bogdan Apr 10 '15 at 18:21
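
The iterator-category distinction discussed in these comments can be sketched with ordinary tag dispatch. This is an illustration only, with a hypothetical function name `build`; it is not the VC++ library code. It shows why a forward (or stronger) iterator range can be measured and preallocated, while a single-pass input iterator range cannot.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// Input iterators: single pass, length unknown up front, so append as we go.
template <class It>
std::wstring build(It first, It last, std::input_iterator_tag) {
    std::wstring out;
    for (; first != last; ++first)
        out.push_back(static_cast<wchar_t>(*first));
    return out;
}

// Forward and stronger iterators: measure the range, reserve once, then copy.
// For random access iterators std::distance is a plain subtraction.
template <class It>
std::wstring build(It first, It last, std::forward_iterator_tag) {
    const auto n = std::distance(first, last);
    assert(n >= 0);
    std::wstring out;
    out.reserve(static_cast<std::size_t>(n));
    for (; first != last; ++first)
        out.push_back(static_cast<wchar_t>(*first));
    return out;
}

// Entry point: picks an overload based on the iterator's category.
template <class It>
std::wstring build(It first, It last) {
    return build(first, last,
                 typename std::iterator_traits<It>::iterator_category());
}
```

Pointers are random access iterators, so `build(p, p + n)` takes the preallocating overload; something like `std::istreambuf_iterator<wchar_t>` would take the first one.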