20

I'm really confused about passing strings from VBA to C++. Here's the VBA code:

Private Declare Sub passBSTRVal Lib "foo.dll" (ByVal s As String)
Private Declare Sub passBSTRRef Lib "foo.dll" (ByRef s As String)
Private Declare Sub passByNarrowVal Lib "foo.dll" (ByVal s As String)
Private Declare Sub passByNarrowRef Lib "foo.dll" (ByRef s As String)
Private Declare Sub passByWideVal Lib "foo.dll" (ByVal s As String)
Private Declare Sub passByWideRef Lib "foo.dll" (ByRef s As String)

Sub foobar()
    Dim s As String, str As String
    str = "Hello There, World!"

    s = str
    Call passByBSTRVal(s)
    s = str
    Call passByBSTRRef(s)
    s = str
    Call passByNarrowVal(s)
    s = str
    Call passByNarrowRef(s)
    s = str
    Call passByWideVal(s)
    s = str
    Call passByWideRef(s)
End Sub

And the C++ DLL code:

void __stdcall passByBSTRVal( BSTR s )
{
    MessageBox(NULL, s, L"Pass BSTR by value", MB_OK | MB_ICONINFORMATION);
}

void __stdcall passByBSTRRef( BSTR *s )
{
    MessageBox(NULL, *s, L"Pass BSTR by ref", MB_OK | MB_ICONINFORMATION);
}

void __stdcall passByNarrowVal( LPCSTR s )
{
    USES_CONVERSION;
    MessageBox(NULL, A2W(s), L"Pass by Narrow Val", MB_OK | MB_ICONINFORMATION);
}

void __stdcall passByNarrowRef( LPCSTR* s )
{
    USES_CONVERSION;
    MessageBox(NULL, A2W(*s), L"Pass by Narrow Ref", MB_OK | MB_ICONINFORMATION);
}

void __stdcall passByWideVal( LPCWSTR s )
{
    MessageBox(NULL, s, L"Pass by Wide Val", MB_OK | MB_ICONINFORMATION);
}

void __stdcall passByWideRef( LPCWSTR* s )
{
    MessageBox(NULL, *s, L"Pass by Wide Ref", MB_OK | MB_ICONINFORMATION);
}

My expectation was that the first two calls to passByBSTRVal and passByBSTRRef would work. Why? Because VBA strings are COM BSTR objects. However, while stepping through the C++ code, the value of s for both of these functions was garbage (a bunch of kanji). Additionally, the displayed message box was (the same). I'm really surprised the first two functions didn't work.

My next expectation was for the second two calls to passByNarrowVal and passByNarrowRef to not work because a BSTR is defined as "typedef OLECHAR *BSTR", and an OLECHAR is a wide character type, while LPCSTR is a narrow character type. However, contrary to my expectation, these two functions actually did work. When I stepped through the C++ code, the parameter s was exactly what i was expecting it to be. My expectation was wrong again.

Lastly, my expectation for the final 2 functions (pass by wide val and ref) was that they would work, since an OLECHAR is a string of wide characters, so a LPCWSTR should be able to point to a BSTR. But as with case #1 (I guess these two cases are identical) my expectation was wrong. The parameter s was made up of garbage characters (and the MessageBox displayed the same garbage characters.)

Why was my intuition completely wrong? Can someone please explain what I'm not understanding here?

Zoe
  • 27,060
  • 21
  • 118
  • 148
Mathematician
  • 380
  • 2
  • 9
  • 1
    If you are genuinely trying to pass a `BSTR` or `LPCWSTR`, type it as `ByVal param As Long` and pass `StrPtr(variableName)` to pass the string in - 32bit only. – Ben Sep 12 '16 at 17:36

5 Answers5

10

Here are some old reference articles, it's worth reading because it explains the root causes of all our problems:

To sum up:

  • VBA internal storage is BSTR with unicode characters in it.
  • VBA also uses BSTR for talking with the external world, but you don't have to use BSTR if you don't want to because from C/C++, you may choose to use only the pointer part of the BSTR (a BSTR is a LPWSTR, an LPWSTR is not a BSTR).
  • The content of the BSTR that VBA uses to communicate outside its world is not unicode but ANSI (VBA is still living in the 90s and thinks that, regarding the String data type, the outside world is always ANSI, ASCIIZ, CodePage, etc. ). So, even if it still uses a BSTR, that BSTR contains the ANSI equivalent of the internal Unicode storage, modulo the current locale (BSTR is like an envelope that can contain anything, including ANSI, including zero characters anywhere, provided the length matches the data).

So when you use use Declare with argument of type String, the final binary layout will always match C's ANSI 'char *' (or LPSTR in windows macro parlance). Officially, you're still supposed to use VARIANTs if you want to pass full unicode string over interop barriers (read the links for more on this).

But, not all is lost, as VBA (not VB) has been a bit improved over the years, mainly to support Office 64-bit versions.

The LongPtr data type has been introduced. It's a type that will be a signed 32 bit integer on a 32-bit system and a signed 64 bit integer on a 64-bit system.

Note it's the exact equivalent of .NET's IntPtr (VBA also still thinks a Long is 32-bit and an Integer is 16-bit, while .NET uses Long for 64-bit and Int for 32-bit...).

Now, LongPtr would be useless w/o the help of VB's all-time undocumented function StrPtr that takes a string and returns a LongPtr. It's undocumented because officially VB doesn't know what a pointer is (actually, be cautious as this can crash your program at runtime if not used properly).

So, let's suppose this C code:

  STDAPI ToUpperLPWSTR(LPCWSTR in, LPWSTR out, int cch)
  {
    // unicode version
    LCMapStringW(LOCALE_USER_DEFAULT, LCMAP_LINGUISTIC_CASING | LCMAP_UPPERCASE, in, lstrlenW(in), out, cch);
    return S_OK;
  }

  STDAPI ToUpperBSTR(BSTR in, BSTR out, int cch)
  {
    // unicode version
    // note the usage SysStringLen here. I can do it because it's a BSTR
    // and it's slightly faster than calling lstrlen...
    LCMapStringW(LOCALE_USER_DEFAULT, LCMAP_LINGUISTIC_CASING | LCMAP_UPPERCASE, in, SysStringLen(in), out, cch);
    return S_OK;
  }

  STDAPI ToUpperLPSTR(LPCSTR in, LPSTR out, int cch)
  {
    // ansi version
    LCMapStringA(LOCALE_USER_DEFAULT, LCMAP_LINGUISTIC_CASING | LCMAP_UPPERCASE, in, lstrlenA(in), out, cch);
    return S_OK;
  }

Then you can call it with these VBA declares (note this code is 32 and 64-bit compatible):

  Private Declare PtrSafe Function ToUpperLPWSTR Lib "foo.dll" (ByVal ins As LongPtr, ByVal out As LongPtr, ByVal cch As Long) As Long
  Private Declare PtrSafe Function ToUpperBSTR Lib "foo.dll" (ByVal ins As LongPtr, ByVal out As LongPtr, ByVal cch As Long) As Long
  Private Declare PtrSafe Function ToUpperLPSTR Lib "foo.dll" (ByVal ins As String, ByVal out As String, ByVal cch As Long) As Long

  Sub Button1_Click()

      Dim result As String
      result = String(256, 0)

      // note I use a special character 'é' to make sure it works
      // I can't use any unicode character because VBA's IDE has not been updated and does not suppport the
      // whole unicode range (internally it does, but you'll have to store the texts elsewhere, and load it as an opaque thing w/o the IDE involved)

      ToUpperLPWSTR StrPtr("héllo world"), StrPtr(result), 256
      MsgBox result
      ToUpperBSTR StrPtr("héllo world"), StrPtr(result), 256
      MsgBox result
      ToUpperLPSTR "héllo world", result, 256
      MsgBox result
  End Sub

They all work, however

  • the ToUpperLPSTR is an ANSI fonction, so it will not support the unicode range that most people use nowadays. It works for me because the special non ASCII 'é' character coded in the IDE will find a correspondance when I run it in my machine with my ANSI codepage. But it may not work depending on where it runs. With unicode, you don't have that kind of problems.
  • the ToUpperBSTR is kinda specialized for VBA (COM automation) clients. If this function is called from a C/C++ client, the C/C++ coder will have to create a BSTR to use it, so it may look funny and add more work. Note however it will support strings that contains the zero character in it, thanks to the way BSTR work. It can sometimes be useful for example to pass array of bytes or special strings.
Simon Mourier
  • 132,049
  • 21
  • 248
  • 298
  • Very nice answer! –  Apr 19 '17 at 07:09
  • Ok, so I added a couple of includes WTypes.h & OleAuto.h and a DEF file to make compile and then it works as written so thanks for a great answer. It seems there are two sides to this argument, there are advocates of Declare Function and there are fans of IDL. I set bounty hoping a full working IDL solution, in the end I gave that working solution. I'm awarding you bounty as the advocate of the Declare Function side of the debate. Thanks for making this a "Goto" resource. – S Meaden Apr 20 '17 at 10:27
  • @SMeaden - thanks. I actually answered the question title :-) I can try to help, but I'm not sure what do you expect as an "IDL solution"? IDL/TLB was originally targeted at COM objects (coclasses, interfaces, etc.) and Declare was to interop with C linkage-type DLLs. – Simon Mourier Apr 20 '17 at 11:16
  • @Simon: that's fine. I have done the IDL solution (see my answer), which was raised theoretically by Ben. I know IDL and ATL is normally for CoClass etc, Ben raised a tantalising alternative of functions in global scope via IDL and I wanted to see it. It's done now. Thanks. – S Meaden Apr 20 '17 at 14:08
6

This form of external function call exists to be compatible with earlier versions of Visual Basic, and inherits their semantics. In particular, VB3 ran on 16-bit windows and dealt only with ANSI (i.e. MBCS) strings.

The Declare syntax has the same restriction. VBA converts your string on the assumption that it is converting it from UTF-16 to ASCII. This allows code written in VB3 to work unchanged in VB4, VB5 and VB6.

So for example "AZ" begins as \u0041\u005A, is converted to ANSI and becomes \x41\x5A which is reinterpreted as \u5A41 which is "婁".

(With VB4, Microsoft merged WordBasic, Excel Basic and Visual basic into a single language, VBA.)

The "new" way to call functions from VBA, is to create a type library for the external functions you need to use, using MIDL, and add it as a reference to the project. Type libraries can describe the exact signature of the function, (e.g. BSTR, LPCSTR, LPCWSTR, [out]BSTR*, etc.) In particular it is not necessary to wrap the functions in a COM object to call them from VBA (though it is if you wish to call them from VBScript).

Alternatively you can't be bothered to fire up midl for a single function, you can use the VarPtr/StrPtr/CopyMemory hack. This is pretty much equivalent to PEEK and POKE.

Ben
  • 34,935
  • 6
  • 74
  • 113
2

BIG HUGE NOTE: I'm not a programmer, I just really enjoy programming, so please be kind to me. I want to improve, so suggestions and comments from people more skilled than me (basically, everyone) are VERY welcomed!

Ben, if you're reading this, I think you opened my eyes to what's happening. MIDL sounds like the proper way of doing this, and I intend on learning it, but this seemed like a good learning opportunity, and I never let those pass me by!

I think what's happening is that narrow characters are getting marshalled into a wide character storage. For example, the string "hello" stored with narrow characters looks like:

|h |e |l |l |o |\0 |

and stored with wide characters, looks like:

|h   |e   |l   |l   |o   |\0   |

But when you pass a string from VBA to C++, something really strange happens. You get narrow characters marshalled into a wide character, like this:

|h e |l l |o \0 |    |    |    |

This is why using LPCSTR / LPCSTR* works. Yes, BSTR uses a string of wchar_t, but this marshalling makes it look like a string of char. Accessing with char* alternately points to the first and second characters in each half of the wchar_t (h, then e. l, then l. o, then \0). Even though the pointer arithmetic for char* and wchar_t* is different, it works because of the funny way the characters are marshalled. In fact, we're passed a pointer to the data string, but if you wanted to access the length of the BSTR, 4 bytes before the data string, you can play games with pointer arithmetic to get where you want to go. Assuming the BSTR is passed in as LPCSTR s,

char* ptrToChar;      // 1 byte
wchar_t* ptrToWChar;  // 2 bytes
int* ptrToInt;        // 4 bytes
size_t strlen;

ptrToChar = (char *) s;
strlen = ptrToChar[-4];

ptrToWChar = (wchar_t *) s;
strlen = ptrToWChar[-2];

ptrToInt = (int *) s;
strlen = ptrToInt[-1];

Of course, if the string got passed in as LPCSTR* s, then of course you need to dereference s first by accessing via something like:

ptrToChar = (char *)(*s);

and so on.

If one wants to use LPCWSTR or BSTR to receive the VBA string, you have to dance around this marshalling. So for example, to create a C++ DLL that converts a VBA string to uppercase, I did the following:

BSTR __stdcall pUpper( LPCWSTR* s )
{
    // Get String Length (see previous discussion)
    int strlen = (*s)[-2];

    // Allocate space for the new string (+1 for the NUL character).
    char *dest = new char[strlen + 1];

    // Accessing the *LPCWSTR s using a (char *) changes what we mean by ptr arithmetic,
    // e.g. p[1] hops forward 1 byte.  s[1] hops forward 2 bytes.
    char *p = (char *)(*s);

    // Copy the string data
    for( int i = 0; i < strlen; ++i )
        dest[i] = toupper(p[i]);

    // And we're done!
    dest[strlen] = '\0';

    // Create a new BSTR using our mallocated string.
    BSTR bstr = SysAllocStringByteLen(dest, strlen);

    // dest needs to be garbage collected by us.  COM will take care of bstr.
    delete dest;
    return bstr;
}

As far as I can tell, receiving the BSTR as a BSTR is equivalent to receiving it as a LPCWSTR, and receiving it as a BSTR* is equivalent to receiving it as a LPCWSTR*.

OK, I am 100% certain there are a ton of mistakes here, but I believe the underlying ideas are correct. If there are mistakes or even better ways of thinking of something, I will gladly accept corrections / explanations, and fix them for Google, posterity, and future programmers.

It sounds like the BEST way to do this is with Ben's MIDL suggestion (and maybe MIDL will make Safearrays and Variants less complicated?), and after I hit enter, I'm going to start learning that method. But this method works too and was an excellent learning opportunity for me.

Mathematician
  • 380
  • 2
  • 9
2

Ok, so I know I set bounty to draw out a fuller response on the IDL idea but I have had a go at this myself. So I opened a ATL project, changed the idl to the following

// IDLForModules.idl : IDL source for IDLForModules
//

// This file will be processed by the MIDL tool to
// produce the type library (IDLForModules.tlb) and marshalling code.

import "oaidl.idl";
import "ocidl.idl";

[
    helpstring("Idl For Modules"),
    uuid(EA8C8803-2E90-45B1-8B87-2674A9E41DF1),
    version(1.0),
]
library IDLForModulesLib
{
    importlib("stdole2.tlb");

    [
        /* dllname attribute https://msdn.microsoft.com/en-us/library/windows/desktop/aa367099(v=vs.85).aspx */
        dllname("IdlForModules.dll"),
        uuid(4C1884B3-9C24-4B4E-BDF8-C6B2E0D8B695)
    ]
    module Math{
        /* entry attribute https://msdn.microsoft.com/en-us/library/windows/desktop/aa366815(v=vs.85).aspx */
        [entry(656)] /* map function by entry point ordinal */
        Long _stdcall Abs([in] Long Number);
    }
    module Strings{
        [entry("pUpper")] /* map function by entry point name */
        BSTR _stdcall Upper([in] BSTR Number);
    }
};

Then in the main cpp file I added

#include <string>
#include <algorithm>

INT32 __stdcall _MyAbs(INT32 Number) {
    return abs(Number);
}

BSTR __stdcall pUpper(BSTR sBstr)
{
    // Get the BSTR into the wonderful world of std::wstrings immediately
    std::wstring sStd(sBstr);

    // Do some "Mordern C++" iterator style op on the string
    std::transform(sStd.begin(), sStd.end(), sStd.begin(), ::toupper);

    // Dig out the char* and pass to create a return BSTR
    return SysAllocString(sStd.c_str());
}

And in the DEF file I edited it to be

; MidlForModules.def : Declares the module parameters.

LIBRARY

EXPORTS
    DllCanUnloadNow     PRIVATE
    DllGetClassObject   PRIVATE
    DllRegisterServer   PRIVATE
    DllUnregisterServer PRIVATE
    DllInstall      PRIVATE
    _MyAbs @656
    pUpper

In a macro-enmabled workbook called TestClient.xlsm placed in same directory as the Debug output Dll I write the following in the ThisWorkbook module

Option Explicit

Private Declare Function LoadLibrary Lib "kernel32" Alias "LoadLibraryA" (ByVal lpLibFileName As String) As Long

Private Sub Workbook_Open()
    '* next line establishes relative position of Dll
    Debug.Assert Dir(ThisWorkbook.Path & "\IDLForModules.dll") = "IDLForModules.dll"

    '* next line loads the Dll so we can avoid very long Lib "c:\foo\bar\baz\barry.dll"
    LoadLibrary ThisWorkbook.Path & "\IDLForModules.dll"

    '* next go to  Tools References are check "Idl For Modules"
    '* "Idl For Modules" Iis set in the IDL with helpstring("Idl For Modules")

End Sub

I then add a Tools Reference to the newly created Type library and now I can complete by adding a standard module and adding the following

Option Explicit

Sub TestAbs()
    Debug.Print IDLForModulesLib.Math.Abs(-5)
End Sub

Sub TestUpper()
    Debug.Print IDLForModulesLib.Strings.Upper("foobar")
End Sub

This works for me on Windows 8.1 Professional 64 bit, VS2013, Excel 15. Fuller instructions for C++ newbies can found here Throw away Declare Function with IDL for Modules.

S Meaden
  • 8,050
  • 3
  • 34
  • 65
  • 1
    IMHO idl is most of the time overkill if it's just to define an interop layer. Note you should use the idl `retval`attribute, like this: HRESULT pUpper([in] BSTR number, [out, retval] BSTR * ret) for all your functions. Returning non C standard types from a function is not recommended, and VB(A) understands the retval attribute perfectly, as all automation client should – Simon Mourier Apr 15 '17 at 09:10
1

This is just an illustration to Simon's answer. It shows how to call a native DLL that has parameters of type LPWSTR. As a simple example, I use GetWindowsDirectoryW. As Simon pointed out, always use the "W" version of native DLLs.

Declare PtrSafe Function GetWindowsDirectoryW Lib "kernel32" _ 
   (ByVal lpBuffer As LongPtr, ByVal nSize As Long) As Long

Sub TestGetWindowsDirectoryW()
  Dim WindowsDir As String
  WindowsDir = Space$(256)
  GetWindowsDirectoryW StrPtr(WindowsDir), 256
  MsgBox WindowsDir
End Sub
  • Thanks, this actually helped me a lot. I was allocating space for the string using String() function which didn't work. as soon as I've started using Space() it started to work – Andrejs Gasilovs Feb 08 '21 at 13:42