I've ported this Java 3-way radix quicksort implementation (page 27 of the linked Princeton PDF) to C++ (well, mostly C):

//Java code from the linked Princeton PDF, page 27...
private static void quicksortX(String a[], int lo, int hi, int d)
{
    if (hi - lo <= 0) return;
    int i = lo-1, j = hi;
    int p = lo-1, q = hi;
    char v = a[hi].charAt(d);
    while (i < j)
    {
        while (a[++i].charAt(d) < v) if (i == hi) break;
        while (v < a[--j].charAt(d)) if (j == lo) break;
        if (i > j) break;
        exch(a, i, j);
        if (a[i].charAt(d) == v) exch(a, ++p, i);
        if (a[j].charAt(d) == v) exch(a, j, --q);
    }
    if (p == q)
    {
        if (v != '\0') quicksortX(a, lo, hi, d+1);
        return;
    }
    if (a[i].charAt(d) < v) i++;
    for (int k = lo; k <= p; k++) exch(a, k, j--);
    for (int k = hi; k >= q; k--) exch(a, k, i++);
    quicksortX(a, lo, j, d);
    if ((i == hi) && (a[i].charAt(d) == v)) i++;
    if (v != '\0') quicksortX(a, j+1, i-1, d+1);
    quicksortX(a, i, hi, d);
}

I'm not a C++ programmer, but I studied C in the 1980s and am obviously rusty.

Using MS Visual Studio, I hacked together this port of the above code as a C/C++ DLL to be called from Excel VBA:

void swapPointers(long **a, long **b) {
    long *t = *a;
    *a = *b;
    *b = t;
}

long int __stdcall  QuicksortX(LPSAFEARRAY FAR *psaBSTRs, long lo, long hi, long d = 0) 
{

    if (hi - lo <= 0) return 1;
    long i = lo-1, j = hi;
    long p = lo-1, q = hi;
    
    if ((*psaBSTRs)->cDims > 0) {
        long lb = (*psaBSTRs)->rgsabound[0].lLbound;
        long ub = lb + (*psaBSTRs)->rgsabound[0].cElements - 1;
        if (lo < lb || lo > ub || lo > hi) {return -2;}
        if (hi < lb || hi > ub || hi < lo) {return -3;}
    } else { 
        return -1;
    }

    BSTR *a = (BSTR*)(*psaBSTRs)->pvData;   
    short v = LPSTR(a[hi])[d];

    while (i < j)
    {
        while (LPSTR(a[++i])[d] < v) if (i == hi) break;
        while (v < LPSTR(a[--j])[d]) if (j == lo) break;
        if (i > j) break;
        swapPointers((long**)a[i], (long**)a[j]); 
        if (LPSTR(a[i])[d] == v) swapPointers((long**)a[++p], (long**)a[i]);
        if (LPSTR(a[j])[d] == v) swapPointers((long**)a[j], (long**)a[--q]);
    }

    if (p == q) {
        if (v != 0) QuicksortX(psaBSTRs, lo, hi, d+1);
        return 0;
    }

    if (LPSTR(a[i])[d] < v) i++;
    for (int k = lo; k <= p; k++) swapPointers((long**)a[k], (long**)a[j--]);
    for (int k = hi; k >= q; k--) swapPointers((long**)a[k], (long**)a[i++]);

    QuicksortX(psaBSTRs, lo, j, d);
    if ((i == hi) && (LPSTR(a[i])[d] == v)) i++;
    if (v != 0) QuicksortX(psaBSTRs, j+1, i-1, d+1);
    QuicksortX(psaBSTRs, i, hi, d);

}

I call the function in the DLL from Excel VBA like so:

Public Declare Function QuicksortX Lib "StringArraySort" (StringArray$(), Optional ByVal Lo&, Optional ByVal Hi&, Optional ByVal d& = 0) As Long

Sub Test_1()

    Dim ret&, a() As String
    
    ReDim a(0 To 9)
    
    a(0) = "Riverside"
    a(1) = "Irvine"
    a(2) = "Capital"
    a(3) = "Kona"
    a(4) = "Mayberry"
    a(5) = "Winterhaven"
    a(6) = "Stillwater"
    a(7) = "Dallas"
    a(8) = "Roanoke"
    a(9) = "Arbor"
    
    ret = QuicksortX(a, LBound(a), UBound(a))
    
    Stop

End Sub

However, the elements get scrambled. After calling the DLL the array looks like this:

a(0) = Arborside
a(1) = Capine
a(2) = Dalltal
a(3) = Irvi
a(4) = Konaerry
a(5) = Mayberhaven
a(6) = Rivelwater
a(7) = Roanas
a(8) = Stiloke
a(9) = Wintr

It looks like the leftmost four characters of each element get sorted, while the remaining characters stay where they were.

Can you please help me fix the port so that it works correctly?
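One possible reading of the symptom, sketched below under stated assumptions: `swapPointers((long**)a[i], (long**)a[j])` passes the element values (pointers to character data) rather than the addresses of the array slots, so `*a` and `*b` refer to the start of the character buffers. If pointers are 4 bytes wide (an assumption: a 32-bit build) and the buffers hold one byte per character (also an assumption), each call exchanges the first four characters of two strings and nothing else. The helper names `swapFirstFourBytes`, `swapSlots`, `buggyDemo` and `fixedDemo` are hypothetical, and plain `char` arrays stand in for the real `BSTR` elements so the sketch compiles without the Windows headers.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>

// Models the effect of the cast-based swap when pointers are 4 bytes wide
// (assumption): the first four bytes of the two character buffers are
// exchanged, while the array slots keep pointing at the same buffers.
void swapFirstFourBytes(char* x, char* y) {
    assert(std::strlen(x) + 1 >= 4 && std::strlen(y) + 1 >= 4);
    std::uint32_t tx, ty;
    std::memcpy(&tx, x, sizeof tx);
    std::memcpy(&ty, y, sizeof ty);
    std::memcpy(x, &ty, sizeof ty);
    std::memcpy(y, &tx, sizeof tx);
}

// Exchanges the slots themselves; the character data is never touched.
void swapSlots(char** x, char** y) {
    std::swap(*x, *y);
}

// Two strings after the buggy exchange, joined with '|'.
std::string buggyDemo() {
    char s0[] = "Riverside";
    char s1[] = "Arbor";
    swapFirstFourBytes(s0, s1);
    return std::string(s0) + "|" + s1;
}

// The same two strings after exchanging the slots instead.
std::string fixedDemo() {
    char s0[] = "Riverside";
    char s1[] = "Arbor";
    char* slots[] = { s0, s1 };
    swapSlots(&slots[0], &slots[1]);
    return std::string(slots[0]) + "|" + slots[1];
}
```

In this model `buggyDemo()` produces "Arborside|River", which has the same "Arborside" as a(0) in the scrambled output, while `fixedDemo()` moves the whole strings. In the real array each element would also keep its own stored length, which would explain entries such as "Irvi" and "Wintr".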

  • Where is the implementation of the `exch` function? – Dmitry Kuzminov Sep 21 '20 at 22:00
  • Your `void swapPointers(long **a, long **b) {` swaps `long`'s, which are 4-byte long (or four characters) – Vlad Feinstein Sep 21 '20 at 22:04
  • @VladFeinstein Thanks, that may be the issue. Can you please recommend a way to swap the pointers to the BSTR elements so that each element is swapped in its entirety? I want to swap the pointers for speed's sake. –  Sep 21 '20 at 22:08
  • @DmitryKuzminov The `exch` function is not included in the Princeton PDF. I assume it just swaps array elements. –  Sep 21 '20 at 22:11
  • Are those wide characters? If they are, then all of that casting to `LPSTR` is incorrect. As a matter of fact, remove *all* of those casts. Once you do that, read the errors that the compiler generates. Don't try to cover them up by using C-style casts, unless you know what you're doing. – PaulMcKenzie Sep 21 '20 at 22:14
  • @PaulMcKenzie Excel VBA stores strings (OLE BSTRs) as 2-byte characters. I think it uses even more for some locales. My first attempt here was to get it to work with the normal 2-byte characters that I always see in America, and then figure out how to make it universal at a later point. If you can guide me to that ultimate solution so that it handles all the Unicode cases, that would be greatly appreciated. –  Sep 21 '20 at 22:19
  • Well, if the characters are 2 byte characters, then again, remove those casts and fix the compiler errors. Casting to an `LPSTR` does not convert a multi-byte string into a single byte string, and possibly the reason why your sort breaks down. The easiest thing is to use the same character type between languages, not differing types. – PaulMcKenzie Sep 21 '20 at 22:20
  • @PaulMcKenzie OK, but as I was debugging the code I was getting the correct characters for each value of `d`. –  Sep 21 '20 at 22:22
  • So what is the reason for the `LPSTR` cast? That's why you should remove it and recompile your code. If the error is that you cannot convert an `x` to a `y` and recommends a `reinterpret_cast`, then all of those casts are wrong, regardless of what the debugger is showing you. – PaulMcKenzie Sep 21 '20 at 22:24
  • The idea to use the `LPSTR` cast came from some example I found about how to access the elements of an array of OLE BSTRs from C++. If there is a better or more correct way, I would like to know it. –  Sep 21 '20 at 22:27
  • The code shown violates the C++ type-safety with all of the casting being done, thus the behavior is undefined, not just the (LPSTR), but the `(long**)` you're doing -- again, remove the casts, and you probably would get a wall of errors thrown at you by the compiler, not just warnings. You may have been better off tagging this as `C`, as very little, if any actual idiomatic C++ is being done in the code. – PaulMcKenzie Sep 21 '20 at 22:33
  • @PaulMcKenzie Fair point about C vs C++. I don't actually care which of the two is used to get this working. I am trying to research how fast this algorithm truly is, but it needs to work correctly first. –  Sep 21 '20 at 22:38
  • @PaulMcKenzie I removed the LPSTR() and `short v = a[hi][d];` now gives me a decimal value of `29249`. The low byte of that is ASCII character `A` and the high byte is ASCII character `r`. So I don't frankly understand what's happening there. The `short v` is being fed the first two characters, but those first two characters are coming from the first FOUR bytes of the string. The Princeton algorithm expects one character at a time. The `LPSTR` did give one character at a time. I accept that it is wrong, but I don't know what to do. –  Sep 21 '20 at 22:43
  • I suggest you take the Java code as a suggestion, and not attempt to do line-by-line translations. For example, those calls to `swapPointers` that do `++` and `--` , in C++ the order that function arguments are evaluated is not specified, thus the program exhibits undefined behavior. Second, C++ would use iterators and not pointers -- if you used iterators, you would use `std::iter_swap` to "swap pointers". Third, start off with simple `std::string` and attempt your solution with that. All of these things with `BSTR`, leave them alone for now. – PaulMcKenzie Sep 21 '20 at 22:47
  • @PaulMcKenzie Thank you for the advice. I guess I left out a crucial bit... at least from my perspective. I do want to evaluate the speed but most importantly the speed that data coming from Excel can be sorted with this algorithm. Excel VBA stores string data in OLE BSTRs and uses the OLE SAFEARRAY as the container for arrays of strings. So for any metric of performance to have relevance for me it must be with a BSTR SAFEARRAY that comes from Excel and is returned back to Excel, sorted. –  Sep 21 '20 at 22:53
  • In your first paragraph, you talk about the mythical language "C/C++". Did you port to both languages? Are you mixing C and C++? They are distinct languages. – Thomas Matthews Sep 21 '20 at 23:02
  • @ThomasMatthews Perhaps a poor description. I don't care which is used. That was the point of C/C++. –  Sep 21 '20 at 23:05
  • @ThomasMatthews I'm just trying to evaluate the speed of sorting with this Radix Quicksort algorithm when calling it in a DLL coded in C or C++ or a mixture and calling that DLL from Excel VBA. But since the current code does not complete the sort, the performance (which is pretty good) is meaningless. –  Sep 21 '20 at 23:08
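
Following the comments about swapping each element in its entirety, here is a minimal sketch of one way to do it: exchange the array slots themselves with `std::swap`, with no casts. It is not the page-27 routine. It swaps in the simpler three-way string quicksort (Quick3string in Sedgewick and Wayne's Algorithms), and it uses a hypothetical `Elem` alias (`const char*`) in place of the real `BSTR` and `SAFEARRAY` types so that it compiles without the Windows headers. The single-byte character type is a guess based on the value 29249 reported in the comments (0x7241, the bytes 'A' and 'r'); if the DLL actually receives 2-byte characters, `Elem` and `charAt` would need a 16-bit character type instead. `sortTestNames` is likewise a hypothetical helper that only exists to exercise the sort.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one array element: a pointer to NUL-terminated,
// single-byte character data (an assumption, not the real BSTR type).
using Elem = const char*;

// Character at position d as an unsigned value; the terminator (0) sorts first.
int charAt(Elem s, long d) {
    return static_cast<unsigned char>(s[d]);
}

// Three-way string quicksort on a[lo..hi], comparing from character d onward.
// Only the slots of `a` are exchanged; the character data is never modified.
void quick3string(Elem* a, long lo, long hi, long d) {
    assert(a != nullptr);
    if (hi <= lo) return;
    long lt = lo, gt = hi, i = lo + 1;
    const int v = charAt(a[lo], d);          // partitioning character
    while (i <= gt) {
        const int t = charAt(a[i], d);
        if (t < v)      std::swap(a[lt++], a[i++]);
        else if (t > v) std::swap(a[i], a[gt--]);
        else            ++i;
    }
    quick3string(a, lo, lt - 1, d);              // d-th character less than v
    if (v != 0) quick3string(a, lt, gt, d + 1);  // equal: move to next character
    quick3string(a, gt + 1, hi, d);              // d-th character greater than v
}

// Sorts the ten names from Test_1 and returns them as std::string values.
std::vector<std::string> sortTestNames() {
    Elem names[] = {"Riverside", "Irvine", "Capital", "Kona", "Mayberry",
                    "Winterhaven", "Stillwater", "Dallas", "Roanoke", "Arbor"};
    const long n = static_cast<long>(sizeof names / sizeof names[0]);
    quick3string(names, 0, n - 1, 0);
    return std::vector<std::string>(names, names + n);
}
```

In the DLL, the corresponding exchange would presumably be `std::swap(a[i], a[j])` on the `BSTR *a` taken from `pvData`; that part is untested against Excel. Separately, the posted `QuicksortX` reaches its closing brace without returning a value on the normal path, so the value VBA receives in `ret` is not reliable there.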
