
The best way to illustrate my question is with this example (which doesn't work if I use the wcsstr CRT function):

const wchar_t* s1 = L"Hauptstraße ist die längste";
const wchar_t* s2 = L"Hauptstrasse";

bool b_s1_starts_with_s2 = !!wcsstr(s1, s2);
_ASSERT(b_s1_starts_with_s2);   //Should be true

So far the only WinAPI that seems to recognize linguistic string equivalence is CompareStringEx when used with the LINGUISTIC_IGNORECASE flag, but it is tricky and inefficient to use for this purpose: since I don't know how many characters of s1 correspond to s2, I have to call it repeatedly on successively longer prefixes of s1 until one of them compares equal to s2.

So I was wondering if there's a better approach to doing this (under Windows)?

EDIT: Here's what I mean:

bool b_s1_starts_with_s2 = false;

int ln1 = (int)wcslen(s1);
int ln2 = (int)wcslen(s2);

for(int p = 1; p <= ln1; p++)
{
    if(::CompareString(LOCALE_USER_DEFAULT, LINGUISTIC_IGNORECASE,
        s1, p,
        s2, ln2) == CSTR_EQUAL)
    {
        //Match
        b_s1_starts_with_s2 = true;
        break;
    }
}
c00000fd
  • Your question is a bit confusing; the title says "starts with" but then you refer to `wcsstr` which searches the target string rather than just comparing the prefix. – Jonathan Potter Dec 21 '18 at 11:20
  • @JonathanPotter: OK, sure. What would you choose to use for that? That's what I'm asking. – c00000fd Dec 21 '18 at 17:49
  • Just clarify your question - do you mean "starts with" or do you mean "contains"? – Jonathan Potter Dec 21 '18 at 19:46
  • @JonathanPotter: _Is there a function/WinAPI to tell if one string **starts with** another string in a case-insensitive linguistic way?_ – c00000fd Dec 21 '18 at 19:53
  • Isn't that what your `CompareString` example does? Just give it the length to compare and it'll compare it. – Jonathan Potter Dec 21 '18 at 21:24
  • @JonathanPotter: It works but it's inefficient for a large string `s1` – c00000fd Dec 22 '18 at 05:11
  • I don't understand why. You only need to call it once to compare the prefix. – Jonathan Potter Dec 22 '18 at 06:48
  • @JonathanPotter: because you don't know upfront the length of `s2` string when it's evaluated "linguistically" to a different set of wchars. In my particular example character `ß` has linguistic equivalence to `ss` in German. – c00000fd Dec 22 '18 at 06:58

2 Answers

1

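You can use FindNLSString and check whether the return value is zero. It returns the index in s1 at which s2 was found (or -1 if there is no match), so a result of 0 means that s1 starts with s2.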

Evidently it matches ß with ss

const wchar_t *s1 = L"Hauptstraße ist die längste";
const wchar_t *s2 = L"Hauptstrasse";

INT found = 0;
int start = FindNLSString(0, LINGUISTIC_IGNORECASE, s1, -1, s2, -1, &found);
wprintf(L"start = %d\n", start);

s1 = L"δεθ Testing Greek";
s2 = L"ΔΕΘ";
start = FindNLSString(0, LINGUISTIC_IGNORECASE, s1, -1, s2, -1, &found);
wprintf(L"start = %d\n", start);
Barmak Shemirani
  • Hey, I knew there was an API for that :) Thanks for your help! Somehow I missed that one. Slightly adjusting your code, the following call does the trick: `bool b_s1_starts_with_s2 = ::FindNLSStringEx(LOCALE_NAME_USER_DEFAULT, FIND_STARTSWITH | LINGUISTIC_IGNORECASE, s1, -1, s2, -1, &found, NULL, NULL, NULL) == 0;` – c00000fd Dec 22 '18 at 04:47
  • The one downside is that this API is not available on XP. But tbh I'm not sure if XP supported linguistic differences between characters anyway. – c00000fd Dec 22 '18 at 04:48
0

I have not tried it, but I think you probably could use LCMapStringEx to transform all strings to lowercase appropriately for the locale, and then do a normal string prefix match with wcsncmp.

(As noted in comments, it makes no sense that you used wcsstr in your example since wcsstr determines if one string contains another string. To determine if one string starts with another string, it's more efficient to use wcsncmp with the length of the prefix string.)
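
A minimal sketch of the plain prefix check described above, assuming both strings have already been brought into a comparable form. The helper name `starts_with_exact` is hypothetical and only used for illustration. It compares wide characters one by one, so on its own it applies no case folding and no linguistic equivalence.

```cpp
#include <cassert>
#include <cwchar>

// Hypothetical helper (the name is illustrative, not a CRT or WinAPI function).
// Returns true when `text` begins with `prefix`, comparing wide characters exactly.
static bool starts_with_exact(const wchar_t* text, const wchar_t* prefix)
{
    assert(text != nullptr && prefix != nullptr);

    // wcsncmp stops at the first difference or at a terminating null,
    // so it never reads past the end of `text`, even if `text` is shorter than `prefix`.
    return std::wcsncmp(text, prefix, std::wcslen(prefix)) == 0;
}
```

With this helper, `starts_with_exact(L"Hauptstrasse ist", L"Hauptstrasse")` is true, while the `ß` spelling from the question does not match `Hauptstrasse`, so the check is only useful after a transformation that really makes the two spellings identical.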

jamesdlin
  • Thanks for the suggestion, but I can't seem to have `LCMapString` convert it for both of my example strings. (It'd be nice to use it as a fallback method for XP support.) Otherwise, `FindNLSStringEx` seems to be the API as was suggested in another answer. – c00000fd Dec 22 '18 at 04:50