What's an easy way to compare strings case-insensitively?

Question

Trying to compare strings using:

!(stringvector[i]).compare(vector[j][k])

only works for some entries of

vector[j][k]

-- namely ones that are a case sensitive string match.

How do I get case-insensitive matching from this functionality?

Here is a bit of code I was working on

#include <iostream>
#include <string>
#include <vector>

using namespace std; //poor form

int main()
{
    vector<string> stringvector = {"Yo", "YO", "babbybabby"};
    vector<string> vec1 = {"yo", "Yo", "these"};
    vector<string> vec2 = {"these", "checked", "too", "Yo", "babbybabby"};
    vector<vector<string>> vvs = {vec1, vec2};

    for (size_t v = 0; v < vvs.size(); v++) //first index of vector
    {
        for (size_t s = 0; s < vvs[v].size(); s++) //second index of vector
        {
            for (size_t w = 0; w < stringvector.size(); w++)
            {
                // operator== compares character by character, so it is case-sensitive
                if (stringvector[w] == vvs[v][s])
                {cout << "******FOUND******";}
            }
        }
    }
}

This doesn't print out FOUND for the case-insensitive matches.

stringvector[w] == vvs[v][s] does not perform a case-insensitive comparison. Is there an easy way to add this functionality?

--Prof D

You should show the declaration of `vector` and `stringvector`. — nwp, Feb 16 '17 at 09:36
In the initializer list of `vec2`, the `"Yo,"` should be `"Yo",` instead? — felix, Feb 16 '17 at 09:56
There is nothing obvious wrong with the code, that is how you compare strings. Make a full compilable example, show the output, say why you expect a different output. — nwp, Feb 16 '17 at 09:59
Show us a [mcve] with the expected output and the actual output. — Martin Bonner supports Monica, Feb 16 '17 at 10:01
Incidentally, I strongly suggest `std::cout << "**FOUND**" << stringvector[w] << std::endl;` so you can see what matches and what doesn't. — Martin Bonner supports Monica, Feb 16 '17 at 10:03
Also, declaring a variable called `vector` is very confusing - *particularly* when you obviously have a `using namespace std;` in effect. (Don't do that.) — Martin Bonner supports Monica, Feb 16 '17 at 10:04
The code will work as expected with @felix changes (and change `string.h` to `string`): http://ideone.com/W5exeN — mch, Feb 16 '17 at 10:05
Final comment: Range based for's + auto removes a lot of the clutter and makes this a lot easier to read: `for (const auto& vec : vec_vec) for (const auto& str : vec) for (const auto& target : stringvector) if (target == str) { ... }` - with some newlines obviously! — Martin Bonner supports Monica, Feb 16 '17 at 10:07
Thanks for all the help, the problem was found deeper in the code....mainly the code in my brain. I had assumed that the code in my brain was equivalent to the source code. The source code relied on a critical text file that enumerates the amounts of strings...the text file was neglected and apparently my program was doing the wrong thing that I designed it to do perfectly. Haha. — prof_dunwem, Feb 16 '17 at 10:59
@MartinBonner I like your code for representing this solution, but does my code allow for more flexibility in changing parameters between index changes? If your code can be extended easily for that, please let me know! — prof_dunwem, Feb 17 '17 at 03:16
If you need access to indices, then your code is better. You usually don't need access to indices (and range based for works with containers like `list` which don't have an `operator[]`) — Martin Bonner supports Monica, Feb 17 '17 at 06:50

score 2 · Answer 1 · answered Feb 06 '20 at 22:15

tl;dr

Use the ICU library.

"The easy way", when it comes to natural language strings, is usually fraught with problems.

As I pointed out in my answer to that "lowercase conversion" answer @Armando linked to, if you want to actually do it right, you're currently best off using the ICU library, because nothing in the standard gives you actual Unicode support at this point.

If you look at the docs to std::tolower as used by @NutCracker, you will find that...

Only 1:1 character mapping can be performed by this function, e.g. the Greek uppercase letter 'Σ' has two lowercase forms, depending on the position in a word: 'σ' and 'ς'. A call to std::tolower cannot be used to obtain the correct lowercase form in this case.

If you want to do this correctly, you need full Unicode support, and that means the ICU library until some later revision of the C++ standard actually introduces that to the standard library.

Using icu::UnicodeString -- clunky as it might be at first -- for storing your language strings gives you access to caseCompare(), which does a proper case-insensitive comparison.

Mandy007 · Answer 2 · 2020-02-10T14:45:32.233

You can implement a function for this purpose, example:

bool areEqualsCI(const string &x1, const string &x2){
    if(x1.size() != x2.size()) return false;
    for(unsigned int i=0; i<x2.size(); ++i) 
        if(tolower((unsigned char)x1[i]) != tolower((unsigned char)x2[i])) return false;
    return true;
}

I recommend reading the post "How to convert std::string to lower case?"
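To tie this back to the question, a helper like the one above can take the place of == inside the nested loops. The following is only a sketch under a few assumptions: the text is plain ASCII, areEqualsCI is reproduced from above with std:: qualification, and countMatchesCI is a hypothetical name made up for illustration (it is not part of the question or of any library).

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Same logic as areEqualsCI above; only reliable for single-byte (e.g. ASCII) text.
bool areEqualsCI(const std::string& x1, const std::string& x2) {
    if (x1.size() != x2.size()) return false;
    for (std::size_t i = 0; i < x1.size(); ++i) {
        // Cast to unsigned char first: std::tolower is undefined for negative values.
        if (std::tolower(static_cast<unsigned char>(x1[i])) !=
            std::tolower(static_cast<unsigned char>(x2[i]))) {
            return false;
        }
    }
    return true;
}

// Hypothetical helper: counts every (target, candidate) pair that matches
// when case is ignored, mirroring the three nested loops from the question.
std::size_t countMatchesCI(const std::vector<std::string>& targets,
                           const std::vector<std::vector<std::string>>& groups) {
    std::size_t matches = 0;
    for (const auto& group : groups) {
        for (const auto& candidate : group) {
            for (const auto& target : targets) {
                if (areEqualsCI(target, candidate)) ++matches;
            }
        }
    }
    return matches;
}
```

With the vectors from the question, countMatchesCI(stringvector, vvs) counts 7 matching pairs, whereas the original == comparison finds only 3.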

edited Feb 10 '20 at 14:45

answered Feb 06 '20 at 20:52

should be either `tolower((unsigned char)x[1])` or `tolower(x[1], std::locale())`; the C library version is undefined for negative values – M.M Feb 06 '20 at 21:55
Also, it doesn't actually work right for MBCS. – Deduplicator Feb 06 '20 at 21:57
Thanks for the recommendations @M.M – Mandy007 Feb 11 '20 at 17:00
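The two alternatives mentioned in the comment can be sketched as follows. The wrapper names toLowerC and toLowerLocale are made up for illustration. The point is that plain char may be signed, so a byte above 0x7F can reach the C function as a negative int, which it is not required to handle.

```cpp
#include <cctype>
#include <locale>

// C library version: convert to unsigned char before the call, so the
// argument is always in the range the function is defined for.
char toLowerC(char c) {
    return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}

// <locale> version: a template that takes the char directly, so no cast
// is needed. std::locale() is a copy of the current global locale.
char toLowerLocale(char c) {
    return std::tolower(c, std::locale());
}
```

Both wrappers map 'A' to 'a' and leave non-letters such as '1' unchanged.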

NutCracker · Answer 3 · 2020-02-06T22:03:35.327

First, I took the liberty of tidying up your code a bit: I replaced the ordinary for loops with range-based for loops and renamed the variables. The names are still not perfect, since I don't know the purpose of the code. Here is the refactored code:

#include <iostream>
#include <vector>
#include <string>

int main() {
    std::vector<std::string> vec1 = { "Yo", "YO", "babbybabby" };
    std::vector<std::string> vec2 = { "yo", "Yo" , "these" };
    std::vector<std::string> vec3 = { "these", "checked", "too", "Yo", "babbybabby" };
    std::vector<std::vector<std::string>> vec2_vec3 = { vec2, vec3 };

    for (auto const& i : vec2_vec3) {
        for (auto const& j : i) {
            for (auto const& k : vec1) {
                if (k == j) {
                    std::cout << k << " == " << j << std::endl;
                }
            }
        }
    }

    return 0;
}

Now, if you want to compare strings case-insensitively and you have access to the Boost library, you can use boost::iequals in the following manner:

#include <boost/algorithm/string.hpp>

std::string str1 = "yo";
std::string str2 = "YO";

if (boost::iequals(str1, str2)) {
    // identical strings
}

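On the other hand, if you don't have access to the Boost library, you can write your own iequals function using STL algorithms (C++14 required for the four-iterator overload of std::equal):

#include <algorithm>
#include <locale>
#include <string>

bool iequals(const std::string& str1, const std::string& str2) {
    // The four-iterator std::equal also returns false when the lengths differ.
    return std::equal(str1.begin(), str1.end(),
                      str2.begin(), str2.end(),
                      [](char a, char b) {
                          return std::tolower(a, std::locale()) == std::tolower(b, std::locale());
                      });
}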

std::string str1 = "yo";
std::string str2 = "YO";

if (iequals(str1, str2)) {
    // identical strings
}

Note that this would only work for Single-Byte Character Sets (SBCS).
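To make that caveat concrete, here is a small sketch. It assumes UTF-8 encoded strings and the default "C" locale. The names iequalsBytewise, upperE and lowerE are made up for illustration. The letters É and é differ only in case, but in UTF-8 each is stored as two bytes, and converting those bytes one at a time does not map one letter onto the other.

```cpp
#include <algorithm>
#include <locale>
#include <string>

// Per-char comparison in the style of iequals above (hypothetical name).
bool iequalsBytewise(const std::string& lhs, const std::string& rhs) {
    const std::locale loc;  // copy of the global locale ("C" unless changed)
    return std::equal(lhs.begin(), lhs.end(), rhs.begin(), rhs.end(),
                      [&loc](char a, char b) {
                          return std::tolower(a, loc) == std::tolower(b, loc);
                      });
}

// UTF-8 encodings spelled out as bytes so the source file encoding does not matter.
const std::string upperE = "\xC3\x89";  // U+00C9, capital E with acute
const std::string lowerE = "\xC3\xA9";  // U+00E9, small e with acute
```

Here iequalsBytewise("YO", "yo") returns true. iequalsBytewise(upperE, lowerE) returns false in the "C" locale, because the second bytes 0x89 and 0xA9 are not letters on their own and are left unchanged.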

should be either `tolower((unsigned char)a)` or `tolower(a, std::locale())` , the C library version is undefined for negative values — M.M, Feb 06 '20 at 21:55
You should also mention that it only works for single-byte character sets. — Deduplicator, Feb 06 '20 at 21:58
