12

I need to perform a SELECT queries that are insensitive to case and accents. For demo purposes, I create a table like that:

create table table
(
  column text collate nocase
);

insert into table values ('A');
insert into table values ('a');
insert into table values ('Á');
insert into table values ('á');

create index table_cloumn_Index
  on table (column collate nocase);

Then, I get those results when executing the following queries:

SELECT * FROM table WHERE column LIKE 'a';
> A
> a

SELECT * FROM table WHERE column LIKE 'á';
> á

SELECT * FROM table WHERE column LIKE 'Á';
> Á

How can I fix that so the results for any of the following queries be like that:

> A
> a
> Á
> á

The sqlite is running on iOS, by the way.

Thanks in advance,

Leandro Alves
  • 2,190
  • 3
  • 19
  • 24

3 Answers3

18

Two basic approaches:

  1. You can create a second column in the table which contains the string without the international characters. Furthermore, before doing a search against this secondary search column, you should also remove international characters from the string being search for, too (that way you are comparing non-international to non-international).

    This is the routine I use to convert the international characters:

    NSData *data = [string dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES];
    string = [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];
    

    You could also replace the accented characters with:

    NSMutableString *mutableString = [string mutableCopy];
    CFStringTransform((__bridge CFMutableStringRef)mutableString, NULL, kCFStringTransformStripCombiningMarks, NO);
    

    By the way, if you need to sort your results, you can also sort upon this secondary search field instead of the main field, which will avoid problems stemming from SQLite's inability to sort the international characters, either.

  2. You can alternatively create your own "unaccented" C function (define this C function outside the @implementation for your class):

    void unaccented(sqlite3_context *context, int argc, sqlite3_value **argv)
    {
        if (argc != 1 || sqlite3_value_type(argv[0]) != SQLITE_TEXT) {
            sqlite3_result_null(context);
            return;
        }
    
        @autoreleasepool {
            NSMutableString *string = [NSMutableString stringWithUTF8String:(const char *)sqlite3_value_text(argv[0])];
            CFStringTransform((__bridge CFMutableStringRef)string, NULL, kCFStringTransformStripCombiningMarks, NO);
            sqlite3_result_text(context, [string UTF8String], -1, SQLITE_TRANSIENT);
        }
    }
    

    You can then define a SQLite function that will call this C-function (call this method after you open the database, which will be effective until you close that database):

    - (void)createUnaccentedFunction
    {
        if (sqlite3_create_function_v2(database, "unaccented", 1, SQLITE_ANY, NULL, &unaccented, NULL, NULL, NULL) != SQLITE_OK)
            NSLog(@"%s: sqlite3_create_function_v2 error: %s", __FUNCTION__, sqlite3_errmsg(database));
    }
    

    Having done that, you can now use this new unaccented function in SQL, e.g.:

    if (sqlite3_prepare_v2(database, "select a from table where unaccented(column) like 'a'", -1, &statement, NULL) != SQLITE_OK)
        NSLog(@"%s: insert 1: %s", __FUNCTION__, sqlite3_errmsg(database));
    
Rob
  • 415,655
  • 72
  • 787
  • 1,044
  • Yeah, I was using this approach too, but now I need to deal with a database which I cannot change it's structure. But if you can, and your db isn't too huge, it's a easy way to go. – Leandro Alves May 02 '12 at 16:31
  • @LeandroAlves I know you've long since moved past this issue, but I (a) include another variation for unaccenting strings; and (b) show you how to unaccent strings in SQLite on the fly. – Rob Aug 03 '13 at 01:20
  • @Rob The code for Option 2 executes and handles the data correctly, but takes the CPU to 99% and Memory > 1000 MB. The application crashes with `EXC_BAD_ACCESS (code=2, address=0x0)` immediately after the operations complete. Is there anything that can be done to control memory usage with this code? – Chris Schiffhauer Nov 16 '13 at 17:49
  • @ChrisSchiffhauer This will require some discussion, so I created a chat room for this follow-up question: http://chat.stackoverflow.com/rooms/41329/how-to-deal-with-accented-characters-in-ios-sqlite – Rob Nov 16 '13 at 21:54
  • @ChrisSchiffhauer The memory problems that arise by doing this update on many, many records can be resolved by introducing of an `@autoreleasepool`. See revised answer. – Rob Nov 21 '13 at 01:26
  • Wow, if ever there was an answer that deserved a bounty, it's this one. Incoming in 24 hours. – Chris Schiffhauer Nov 26 '13 at 01:10
  • Credit to LS_dev for [suggestion to use `SQLITE_TRANSIENT`](http://stackoverflow.com/a/20245102/1271826). Much better. – Rob Nov 27 '13 at 15:29
4

You will need to either create some user function, or override (i.e. replace) the default implementation of the like() functions. The reason is that the LIKE operator in sqlite doesn't support non-ASCII case-insensitiveness:

SQLite only understands upper/lower case for ASCII characters by default. The LIKE operator is case sensitive by default for unicode characters that are beyond the ASCII range. For example, the expression 'a' LIKE 'A' is TRUE but 'æ' LIKE 'Æ' is FALSE.

This makes sense otherwise sqlite would need to support different cultures since case varies from one to the other. An example is the capital i in Turkey which is not I but a dotted İ, and the lower-case of I is a dot-less ı. Embedding all this culture information in sqlite would be very burdensome (i.e. it would increase sqlite object code).

  • Did you already created any user function on iOS? Do you have any sample code about that? This seems to be the righter way of doing this. – Leandro Alves May 02 '12 at 16:30
2

Here is my solution of LIKE problem

static void myLow(sqlite3_context *context, int argc, sqlite3_value **argv)
{
    NSString* str = [[NSString alloc] initWithUTF8String:
                            (const char *)sqlite3_value_text(argv[0])];
    const char* s = [[str lowercaseString] UTF8String];
    sqlite3_result_text(context, s, strlen(s), NULL);
    [str release];
}

// call it once after opening db
sqlite3_create_function(_db, "myLow", 1, SQLITE_UTF8,NULL, &myLow, NULL, NULL);

And then instead of query

SELECT * FROM table WHERE column LIKE 'a'

you should use

SELECT * FROM table WHERE myLow(column) LIKE 'a'
pigmasha
  • 201
  • 3
  • 6