8

I need my iPhone / iPad app to be able to quickly search through about 10,000 records (about a paragraph worth of text, each), for any substring contained within the record. So if the record contains the word "Flame", querying for "lame" should match.

I'm currently using SQLite, but "LIKE %term%" searches are too slow for this many records. Enabling Full-Text Search doesn't seem like it will fully meet my needs, since SQLite only supports prefix wildcards (e.g. "Flam*", not "*lame").

I've experimented with using a giant blob of text (~350K), and doing [NSString rangeOfString:...], which I think uses a Boyer-Moore algorithm. This is faster than "LIKE %term%" searches, but still not the kind of speed I'm hoping for.

Any suggestions for approaches, or libraries that would achieve this kind of scalable substring search, and which would work on an iPhone?

  • I had a similar dataset/query issue that I found I had to use UI and threading tricks to make it feel responsive. I did all the searching in a worker thread which would cancel/rerun the search as the user typed. I found no magic bullet. – NWCoder Mar 18 '11 at 17:32
  • Thanks NWCoder. I've considered that sort of asynchronous approach, as well. That aside, what approach did you settle on for searching? LIKE queries? –  Mar 18 '11 at 17:35
  • Yes I could only get the correct results with LIKE. One extra note, I ended up creating a simple object with just the searchable text and an ID referencing the extended attributes for the object. In the search specific version I normalized the text (all lowercase no-punctuation etc) and it helped a bit, but not much. (Maybe 5-10% speed increase.) – NWCoder Mar 18 '11 at 20:23

3 Answers3

3

Here are a number of different options. I am not aware of the bechmarks for each, so you will have to do some testing.

First is the FTS3 extension to SQLite. This should give you fast, indexed full text search: http://regularrateandrhythm.com/regular-rate-rhythm-blog/sqlite3-fts-in-IOS4.html

Then, how about regular expressions which were introduced in iOS 4:
http://developer.apple.com/library/ios/#documentation/Foundation/Reference/NSRegularExpression_Class/Reference/Reference.html

For pre-iOS 4, you can use RegexKitLite:
http://regexkit.sourceforge.net/RegexKitLite/index.html

If you decide to use regular expressions, then take a look at this entry on how to optimize those:
How to speed up iPhone regular expressions with NSRegularExpression?

Community
  • 1
  • 1
Kobski
  • 1,636
  • 15
  • 27
  • Regular expressions are quite slow... I'm sure you want some kind of indexed O(1) solution. Would love to hear if you roll your own or found a good solution via SQLite... – amattn Mar 18 '11 at 20:20
  • Yes, Regex are strictly slower than regular text searches. And full-text search in SQLite, as I wrote originally, doesn't do what I'm looking for. –  Mar 20 '11 at 04:22
0

If you can't tokenize the text (split it into words) you can't index it. That's why LIKE is a sequential search. Unless your substring can be constrained somehow (always drop the first letter or a fixed length for the substring, for instance) your text can't be stored as a list of all possible tokens and those tokens can't be indexed. The key (pun intended) is to find an algorithm that produces a small enough list of tokens that the cost of indexing them is less than the cost of a linear search.

0

Perhaps consider combining your second approach with the asynchronous approach. Divide your large block of text into 5,10,whatever size and search them separately with the same number of threads. Then combine results by using a coordinate system that knows how to position the matches correctly (e.g. thread 5 searched region 5 and found a match at position 337 which correlates to document x, position y). You will find that there is a limit where adding more threads does no good so that would be the first thing to figure out.

Patrick
  • 109
  • 4