
We (the IT department at work) are looking to build an iPad app that will take numeric IDs and perform a simple lookup in a table. It is essentially a primary-key search on a single table, displaying a field after minor processing.

The Caveat:

There are 4.5 million rows in this table, and it needs a lookup time of at most 1 second. It will not have an internet connection, so the lookup has to happen on the device. We have a few ideas, but which makes the most sense:

  1. SQLite: will it stand up to such abuse? Can it handle that many rows, and will it do it well?

  2. Flat-file search: we can loop over the file ourselves, or split it up by the first few digits to do some slightly more intelligent indexing.

  3. Offload to some third-party database app on the device which can handle it through an API.

  4. Something else entirely which in our infinite wisdom we have missed.

I must take the chance to thank Apple for making this so easy to test ourselves. Without a Mac or dev license, we don't want to commit over £2,000 until we know we can get it done well.

Paystey
  • Have you actually tried to do this? It's a couple of hours' work to load up a 4.5 million record table into Core Data and then knock up a test app on an iPad to see how fast you get your results. You'll get an actual idea of the size of the data and the performance, and you won't have to guess. – Abizern Aug 10 '12 at 13:45
  • See the last sentence in the question. We have an iPad but no dev license or Mac, it's not a problem to get one, but just not for testing only. I should say too, these are small records, < 500 bytes each – Paystey Aug 10 '12 at 13:48
  • You could ask someone with a dev account and an iPad to build and install it and see what the results are like. Or get them to add your device to their dev account and they can send the build to you so you can see what it looks like for yourself. – Abizern Aug 10 '12 at 13:51
  • Do you have a dev account and an iPad? :) only joking, we only wanted to know about any serious hurdles we'd come across. Given some more confidence in the flexibility of the software we'll probably go for it. – Paystey Aug 10 '12 at 13:53
  • If you want to send me a csv of the records - or a reasonably sized sample of them I'll knock up a test app for you - I've got an old version of something similar that I can retask to give you an idea of the response times – Abizern Aug 10 '12 at 13:55
  • I may take you up on that. How's best to contact you privately? – Paystey Aug 10 '12 at 14:22
  • abizern@abizern.org will get a mail to me. – Abizern Aug 10 '12 at 14:26

4 Answers


Although SQLite should work fine, it is probably overkill. A simple binary search should be all you need. If it is too slow on one big file, split it into 10 or 100 subfiles by the first one or two digits.
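As a sketch of that flat-file approach (Python here purely for illustration; the record layout of an 8-byte key plus an 8-byte payload is an assumption, not something from the question), a binary search over fixed-size sorted records needs only O(log n) seeks, roughly 23 for 4.5 million rows:

```python
import struct

# Assumed layout: 8-byte big-endian integer key + 8-byte payload per record,
# records sorted by key. Sizes are illustrative only.
RECORD_SIZE = 16
KEY_SIZE = 8

def lookup(f, target):
    """Binary-search a seekable file of fixed-size records sorted by key."""
    f.seek(0, 2)                      # seek to end to find the record count
    count = f.tell() // RECORD_SIZE
    lo, hi = 0, count - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        f.seek(mid * RECORD_SIZE)
        record = f.read(RECORD_SIZE)
        key = struct.unpack(">q", record[:KEY_SIZE])[0]
        if key == target:
            return record[KEY_SIZE:]  # the payload field
        if key < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None                       # key not present
```

Each probe reads one record, so memory use stays constant no matter how large the table grows.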

Optionally, you could load the data into a CFArray, which supports binary search (see CFArrayBSearchValues). While this would incur a performance penalty on the initial load, it may perform better on subsequent searches than a custom binary search over one or more files.
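The same idea in memory, sketched in Python (CFArrayBSearchValues itself is a Core Foundation call; this just illustrates the pay-once-at-load, search-fast-afterwards trade-off):

```python
from bisect import bisect_left

def make_index(pairs):
    """Sort (key, value) records once up front; this is the one-time load cost."""
    return sorted(pairs)

def search(index, target):
    """O(log n) lookup on the sorted array, analogous to CFArrayBSearchValues."""
    i = bisect_left(index, (target,))
    if i < len(index) and index[i][0] == target:
        return index[i][1]
    return None
```

After the initial sort, every lookup touches only about 23 comparisons for 4.5 million entries; the open question on device is whether the array itself fits in memory.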

D'Arcy Rittich
  • Rather than implement a binary search - you could just use a dictionary. – Abizern Aug 10 '12 at 13:01
  • We thought about this but could the sheer number of records cause an issue? Is it easy enough to optimize the memory management if we've got 4.5 million rows (very small records, just lots of them) – Paystey Aug 10 '12 at 13:23
  • Yeah, that could be an issue when loading all the data, I don't know. But the file access method should consume very little memory. – D'Arcy Rittich Aug 10 '12 at 13:27
  • So if we split the records across 100 files (by the first two digits) and only search upon request (very frequently), would this still be able to return within a second (~45,000 rows with a random distribution)? Is the drive access (flash, I know) fast enough? – Paystey Aug 10 '12 at 13:32
  • 4.5 million rows is too much to load into memory! Let's say they have a small structure that only consumes 10 bytes each; that would be 45 MB of memory, which is already too much. Are the queries totally random? Because splitting the content into different files will introduce a huge overhead every time a new file is accessed, but it could be good enough if you can predict or sort them so that the minimum number of reads is performed. – Angel G. Olloqui Aug 10 '12 at 13:51
  • Nope, they are barcodes being entered which have an entirely random distribution. Which is our worry with splitting the files up, we'd have to load them each time. – Paystey Aug 10 '12 at 13:55

As I've already sort of said in a comment: this isn't hard to test. You can load a large database into a Core Data store and create a test app to see how fast the results are returned.

I suggest Core Data over raw SQLite, as it's optimised for the platform (even though it uses SQLite as its storage medium), and it's easier to write code that fetches values and displays them.

Edited to add

I created a sample project that loaded up two datasets, one of 11 records and one of 400,000 records. This used Core Data on iOS 5.

I ran the test on my iPad 2 (yes, 2).

Search times varied between 3 and 8 ms (0.003-0.008 s), with no appreciable difference between the smaller and larger datasets.

This is unoptimised code running in debug mode, written without any sort of performance enhancements: the search predicate was created on every search rather than cached, for example, and there was no threading.

The size of the data store for 400,000 records is 17.2 MB, so even the full 4.5 million records would easily fit on an iPad.

Abizern

SQLite is amazingly fast. A test table with 4.5 million records has this structure:

CREATE TABLE testtable (numericid INTEGER PRIMARY KEY, testtext TEXT);

It is filled with increasing values for numericid (0, 1, ...) and a string for testtext.

Doing all the inserts in a single transaction took 1 hour 42 minutes on a MacBook Pro (2009). The resulting SQLite file is 94 MB in size.
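For comparison, here is a minimal Python sketch of the same schema with a single-transaction bulk load (the stdlib sqlite3 module wraps the same engine; the table and column names mirror the test table above, everything else is illustrative):

```python
import sqlite3

def build_db(path, n):
    """Create the test table and insert n rows inside one transaction."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE testtable (numericid INTEGER PRIMARY KEY, testtext TEXT)"
    )
    # One big transaction: committing per row would be orders of magnitude slower,
    # since each commit forces a sync to storage.
    with conn:
        conn.executemany(
            "INSERT INTO testtable VALUES (?, ?)",
            ((i, "text%d" % i) for i in range(n)),
        )
    return conn
```

Because numericid is the INTEGER PRIMARY KEY, it doubles as the table's rowid, so lookups on it need no separate index.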

Inside the iOS app, the database is opened in the viewDidLoad method. A simple button triggers a database query like this:

- (void)btnPressed:(UIButton *)sender {
    NSLog(@"btn pressed, start");

    sqlite3_stmt *statement = nil;
    NSString *querystring = @"SELECT * FROM testtable WHERE numericid = 2571312;";
    const char *sql = [querystring UTF8String];
    NSLog(@"sql is: %s", sql);

    if (sqlite3_prepare_v2(dbConnection, sql, -1, &statement, NULL) != SQLITE_OK) {
        NSLog(@"sql problem occurred with: %s", sql);
        NSLog(@"%s", sqlite3_errmsg(dbConnection));
    } else {
        while (sqlite3_step(statement) == SQLITE_ROW) {
            // SELECT * returns numericid as column 0 and testtext as column 1.
            NSString *numericid = [NSString stringWithUTF8String:(char *)sqlite3_column_text(statement, 0)];
            NSString *testtext = [NSString stringWithUTF8String:(char *)sqlite3_column_text(statement, 1)];
            NSLog(@"%@ (%@)", testtext, numericid);
        }
    }

    sqlite3_finalize(statement);
    NSLog(@"btn pressed, finished");
}

results in an output of:

2012-08-10 17:51:36.734 DBQueryTest[28462:707] Database Successfully Opened
2012-08-10 17:51:39.083 DBQueryTest[28462:707] btn pressed, start
2012-08-10 17:51:39.087 DBQueryTest[28462:707] sql is: SELECT * FROM testtable WHERE numericid = 2571312;
2012-08-10 17:51:39.099 DBQueryTest[28462:707] text2571312 (2571312)
2012-08-10 17:51:39.102 DBQueryTest[28462:707] btn pressed, finished

So a query takes under 19 ms! This could be reproduced for several values of numericid, though I didn't run a fully randomised test for statistical evaluation.

Conclusion: this test setup fulfils your requirements. SQLite is definitely the way to go.

UPDATE:

A quick random-access test with 100,000 key values confirms the first result. Leaving SQL statement string creation and the time-consuming NSLog output out of the time measurement, the average database query time drops by an order of magnitude:

  • average query time: 1.8 ms

  • mean deviation: 0.4 ms

  • maximum query time: 25.9 ms

  • minimum query time: 0.6 ms
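The measurement loop behind numbers like these can be sketched as follows (Python for illustration; it assumes the testtable schema from above and, as in the update, keeps logging and setup outside the timed region):

```python
import random
import time

def time_queries(conn, n_queries, max_id):
    """Time n_queries random primary-key lookups.

    Returns (average, maximum, minimum) query time in seconds; only the
    execute-and-fetch step is inside the timed region.
    """
    cur = conn.cursor()
    times = []
    for _ in range(n_queries):
        key = random.randrange(max_id)
        start = time.perf_counter()
        cur.execute("SELECT testtext FROM testtable WHERE numericid = ?", (key,))
        cur.fetchone()
        times.append(time.perf_counter() - start)
    return sum(times) / len(times), max(times), min(times)
```

Using a parameterised statement rather than formatting the key into the SQL string also lets the engine reuse the prepared statement across queries.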

alex
  • That's brilliant, thanks very much. It's far better than what I expected from SQLite with little memory. – Paystey Aug 13 '12 at 09:42

Loading this data into memory on iOS is a no-go.

You should use SQLite. That's what it's made for, and you will not do better hand-rolling file I/O code.

deleted_user
  • Seems very definitive; do you have numbers on why memory is a no-go? Or why SQLite's handling is better? Does it not need to load them into memory to search anyway? – Paystey Aug 10 '12 at 13:34
  • Having worked on iOS for a while, I would say common sense dictates that loading 4.5 million of anything into memory on a mobile device is probably a bad idea. Your opinion may vary. – deleted_user Aug 10 '12 at 13:36
  • SQLite is designed for environments with minimal resources. If it loads entire tables into memory to do a query, then I guess I'm wrong. – deleted_user Aug 10 '12 at 13:39
  • Yes, it does seem a very bad idea, but there are compromises we can make: if the app takes 5 minutes to load and 5 minutes to shut down, that actually isn't a problem for us; it's the responsiveness within the app we need. So hard figures are useful to evaluate a compromise. But yes, I can see what you mean: SQLite would be better than hand-rolling, but that doesn't make SQLite good. – Paystey Aug 10 '12 at 13:46
  • It actually is a problem. Mobile apps are designed not to monopolize the resources of the device itself. If you didn't care about building it to a mobile spec, I'm not sure why you asked the question. Anybody can build an app that consumes as many resources as possible. Having a query come back in one second when an iPad app takes 5 minutes to start up is an interesting position. If any app took that long to start up, the user would assume (correctly) that the device is hung and kill the app. – deleted_user Aug 10 '12 at 13:49