2

I am wondering how would I be able to run a SQLite order by in this manner

select * from contacts order by jarowinkler(contacts.name,'john smith');

I know Android has a bottleneck with user defined functions, do I have an alternative?

Pentium10
  • 204,586
  • 122
  • 423
  • 502

2 Answers2

1

Step #1: Do the query minus the ORDER BY portion

Step #2: Create a CursorWrapper that wraps your Cursor, calculates the Jaro-Winkler distance for each position, sorts the positions, then uses the sorted positions when overriding all methods that require a position (e.g., moveToPosition(), moveToNext()).

CommonsWare
  • 986,068
  • 189
  • 2,389
  • 2,491
  • I am doing something similar in Java, but takes so much time while computes a N x M steps, script runs 2-3 minutes for 300 x 500. – Pentium10 May 17 '10 at 11:16
  • I am unclear where the "N x M steps" comes from. If it is too slow to compute in Java, use the NDK. – CommonsWare May 17 '10 at 13:44
  • I am working on a person sync database, 1 end has N records, the other has M records, I run the jaro-winkler distance algorithm on their names to match the best. – Pentium10 May 17 '10 at 15:52
1

Pre calculate string lengths and add them into separate column. Then sort entired table by that that length. Add indexes (if you can). Then add extra filters for example you don't want to compare "Srivastava Brahmaputra" to "John Smith". The length are out of wack by way too much so exclude these kind of comparison by length as a percentage of the total length. So if your word is 10 characters compare it only to words with 10+-2 or 10+-3 characters.

This way you will significantly reduce the number of times this algorithm needs to run.

Typically in the vocalbulary of 100 000 entries such filters reduce the number of comparisons to about 300. Unless you are doing a full blown record linkage and then I would wonder why use Android for that. You would still need to apply probabilistic methods for that and calculate scores and this is not a job for Android (at least not for now).

Also in MS SQL Server Jaro Winkler string distance wrapped into CLR function perform much better, since SQL Server doesn't supprt arays natively and much of the processing is around arrays. So implementation in T-SQL add too much overhead, but SQL-CLR works extremely fast.

Ivan
  • 103
  • 1
  • 7