
Here is an interesting algorithm problem I would like to get the community's opinion on. I am looking to search a sorted ArrayList<String> and get a boolean result indicating whether any String in the list begins with certain characters.

Ex. Array {"he", "help", "helpless", "hope"}

search prefix = h
Result: true
search prefix = he
Result: true
search prefix = hea
Result: false

My first impression was that I should combine binary search with regex, but let me know if I am way off. While a trie would be the best implementation, I need a solution that minimizes heap memory (I'm developing on Android), as in practice this array will contain ~10,000-20,000 entries (words).
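A minimal sketch of the binary-search idea without any regex, assuming the words sit in an ArrayList<String> sorted in natural (case-sensitive) String order. It relies on the insertion point that Collections.binarySearch reports for a missing key: if any word starts with the query, then the first word at or after that insertion point does. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PrefixLookup {
    // Returns true if some word in the sorted list starts with prefix.
    static boolean hasPrefix(List<String> sortedWords, String prefix) {
        int idx = Collections.binarySearch(sortedWords, prefix);
        // A negative result encodes the insertion point as -(insertionPoint) - 1.
        int insertionPoint = idx >= 0 ? idx : -idx - 1;
        // Every word starting with prefix sorts at or after prefix itself,
        // so the element at the insertion point is the only candidate to check.
        return insertionPoint < sortedWords.size()
                && sortedWords.get(insertionPoint).startsWith(prefix);
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>(Arrays.asList("he", "help", "helpless", "hope"));
        Collections.sort(words); // binarySearch requires the list to be sorted in natural order
        for (String q : new String[] {"h", "he", "hea"}) {
            System.out.println(q + ": " + hasPrefix(words, q));
        }
    }
}
```

Running main prints h: true, he: true and hea: false, matching the example above. The only storage is the sorted list itself; no extra nodes are allocated per word.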

I have a db that contains ~200,000 words. I take the subset beginning with a given letter (h in my example), which will contain ~20,000 entries, and insert these into an array. I then perform ~100-1,000 lookups/contains checks against this subset. The idea behind my approach was to improve performance (compared to querying the db each time) while minimizing the hit to memory (an array instead of a trie).
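As a rough sense of scale, assuming a plain binary search over the ~20,000-word subset (this is arithmetic, not a measurement), the worst-case number of string comparisons per lookup is:

```latex
2^{14} = 16384 < 20000 \le 32768 = 2^{15}
\quad\Rightarrow\quad
\lfloor \log_2 20000 \rfloor + 1 = 14 + 1 = 15
\\[4pt]
15 \times 1000 = 15000 \text{ comparisons (worst case) for } 1000 \text{ lookups}
```

Each comparison only inspects characters up to the length of the query, so the lookups themselves should be cheap relative to loading the subset from the db.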

Perhaps a DAWG would optimize lookup, but I'm not sure whether this structure would require significantly more memory than an ArrayList.

Matt Stokes
  • For binary search you need to know whether the element you're currently looking at is "greater" or "less" than your target. Regex matches won't tell you that. So binary search is probably the way to go, but you'll want to compare against substrings lexically, I think. – Martin Ender Jul 15 '13 at 18:38
  • Binary search depends on a strict ordering of the data with respect to the key. Regexes can be constructed that would violate that requirement, so no, you cannot in general use regexes with a binary search. – Jim Garrison Jul 15 '13 at 18:39
  • The best data structure for your use case is a [trie](http://en.wikipedia.org/wiki/Trie) (prefix tree). – jlordo Jul 15 '13 at 18:39
  • My issue is that a trie will take up too much memory (I'm working on Android), as the array will contain 10,000-20,000 words. I'm looking to trade off efficiency for space. – Matt Stokes Jul 15 '13 at 18:42
  • @MattStokes You can implement a trie in a very efficient way. I'm guessing you are worried about the many pointers/references, right? If that's the case, let me know and I'll write up how to do it. – pkacprzak Jul 15 '13 at 21:45
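A sketch of Martin Ender's suggestion: a hand-written binary search that compares each element lexically against the query using only the element's first query.length() characters, with no regex involved. Because the list is sorted, these truncated keys are also in sorted order, so the search is valid. The list is assumed to be sorted in natural String order, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class PrefixBinarySearch {
    // Lexically compares the first prefix.length() chars of word with prefix.
    // Returns 0 exactly when word starts with prefix.
    static int comparePrefix(String word, String prefix) {
        int n = Math.min(word.length(), prefix.length());
        for (int i = 0; i < n; i++) {
            int diff = word.charAt(i) - prefix.charAt(i);
            if (diff != 0) {
                return diff;
            }
        }
        // All compared chars are equal: word sorts before prefix only if it is shorter.
        return word.length() < prefix.length() ? -1 : 0;
    }

    static boolean hasPrefix(List<String> sortedWords, String prefix) {
        int lo = 0;
        int hi = sortedWords.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1; // unsigned shift avoids int overflow
            int cmp = comparePrefix(sortedWords.get(mid), prefix);
            if (cmp < 0) {
                lo = mid + 1; // truncated word sorts before prefix: search right half
            } else if (cmp > 0) {
                hi = mid - 1; // truncated word sorts after prefix: search left half
            } else {
                return true;  // word starts with prefix
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("he", "help", "helpless", "hope"); // already sorted
        for (String q : new String[] {"h", "he", "hea"}) {
            System.out.println(q + ": " + hasPrefix(words, q));
        }
    }
}
```

Comparing character by character, rather than calling substring(), avoids allocating a new String on every probe.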

2 Answers


If you really want to avoid a trie, this should fit your needs:

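import java.util.*;

NavigableSet<String> tree = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
tree.addAll(Arrays.asList("he", "help", "helpless", "hope"));
String[] queries = {"h", "he", "hea"};
for (String query : queries) {
    // ceiling() returns the smallest element >= query, or null if there is none
    String higher = tree.ceiling(query);
    // Guard against null and compare case-insensitively to match the set's ordering
    boolean found = higher != null && higher.regionMatches(true, 0, query, 0, query.length());
    System.out.println(query + ": " + found);
}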

prints

h: true
he: true
hea: false
jlordo
  • I've never used a TreeSet. What are its memory requirements? – Matt Stokes Jul 15 '13 at 19:03
  • `O(n)`, see [What are the pros and cons of a TreeSet](http://stackoverflow.com/questions/1298144/what-are-the-pros-and-cons-of-a-treeset) and [http://en.wikipedia.org/wiki/Red%E2%80%93black_tree](http://en.wikipedia.org/wiki/Red%E2%80%93black_tree) – jlordo Jul 15 '13 at 19:07
  • "It has considerably more overhead than ArrayList" – Matt Stokes Jul 15 '13 at 19:08
  • @MattStokes: I suggested a trie and a `TreeSet` to you, two of the best options for what you are trying to do. I won't try to think of a third-best option and will move on now... – jlordo Jul 15 '13 at 19:12
  • Thank you for your input. It's just a large hit to heap memory. – Matt Stokes Jul 15 '13 at 19:15

You should consider a skip list (http://en.wikipedia.org/wiki/Skip_list) as an option. Many Java implementations are readily available.
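The JDK itself ships a skip-list-based NavigableSet, java.util.concurrent.ConcurrentSkipListSet, so the ceiling() approach from the TreeSet answer carries over unchanged. A minimal sketch (class and method names are hypothetical). It only swaps the structure and does not by itself show lower memory use: the JDK implementation still allocates a node per element, plus index nodes for its upper levels.

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.concurrent.ConcurrentSkipListSet;

public class SkipListPrefix {
    // The smallest element >= prefix is the only candidate that needs checking.
    static boolean hasPrefix(NavigableSet<String> words, String prefix) {
        String candidate = words.ceiling(prefix); // null if every element < prefix
        return candidate != null && candidate.startsWith(prefix);
    }

    public static void main(String[] args) {
        // Elements are kept in natural String order by the skip list
        NavigableSet<String> words =
                new ConcurrentSkipListSet<>(Arrays.asList("he", "help", "helpless", "hope"));
        for (String q : new String[] {"h", "he", "hea"}) {
            System.out.println(q + ": " + hasPrefix(words, q));
        }
    }
}
```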

ElKamina