
I have a TreeSet in Java that contains Strings (specifically words). I need to write a method...

public boolean isValidPrefix(String prefix)

...which accepts a prefix as an argument and checks the TreeSet to see if any of its contained words begin with the prefix.

For example, given the prefix "CA" and a TreeSet containing {"DOG","CAT","COW"}, my method would need to identify that there is a word "CAT" which starts with the prefix.

P.S. I would iterate through the TreeSet, but time complexity is an obvious constraint, as the TreeSet will contain up to 200,000 words in many instances.

ordanj
  • You should update your title; the problem here is "finding entries in a (Tree)Set that match a regex". – Oliver Charlesworth Jul 09 '14 at 22:32
  • @MarcoAcierno I changed one word. And he gave the prefix "ca", and said that he would need to find the word "cow". But "cow" doesn't start with "ca", but "cat" does. How is this a big change? – AntonH Jul 09 '14 at 22:36
  • TreeSets are typically ordered according to a comparator. What comparator do you use? It would have a big impact on the question. – blgt Jul 09 '14 at 22:36
  • @AntonH Maybe the prefix was in another word and he wants to select COW because it has this prefix before it. "Maybe". – Marco Acierno Jul 09 '14 at 22:37
  • @blgt: I don't think there's any comparator that would do anything meaningful with an arbitrary regex... – Oliver Charlesworth Jul 09 '14 at 22:37
  • @MarcoAcierno: "There is a word 'COW' which starts with the prefix" - unlikely. – Oliver Charlesworth Jul 09 '14 at 22:38
  • @OliCharlesworth It is. Natural ordering for strings is Unicode-lexicographic; and the OP asks for *prefixes*. This can be solved in `log n` for the default comparator – blgt Jul 09 '14 at 22:38
  • @blgt: Ah, that's true. If the OP is really only interested in prefixes, then you're correct. Even so, I'm still not sure how you'd do this with a TreeSet (as opposed to a user-defined tree data structure). – Oliver Charlesworth Jul 09 '14 at 22:39
  • @MarcoAcierno In which case the OP is free to change the question. But there is no way that, in English, "cow" begins with "ca". – AntonH Jul 09 '14 at 22:39
  • I've updated the question title to reflect the problem description. – Oliver Charlesworth Jul 09 '14 at 22:40
  • @AntonH Yes, of course you are right, but I was talking about CAT - COW. I was thinking of the case where the word was, for example, "cat cow", and if it starts with cat, select cow. But it's just a "maybe". It's useless to continue to talk about that. Back to the question. – Marco Acierno Jul 09 '14 at 22:41
  • Can you use another data structure? Are you locked into using a `TreeSet`? – Vivin Paliath Jul 09 '14 at 22:44
  • 200k isn't that much to iterate; try it before worrying about perf too much. – spudone Jul 09 '14 at 23:37
  • I'm not locked into TreeSet, but it has O(logN) for add, remove and contains, which I need. – ordanj Jul 09 '14 at 23:39
  • In regards to the COW/CAT comments, COW does not begin with CA and therefore isValidPrefix("CA") would return false for a TreeSet that only contained {"DOG","COW"}. Since the TreeSet I provided contains {"DOG","CAT","COW"}, it would return true, as "CAT" contains "CA". – ordanj Jul 10 '14 at 00:03

1 Answer


If a String foo is a prefix of some String bar in the TreeSet, I believe it is safe to assume (with the default natural ordering) that the words starting with foo sit together in the set, beginning at the first element greater than or equal to foo.

Thus, I believe it suffices to take TreeSet.ceiling(foo) and check whether foo is a prefix of it.

From the documentation of that function, we see that it returns the given element itself if it is in the set, and otherwise the element that would follow it in order.

Returns the least element in this set greater than or equal to the given element, or null if there is no such element.
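To make the quoted behaviour concrete, here is a small sketch using the sample words from the question. The class name `CeilingDemo` and the helper `sampleWords()` are made up for illustration, and the set is assumed to use the default natural String ordering.

```java
import java.util.Arrays;
import java.util.TreeSet;

class CeilingDemo {
    // Hypothetical helper: builds the sample set from the question.
    static TreeSet<String> sampleWords() {
        return new TreeSet<>(Arrays.asList("DOG", "CAT", "COW"));
    }

    public static void main(String[] args) {
        TreeSet<String> words = sampleWords();
        System.out.println(words.ceiling("CAT")); // element is present, so it is returned: CAT
        System.out.println(words.ceiling("CA"));  // next element in order: CAT
        System.out.println(words.ceiling("CB"));  // next element in order: COW (not a prefix match)
        System.out.println(words.ceiling("E"));   // nothing at or after "E": null
    }
}
```

Note that a non-null result does not by itself mean the prefix matched, as the "CB" case shows.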

The algorithm would thus be :

  1. Call TreeSet.ceiling() on the input. If the return value is null, then return false.
  2. If the return value is not null, return whether the input is a valid prefix of the return value.
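The two steps above could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes the TreeSet uses the default natural String ordering (no custom comparator) and a non-null prefix, and it passes the set in as a parameter, which differs from the single-argument signature in the question. The class name `PrefixLookup` is hypothetical.

```java
import java.util.Arrays;
import java.util.TreeSet;

class PrefixLookup {
    // Static variant of the asker's isValidPrefix; the set is a parameter
    // here only to keep the example self-contained.
    static boolean isValidPrefix(TreeSet<String> words, String prefix) {
        // Step 1: least element >= prefix, found in O(log n); null if none exists.
        String candidate = words.ceiling(prefix);
        // Step 2: the prefix is valid only if that element starts with it.
        return candidate != null && candidate.startsWith(prefix);
    }

    public static void main(String[] args) {
        TreeSet<String> words = new TreeSet<>(Arrays.asList("DOG", "CAT", "COW"));
        System.out.println(isValidPrefix(words, "CA")); // true  ("CAT")
        System.out.println(isValidPrefix(words, "CB")); // false (ceiling is "COW")
        System.out.println(isValidPrefix(words, "E"));  // false (ceiling is null)
    }
}
```

The check uses `String.startsWith` rather than `String.contains`, since `contains` would also accept a match in the middle of the word.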
spiderman
merlin2011
  • That sounds reasonable. Let me verify it before I select this as the answer. – ordanj Jul 09 '14 at 23:41
  • Your answer is not quite right, as I had to call .contains(prefix) on the return of TreeSet.ceiling(prefix). The only way ceiling(prefix) would return null is if the prefix is greater than every string in the TreeSet. – ordanj Jul 09 '14 at 23:53