1

I'm trying to find the strings in a TreeSet<String> that start with a given prefix. I found a previous question asking for the same thing — Searching for a record in a TreeSet on the fly — but the answer given there doesn't work for me, because it assumes that the strings don't include Character.MAX_VALUE, and mine can.

(The answer there is to use treeSet.subSet(prefix, prefix + Character.MAX_VALUE), which gives all strings between prefix (inclusive) and prefix + Character.MAX_VALUE (exclusive), which comes out to all strings that start with prefix except those that start with prefix + Character.MAX_VALUE. But in my case I need to find all strings that start with prefix, including those that start with prefix + Character.MAX_VALUE.)

How can I do this?

ruakh
  • 175,680
  • 26
  • 273
  • 307
  • What is the type of cityNames? Could you post a [Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve)? – tnas May 24 '17 at 16:31

2 Answers2

1

To start with, I suggest re-examining your requirements. Character.MAX_VALUE is U+FFFF, which is not a valid Unicode character and never will be; so I can't think of a good reason why you would need to support it.

But if there's a good reason for that requirement, then — you need to "increment" your prefix to compute the least string that's greater than all strings starting with your prefix. For example, given "city", you need "citz". You can do that as follows:

/**
 * @param prefix
 * @return The least string that's greater than all strings starting with
 *         prefix, if one exists. Otherwise, returns Optional.empty().
 *         (Specifically, returns Optional.empty() if the prefix is the
 *         empty string, or is just a sequence of Character.MAX_VALUE-s.)
 */
private static Optional<String> incrementPrefix(final String prefix) {
    final StringBuilder sb = new StringBuilder(prefix);

    // remove any trailing occurrences of Character.MAX_VALUE:
    while (sb.length() > 0 && sb.charAt(sb.length() - 1) == Character.MAX_VALUE) {
        sb.setLength(sb.length() - 1);
    }

    // if the prefix is empty, then there's no upper bound:
    if (sb.length() == 0) {
        return Optional.empty();
    }

    // otherwise, increment the last character and return the result:
    sb.setCharAt(sb.length() - 1, (char) (sb.charAt(sb.length() - 1) + 1));
    return Optional.of(sb.toString());
}

To use it, you need to use subSet when the above method returns a string, and tailSet when it returns nothing:

/**
 * @param allElements - a SortedSet of strings. This set must use the
 *                      natural string ordering; otherwise this method
 *                      may not behave as intended.
 * @param prefix
 * @return The subset of allElements containing the strings that start
 *         with prefix.
 */
private static SortedSet<String> getElementsWithPrefix(
        final SortedSet<String> allElements, final String prefix) {

    final Optional<String> endpoint = incrementPrefix(prefix);

    if (endpoint.isPresent()) {
        return allElements.subSet(prefix, endpoint.get());
    } else {
        return allElements.tailSet(prefix);
    }
}

See it in action at: http://ideone.com/YvO4b3.

ruakh
  • 175,680
  • 26
  • 273
  • 307
0

If anybody is looking for a shorter version of ruakh's answer:

First element is actually set.ceiling(prefix),and last - you have to increment the prefix and use set.floor(next_prefix)

public NavigableSet<String> subSetWithPrefix(NavigableSet<String> set, String prefix) {
    String first = set.ceiling(prefix);
    char[] chars = prefix.toCharArray();
    if(chars.length>0)
        chars[chars.length-1] = (char) (chars[chars.length-1]+1);
    String last = set.floor(new String(chars));
    if(first==null || last==null || last.compareTo(first)<0)
        return new TreeSet<>();
    return set.subSet(first, true, last, true);
}
Alexey
  • 46
  • 5
  • This misbehaves when `prefix` is empty or ends with `Character.MAX_VALUE`; and it's needlessly complicated for the cases that it does support. (And why would anyone be "looking for a shorter version of ruakh's answer"?) – ruakh Sep 06 '19 at 22:59