-2

Using an open source Java automaton library, eg: org.apache.lucene.util.automaton or dk.brics.automaton, how can I build an automaton for prefix matching?

eg: an automaton created from the set of strings ["lucene", "lucid"], that will match when given "luc", or "luce", but not match when given "lucy" or "lucid dream".

tekumara
  • 8,357
  • 10
  • 57
  • 69
  • This is exactly how a [trie](https://en.wikipedia.org/wiki/Trie) works. A similar idea can be used to construct the automaton. Usage of an "end of input" character might be useful as well - such as `$`. – Obicere Feb 12 '17 at 21:56
  • I'm familiar with tries, although the implementations I've found in Java (eg: PatriciaTrie) are actually Maps, and will return a value associated with a prefix. I just want to check for the presence of a prefix. – tekumara Feb 22 '17 at 00:13

1 Answers1

0

Prefix matching is possible using org.apache.lucene.util.automaton by setting all states to accept, eg:

    String[] strings = new String[]{"lucene", "lucid dream"};
    final List<BytesRef> terms = new ArrayList<>();
    for(String s : strings) {
        terms.add(new BytesRef(s));
    }
    Collections.sort(terms);
    final Automaton a = DaciukMihovAutomatonBuilder.build(terms);

    for (int i = 0; i < a.getNumStates(); i++) {
        a.setAccept(i, true);
    }
tekumara
  • 8,357
  • 10
  • 57
  • 69
  • 1
    Could you complete the example with the code that tests a string against the such created Automaton? I cannot find any examples with the `DaciukMihovAutomatonBuilder` – tonicebrian Aug 17 '20 at 13:49