What data structure or algorithm is used for autocompletion?

Question

When we type half of a command or name and press Tab, the remaining part is found immediately. What data structure or algorithm is used underneath to achieve this efficiency?

This question is related to another one (http://stackoverflow.com/questions/5570795/how-does-bash-tab-completion-work). I recommend looking at the readline library source for a definitive answer. You can find the library here: http://cnswww.cns.cwru.edu/php/chet/readline/rltop.html – sdkljhdf hda, Apr 18 '14 at 11:24
@lego I don't agree. This question is specifically about an algorithm to solve the problem, which neither the linked question nor the documentation talks about. – Niklas B., Apr 18 '14 at 15:39

score 2 · Answer 1 · answered Apr 18 '14 at 17:06

A trie is a good data structure to solve this problem.

It's a tree where each edge represents a next possible character to be appended to the string defined by the current path from the root.

So if you were to type "in", you'd travel along root -i-> "i" -n-> "in", and then explore that subtree to find completions such as "inn".

You can include a flag on each node to indicate whether the path to it spells a valid word. The flag only matters for non-leaf nodes, since a leaf is only ever created when a valid word ends there.
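The description above can be sketched in Python roughly as follows. This is a minimal illustration, not how any particular shell implements completion; the names `TrieNode`, `Trie`, `insert` and `complete` are made up for this example.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # maps the next character to the child node
        self.is_word = False  # flag: the path from the root spells a stored word


class Trie:
    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            # Follow the edge for ch, creating the child if it does not exist yet.
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def complete(self, prefix):
        """Return all stored words that start with prefix, sorted."""
        # Walk down from the root along the characters of the prefix.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []  # no stored word has this prefix

        # Explore the whole subtree below the prefix node.
        results = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.is_word:
                results.append(text)
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        return sorted(results)


trie = Trie(["in", "inn", "inner", "into", "it"])
print(trie.complete("in"))  # ['in', 'inn', 'inner', 'into']
```

Reaching the prefix node costs one step per typed character, regardless of how many words are stored; the remaining work is proportional to the size of the subtree that holds the completions.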

A more common (but less specialized) data structure one can use is a binary search tree (BST).

A binary search tree (BST) ... is a node-based binary tree data structure where each node has a comparable key (and an associated value) and satisfies the restriction that the key in any node is larger than the keys in all nodes in that node's left subtree and smaller than the keys in all nodes in that node's right sub-tree.

Depending on the BST implementation, you should either be able to:

- Call a range function to get all elements from the given string (inclusive) up to its 'increment' (exclusive). For example, if given abc, the 'incremented' string would be abd. If given abz and only a-z is allowed in the BST, drop the trailing z and increment the previous character, giving ac; otherwise you can just pick the character after z in the character set, which is { in ASCII, giving ab{.
- Call a ceiling-type function to get the least element greater than or equal to the given string, then repeatedly get the in-order successor until the returned element no longer starts with the given string.
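Both options can be illustrated in Python. The standard library has no balanced BST, so this sketch swaps in a sorted list with the `bisect` module, which supports the same ordered queries (range lookup, ceiling, successor) on static data. The function names are hypothetical, and `increment` assumes a non-empty prefix whose last character is not the largest possible code point.

```python
from bisect import bisect_left


def increment(prefix):
    """Return a string that sorts after every string starting with prefix.

    Assumption: prefix is non-empty and its last character has a successor.
    """
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)


def complete_by_range(sorted_words, prefix):
    """Option 1: take everything in the half-open range [prefix, increment(prefix))."""
    if not prefix:
        return list(sorted_words)
    lo = bisect_left(sorted_words, prefix)
    hi = bisect_left(sorted_words, increment(prefix))
    return sorted_words[lo:hi]


def complete_by_successor(sorted_words, prefix):
    """Option 2: find the ceiling of prefix, then step through successors."""
    i = bisect_left(sorted_words, prefix)  # least element >= prefix
    matches = []
    while i < len(sorted_words) and sorted_words[i].startswith(prefix):
        matches.append(sorted_words[i])
        i += 1  # in a sorted list, the in-order successor is the next index
    return matches


words = sorted(["abc", "abcd", "abd", "abz", "abzz", "ac", "b"])
print(complete_by_range(words, "abc"))      # ['abc', 'abcd']
print(complete_by_successor(words, "abz"))  # ['abz', 'abzz']
```

With a real balanced BST, the ceiling lookup is logarithmic in the number of stored strings and each successor step is cheap, so the total cost is dominated by the number of matches returned.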

score 1 · Answer 2 · answered Apr 18 '14 at 14:11

You can store your set of strings in a directed acyclic graph (DAG).

Each node of the graph corresponds to a possible prefix, with links to possible one-letter extensions of said prefix. The root of the graph goes with the empty prefix. Leaves are the possible complete entries.

In Python there is a module called DAWG to handle these.
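To make the idea concrete without depending on that module, here is a self-contained sketch that builds a plain trie and then merges identical subtrees, which is what turns the tree into a graph with shared suffixes. It is an illustration under simple assumptions, not the module's actual algorithm or API; `Node`, `build_trie`, `merge_equivalent`, `count_nodes` and `complete` are hypothetical names.

```python
class Node:
    def __init__(self):
        self.edges = {}     # maps a letter to the node for the extended prefix
        self.final = False  # True if a complete entry ends at this node


def build_trie(words):
    root = Node()
    for word in words:
        node = root
        for ch in word:
            node = node.edges.setdefault(ch, Node())
        node.final = True
    return root


def merge_equivalent(node, registry):
    """Merge identical subtrees bottom-up and return the canonical node."""
    for ch in list(node.edges):
        node.edges[ch] = merge_equivalent(node.edges[ch], registry)
    # Two nodes are interchangeable if they have the same flag and the same
    # outgoing edges leading to the same (already canonical) children.
    signature = (
        node.final,
        tuple(sorted((ch, id(child)) for ch, child in node.edges.items())),
    )
    return registry.setdefault(signature, node)


def count_nodes(root):
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if id(node) not in seen:
            seen.add(id(node))
            stack.extend(node.edges.values())
    return len(seen)


def complete(root, prefix):
    """Return all entries starting with prefix, sorted."""
    node = root
    for ch in prefix:
        node = node.edges.get(ch)
        if node is None:
            return []
    results, stack = [], [(node, prefix)]
    while stack:
        current, text = stack.pop()
        if current.final:
            results.append(text)
        stack.extend((child, text + ch) for ch, child in current.edges.items())
    return sorted(results)


words = ["tap", "taps", "top", "tops"]
root = build_trie(words)
nodes_before = count_nodes(root)       # 8 nodes in the plain trie
root = merge_equivalent(root, {})
nodes_after = count_nodes(root)        # 5 nodes once shared suffixes are merged
print(nodes_before, nodes_after, complete(root, "to"))
```

Prefix lookup works exactly as in a trie; the only difference is that several paths may lead into the same node, so the structure is a graph rather than a tree.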

answered Apr 18 '14 at 14:11

Alix Martin

This is more commonly known as a [trie](http://en.wikipedia.org/wiki/Trie). Although I think describing it as a **tree**, not a DAG, is simpler. – Bernhard Barker Apr 18 '14 at 17:08
@Dukeling No. A DAWG (or rather, a [deterministic acyclic finite state automaton](https://en.wikipedia.org/wiki/Deterministic_acyclic_finite_state_automaton)) is *not* a tree: it shares suffixes, so it is actually a proper DAG. If it is the minimal such automaton that accepts exactly the dictionary strings, it saves space over a compressed trie on the same dictionary if at least two words share a suffix. – Niklas B. Apr 19 '14 at 16:43
