This is motivated by a recent question that asked how to quickly determine whether a query word can be permuted to match a word in a given dictionary. The basic idea for a fast query was simple: as preprocessing, for each dictionary word, hash the tuple recording how many times each letter of the alphabet occurs. Then, for a query word, build the same kind of tuple and check whether it is already in the hash table.
In other words, that problem came down to checking whether a tuple of non-negative integers (the count of each letter of the alphabet) exactly matches a tuple in the hash table, where the hash table can be built quickly and does not take up much memory compared to the original dictionary.
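For concreteness, here is a minimal sketch of that exact-match scheme. It assumes a lowercase ASCII alphabet, and the names `count_tuple`, `build_index` and `find_anagram` are made up for illustration; they are not from the original question.

```python
from collections import Counter
import string

ALPHABET = string.ascii_lowercase  # assumption: lowercase a-z only


def count_tuple(word):
    """Return the tuple of per-letter counts for `word`."""
    counts = Counter(word)
    return tuple(counts[c] for c in ALPHABET)


def build_index(dictionary):
    """Preprocessing: map each count tuple to one dictionary word having it."""
    index = {}
    for word in dictionary:
        # Keep the first word seen for each tuple; one witness is enough.
        index.setdefault(count_tuple(word), word)
    return index


def find_anagram(index, query):
    """Return a dictionary word that is a permutation of `query`, or None."""
    return index.get(count_tuple(query))
```

With `index = build_index(["listen", "google", "apple"])`, calling `find_anagram(index, "silent")` looks up the same count tuple that `"listen"` was stored under.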
Now I want to extend the problem. In terms of strings, the extended problem is whether a query string can be permuted to match a SUB-sequence of one of the dictionary strings (not necessarily a contiguous subsequence, although the contiguous case is interesting too). In terms of tuples of character counts, this is equivalent to asking whether some tuple in the dictionary dominates the query tuple, i.e. every count in the dictionary word's tuple is greater than or equal to the corresponding count in the query word's tuple.
To keep the door open for a fast solution, let's say a query only needs a yes/no answer and, if the answer is yes, just one dictionary word (count tuple) that satisfies the query.
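To make the dominance condition concrete, here is a minimal brute-force sketch that scans the whole dictionary. It is only meant to pin down what a correct answer looks like, not to be the fast solution I am asking for; the names `dominates` and `find_dominating` are hypothetical.

```python
from collections import Counter


def dominates(word, query):
    """True if every letter occurs in `word` at least as often as in `query`."""
    word_counts = Counter(word)
    # Counter returns 0 for letters that do not occur in `word`.
    return all(word_counts[c] >= k for c, k in Counter(query).items())


def find_dominating(dictionary, query):
    """Return the first dictionary word that dominates `query`, or None."""
    for word in dictionary:
        if dominates(word, query):
            return word
    return None
```

For example, `"banana"` dominates `"nab"` because it contains at least one each of `n`, `a` and `b`, while no word in `["apple", "banana", "cherry"]` dominates `"bb"`.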
Is there any kind of preprocessing, taking a reasonable amount of time and memory relative to the dictionary size, that would let these subsequence permutation queries be answered faster than the following baseline? Keep multiple copies of the dictionary's count tuples, each copy sorted by the number of occurrences of one particular character. For a query, pick the character whose sorted copy leaves the fewest dictionary entries satisfying the query's count for that character, then search those entries linearly for a match.
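Here is a rough sketch of that sorted-copies baseline, under my own assumptions about the details: one sorted list per character, holding only the words that contain that character, with binary search used to find the cutoff. The function names are hypothetical.

```python
from bisect import bisect_left
from collections import Counter


def build_sorted_copies(dictionary):
    """For each character, a list of (count, word) pairs sorted by count."""
    per_char = {}
    for word in dictionary:
        for c, k in Counter(word).items():
            per_char.setdefault(c, []).append((k, word))
    for entries in per_char.values():
        entries.sort()
    return per_char


def query_sorted_copies(per_char, dictionary, query):
    """Return one word whose counts dominate the query's counts, or None."""
    need = Counter(query)
    if not need:
        # An empty query is dominated by any word.
        return dictionary[0] if dictionary else None
    best = None
    for c, k in need.items():
        entries = per_char.get(c, [])
        # (k,) sorts before every (k, word), so this is the first count >= k.
        start = bisect_left(entries, (k,))
        candidates = entries[start:]
        if best is None or len(candidates) < len(best):
            best = candidates
    # Linear scan over the shortest candidate list only.
    for _, word in best:
        word_counts = Counter(word)
        if all(word_counts[c] >= k for c, k in need.items()):
            return word
    return None
```

The worst case is still a linear scan, for instance when every query character is common in the dictionary, which is why I am hoping for something better.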
I have a bad feeling that what I want is a potentially very high-dimensional range tree (the dimension being the number of characters in the alphabet), so that range queries can be performed. However, the range queries for this problem have a very special form: every one is a dominance query, bounded below in each coordinate and unbounded above. So I'm hoping for something better, especially since for an alphabet of size d and a dictionary of n words, the range tree approach would require O(n (log n)^(d-1)) preprocessing time and storage, and queries would take O((log n)^(d-1)) time (with fractional cascading; O((log n)^d) without). Depending on d, the range tree could easily have an empirical query time exceeding the brute-force O(nw) query time for a dictionary of n words of length at most w, and the brute-force approach wouldn't even require any preprocessing.
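As a back-of-the-envelope check on that last concern, the two bounds can be compared numerically. This ignores constant factors entirely, and the values of n, d and w below are illustrative assumptions, not measurements.

```python
import math


def range_tree_query_cost(n, d):
    """(log2 n)^(d-1), the range tree query bound with constants dropped."""
    return math.log2(n) ** (d - 1)


def brute_force_query_cost(n, w):
    """n * w, the brute-force scan bound with constants dropped."""
    return n * w


n, w = 100_000, 20  # assumed dictionary size and maximum word length
for d in (3, 26):   # a tiny alphabet versus the 26-letter alphabet
    print(d, range_tree_query_cost(n, d), brute_force_query_cost(n, w))
```

Under these assumptions the range tree bound is far below nw for d = 3 but astronomically above it for d = 26, since log2(100000) is roughly 16.6 and it gets raised to the 25th power.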