
I'm working on an app that takes voice input and matches it against known items in a manifest.

Each item in the manifest has a list of aliases, so that items with long titles can be matched to shorter names.

For example:

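class Product {
    // Full title of the item as it appears in the manifest
    String name = "Old Stinky's Western Kentucky Big Rig Polish";
    // Shorter spoken names that should resolve to this item
    String[] aliases = {"old stinky", "other alias"};
}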

And then loaded into memory as:

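public List<Product> Collection = new ArrayList<>();  // initialize before use
Collection.addAll(alltheproducts);  // alltheproducts: the items loaded from the manifest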

And then matched through:

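public String isProductOrAlias(String lowertext) {
    for (Product p : products.Collection) {
        // Check the full product name first
        if (lowertext.equals(p.name.toLowerCase()))
            return p.name;
        // Then check each alias, if the product has any
        if (p.aliases != null) {
            for (String s : p.aliases) {
                if (lowertext.equals(s.toLowerCase()))
                    return p.name;
            }
        }
    }
    return null;  // no product name or alias matched
}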

This is working great on a test batch of twenty-five items in the prototype, but eventually it will need to handle 5,000-10,000 items as close to real time as possible on a phone.

Core Question:

Assuming I can hold 10,000 of these items in memory (the sample clocks in at about 64 kilobytes, so less than a megabyte in total for 10,000 items), what is the best collection to use on Android to store these objects in memory, what is the fastest way to populate that collection with data, and what is the fastest way to then find matching elements?

Wesley
  • Related: [Fastest way to check if a List contains a unique String](https://stackoverflow.com/q/3307549/295004) – Morrison Chang Sep 28 '18 at 22:26
  • @MorrisonChang I've seen a number of posts referencing Trie or HashSet or even Set, but they all seem to be for flat lists of strings. With the hierarchical structure here, I would expect some sort of lambda expression would match the fastest, but I'm not sure what that looks like. – Wesley Sep 28 '18 at 22:30
  • Without more information about your dataset and how it is being searched (i.e. why not aliases first, as input is from voice), I'm not sure what else to say other than: try a few, test, and then measure the performance in as near a production environment as possible. Fast execution typically means some other trade-off in memory (indexes) or preprocessing time (i.e. build the trie offline and load it on app startup). – Morrison Chang Sep 28 '18 at 23:31
  • Lambda expressions can do nothing for speed as they're mostly a syntactic feature. What you need is a `Map` for fast lookup. – maaartinus Sep 29 '18 at 12:54

1 Answer

You can do this easily with a Map, assuming no product name or alias is duplicated. A Kotlin version:

data class Product(val name: String, val aliases: Array<String>)

fun test() {
    val products = listOf<Product>( ... )

    // Do this once, create a map from name and aliases to the product
    val productIndex = products.asSequence().flatMap { product ->
        val allKeys = sequenceOf(product.name) + product.aliases
        // lowercase() needs Kotlin 1.5+; on older versions use toLowerCase()
        allKeys.map { it.lowercase() to product }
    }.toMap() 
    // now name => product and each alias => product are mapped in productIndex

    val matchingProduct = productIndex["something"] // search lower case name

    println(matchingProduct?.name ?: "<not found>")
}

A Trie does not make sense unless you are doing prefix matches. A Set makes no sense because you can only tell "does it exist" and not "which thing is it that matches". A Map will go from anything to the original Product from which you can get the name.
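
Since the question's code is Java, the same Map idea might look roughly like the following there. This is only a minimal sketch under a few assumptions: `Product` is a simplified, hypothetical stand-in for the real class, `ProductIndex` and `normalize` are names invented for illustration, and when two products share a name or alias the first one registered is kept (the real app may want to report such collisions instead).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class ProductIndex {
    // Hypothetical stand-in for the question's Product class.
    public static final class Product {
        public final String name;
        public final List<String> aliases;

        public Product(String name, List<String> aliases) {
            this.name = name;
            this.aliases = aliases;
        }
    }

    // Maps every lower-cased name and alias to its owning product.
    private final Map<String, Product> byKey = new HashMap<>();

    // Build the index once, e.g. at app startup.
    public ProductIndex(List<Product> products) {
        for (Product p : products) {
            register(p.name, p);
            if (p.aliases != null) {
                for (String alias : p.aliases) {
                    register(alias, p);
                }
            }
        }
    }

    // Assumption: on a duplicate key, the first product registered wins.
    private void register(String key, Product p) {
        String k = normalize(key);
        if (!byKey.containsKey(k)) {
            byKey.put(k, p);
        }
    }

    // Locale.ROOT keeps lower-casing independent of the device locale.
    private static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    // Returns the full product name for a name or alias, or null if unknown.
    public String isProductOrAlias(String text) {
        Product p = byKey.get(normalize(text));
        return p == null ? null : p.name;
    }

    public static void main(String[] args) {
        List<Product> products = Arrays.asList(
            new Product("Old Stinky's Western Kentucky Big Rig Polish",
                        Arrays.asList("old stinky", "other alias")));
        ProductIndex index = new ProductIndex(products);
        System.out.println(index.isProductOrAlias("Old Stinky"));
        System.out.println(index.isProductOrAlias("unknown item"));
    }
}
```

Building the index is a single pass over every name and alias, and each lookup afterwards is one hash lookup rather than a scan of the whole list, which is the trade of a little memory for speed mentioned in the comments.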

Also, your brute-force matching algorithm re-written in Kotlin is in an answer to your other question: https://stackoverflow.com/a/52565549/3679676

Jayson Minard