which datastructure for this hashmap scenario

Question

I have a scenario where i store values in a hashmap.

Keys are strings like

fruits
fruits_citrus_orange
fruits_citrus_lemon
fruits_fleshly_apple
fruits_fleshly
fruits_dry

and so on.

Values are some objects. Now for a given input say fruits_fleshly i need to retrieve all cases where it starts with "fruits_fleshly" In the above case I need to fetch

fruits_fleshly_apple
fruits_fleshly

One way to do this is by doing String.indexOf over all the keys. Is there any other effective way to do this instead of iterating over all the keys in a map

It looks like you want a trie. http://en.wikipedia.org/wiki/Trie. But I would use your simple solution if you have only a few keys. — JB Nizet, Aug 01 '13 at 18:45
This is not homework. I'm looking for better methods when my map size is huge. — John Eipe, Aug 01 '13 at 18:57
question, if a key of `frui` was passed, should it result in all `fruits` or are the keys limited to those actually inserted? If the latter I might have a thought. — John B, Aug 01 '13 at 19:04

Rohit Jain · Answer 1 · 2013-08-01T18:59:11.607

2

Iterating the map seems quite simple and straight-forward way of doing this. However, since you don't want to iterate over keys on your own, you can use Guava's Maps#filterEntries, if you are ok with using 3rd party library.

Here's how it would work:

Map<String, Object> = Maps.filterEntries(
                   yourMap, 
                   Predicate.containsPattern("^fruits_fleshly"));

But, that would too iterate over the map in the backyard. So, iteration is still there, if you are bothered about efficiency.

edited Aug 01 '13 at 18:59

answered Aug 01 '13 at 18:52

Rohit Jain

209,639
45
409
525

1

Just to be clear, love it, but OP should understand that it is iterating the keys behind the scenes or no performance difference than OP's proposal just a lot cleaner and doesn't re-invent the wheel. – John B Aug 01 '13 at 18:57
@JohnB. That's what my last statement says. – Rohit Jain Aug 01 '13 at 18:58
Is it faster than iteration? Is there a performance improvement – John Eipe Aug 01 '13 at 18:58
Sorry, must have missed it. @John, no. – John B Aug 01 '13 at 18:59
@John I can't comment on that, as I haven't benchmarked it. But I would guess there won't be much difference if any, as `filterEntries` would iterate over `keySet` behind the scenes. – Rohit Jain Aug 01 '13 at 19:07
@JohnB. As per one of the answers of [this post](http://stackoverflow.com/q/5719833/1679863) it seems like, this way would be efficient. You might take a look If you are concerned too much, just benchmark it yourself, and don't believe on what you see. – Rohit Jain Aug 01 '13 at 19:12

Ankit · Accepted Answer · 2013-08-01T19:12:17.373

2

though these are strings, but to me, it looks like these are certain categories & sub categories, like fruit, fruit-freshly, fruit-citrus etc..

If that is a case you can instead implement a Tree data-structure. This would be most effective for search operation.

since Tree has a parent-child structure, there is a root node & child node. You can have a structure like this:

(0)   (1)        (2)
fruit
|_____citrus
|          |_____lemon
|          |_____orange
|
|_____freshly
           |_____apple
           |_____

in this structure, say if you want to search for citrus fruit, you can just go to citrus, and list all its child. And finally you can construct full name by concatenating the name as a path from root to leaves.

edited Aug 01 '13 at 19:12

answered Aug 01 '13 at 18:57

Ankit

6,554
6
49
71

This might help. Thanks – John Eipe Aug 01 '13 at 19:16
Since I'm doing this on a webserver where new keys are created and fetched....the operations on the tree needs to be synchronized and handling it gets messy – John Eipe Aug 01 '13 at 19:22
@John not much of a problem. Since there will be more operations on leaves, than on any other node. Modification at leaf node will not have to be synchronized (though it depends on the use-case), but yes everything else needs to be. And for that matter, the opeartions on hashmap as well needs to be synchronized. – Ankit Aug 01 '13 at 19:26

score 1 · Answer 3 · answered Aug 02 '13 at 05:06

Since HashMap doesn't maintain any order for its keys it's not a very good choice for this problem. A better choice is the TreeMap: it has methods for retrieving a sub map for a range of keys. These methods run in O(log n) time (n number of entries) so it's better than iterating over the keys.

Map subMap = myMap.subMap("fruits_fleshly", true, "fruits_fleshly\uffff", true);

Michelle · Answer 4 · 2013-08-01T19:09:10.657

0

The nature of a hashmap means that there's no way to do a "like" comparison on keys - you have to iterate over them all to find where key.startsWith(input).

I suppose you could nest hashmaps and split up your keys. E.g.,

{
  "fruits":{
    "citrus":{
      "orange":(value), 
      "lemon":(value)
    }, 
    "fleshly":{
      "apple":(value), 
      "":(value)
    }
  }
}

...etc.

The performance implications are probably horrific on a small scale, but that may not matter in a homework context but maybe not so bad if you're dealing with a lot of data and only a couple layers of nesting.

Alternatively, create a Category object with a List of Categories (sub-categories) and a List of entries.

edited Aug 01 '13 at 19:09

answered Aug 01 '13 at 18:52

Michelle

2,830
26
33

This seems like an overly complex idea that would be very brittle. – John B Aug 01 '13 at 19:00
It would depend on the size of the map and the size and depth of sub-categories. With ~1 bajillion levels of nesting and 3 items at the bottom of each sub-category, yes. With ~1 bajillion entries over 2 or 3 layers deep, not quite as bad. Lack of context makes it difficult to understand what the shape of the data might be and how it's being used. – Michelle Aug 01 '13 at 19:06

score 0 · Answer 5 · answered Aug 02 '13 at 16:01

I believe Radix Trie is what you are looking for. It is similar idea as @ay89 solution.

You can just use this open source library Radix Trie example. It perform better than O(log(N)). You will be able to find a hashmap assigned to a key in average constant time (number of underscores in your search key string) with a decent implementation of Radix Trie.fruits fruits_citrus_orange fruits_citrus_lemon fruits_fleshly_apple fruits_fleshly fruits_dry

Trie<String, Map> trie = new PatriciaTrie<>;
trie.put("fruits", hashmap1);
trie.put("fruits_citrus_orange", hashmap2);
trie.put("fruits_citrus_lemon", hashmap3);
trie.put("fruits_fleshly_apple", hashmap4);
trie.put("fruits_fleshly", hashmap5);

Map.Entry<String, Map> entry = trie.select("fruits_fleshy");

If you just want one hashmap to be return by select you might be able to get slightly better performance if you implement your own Radix Trie.

which datastructure for this hashmap scenario

5 Answers5