
My application reads bigram collocations (word pairs) from a .txt file. They are to be read as key-value pairs, and a single key can have multiple values, which, as far as I can tell, rules out an ordinary Map. I want to keep them sorted in natural alphabetical order.

The first word of a collocation, i.e. the key, will be a verb, and its value completes a verb-word collocation, so trees could be worth considering.

So, essentially I am trying to implement a

SortedList <String, String> 

kind of a thing..

I have come across the following data structures that seem to suit my requirements, but I am unable to decide which one to use (the Multimap classes mentioned here are part of Google's collections framework):

  1. HashMultimap

  2. Tries - I know only the basics of this data structure. I found one implementation of it in Java here. It does not implement a delete() operation.

  3. FastTreeMap

  4. TreeMultimap

  5. SortedSetMultimap

Or is there any other data structure you would recommend? I haven't gone through Dictionary in Java yet. Please help me decide which one I should choose.

Thanks!

EDIT - the list is expected to contain about 100-200 entries

EDIT2: Operations: searching whether a key-value mapping exists for a given key. As I said before, the data structure (DST) will store a list of verb-word pairings as key-value entries, initialized by reading entries from a file. It works like this: we first get all keys from the DST, then read a file and tokenize it (done through OpenNLP; the DST is not used for this), and then check whether any of the tokens matches a key (i.e. is a verb) in the DST. Once a match is found, we get all values for that key and search for the next token within that set of values. If the value is also found in the DST, a collocation has been detected, and the appropriate values are then set. THIS IS HOW THE DST SHOULD ACTUALLY WORK.
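
As a rough sketch of the lookup described above (not a final implementation): it assumes the DST is a `Map<String, Set<String>>` and that the tokens arrive as a plain list. The class name `CollocationDetector`, the method `detect`, and the sample verbs are hypothetical, and the OpenNLP tokenization step is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class CollocationDetector {

    // Returns every "verb word" pair in tokens whose first token is a key in
    // the dictionary and whose following token is one of that key's values.
    static List<String> detect(Map<String, Set<String>> dictionary, List<String> tokens) {
        List<String> found = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            // get() returns null when the token is not a known verb
            Set<String> words = dictionary.get(tokens.get(i));
            if (words != null && words.contains(tokens.get(i + 1))) {
                found.add(tokens.get(i) + " " + tokens.get(i + 1));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for the entries read from the file
        Map<String, Set<String>> dictionary = new TreeMap<>();
        dictionary.put("make", new TreeSet<>(Arrays.asList("progress", "sense")));
        dictionary.put("take", new TreeSet<>(Arrays.asList("care", "notes")));

        List<String> tokens = Arrays.asList("we", "make", "progress", "and", "take", "notes");
        System.out.println(detect(dictionary, tokens)); // prints [make progress, take notes]
    }
}
```

With this shape there is no need to fetch all keys first: a single `get` per token answers "is this token a verb?" and hands back the candidate values in one step.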

Navin Israni
  • 2
    `Map>` would work and would be the simplest – Dan D. Mar 27 '11 at 10:48
  • 1
    What kind of operations do you want to perform? How many entries will this collection include? How often do you plan to perform which operation? These are the questions you have to answer so that somebody is able to recommend a suitable data structure. – jmg Mar 27 '11 at 11:08
  • 1
    I did not see your edit about the number of entries. With such small numbers, I'd say the concrete data structure is not as relevant as I thought before. So, your question is more about: Which ready-made library fits to your use case? Is that right? – jmg Mar 27 '11 at 11:18
  • What should be kept sorted? Keys? Values? both? – akappa Mar 27 '11 at 11:45
  • @jmg Well, I had decided on simply using 2D String arrays. My friend suggested using HashMap, but I figured out it didn't fit my case because I was looking for a data structure with multiple values for each key. Now I am trying to decide which of the ones in my question I should use. I might think of going back to arrays if I don't find a good DST here. – Navin Israni Mar 27 '11 at 15:23
  • @akappa Yes, both keys and values are to be kept sorted. – Navin Israni Mar 27 '11 at 15:24
  • @DanD i used your option...instantiated with TreeMap> ...:D :D :D thanks to u too..:) – Navin Israni Mar 28 '11 at 13:44

3 Answers


java.util.NavigableMap is an interface providing a map abstraction with a total ordering of the keys. Java SE 6 provides java.util.TreeMap and java.util.concurrent.ConcurrentSkipListMap as implementations. The former is probably sufficient for you. To be clear, I'd recommend using something like:

Map<String,Set<String>> with the following concrete type TreeMap<String, ArraySet<String>>.
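
A minimal sketch of that recommendation, with its assumptions spelled out: `java.util` has no `ArraySet` class, so `TreeSet` is substituted here (which also keeps the values sorted), and each input line is assumed to hold one "verb word" pair separated by whitespace. `computeIfAbsent` needs Java 8 or later; `CollocationIndex` and `build` are illustrative names only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.TreeSet;

class CollocationIndex {

    // Builds a sorted index from lines of the assumed form "verb word".
    static NavigableMap<String, TreeSet<String>> build(List<String> lines) {
        NavigableMap<String, TreeSet<String>> index = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length != 2) {
                continue; // skip blank or malformed lines
            }
            // Create the value set on first sight of a key, then add the word
            index.computeIfAbsent(parts[0], k -> new TreeSet<>()).add(parts[1]);
        }
        return index;
    }

    public static void main(String[] args) {
        // Hypothetical lines standing in for the contents of the .txt file
        List<String> lines = Arrays.asList("take notes", "make sense", "take care", "make progress");
        System.out.println(build(lines)); // prints {make=[progress, sense], take=[care, notes]}
    }
}
```

Both the keys and each key's values come back in natural alphabetical order regardless of the order in the input.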

jmg

Not a HashMap or HashMultimap, because they don't allow you to iterate the keys in order.

Not FastTreeMap or ConcurrentSkipListMap ... unless your application is multi-threaded.

The various TreeMap or TreeMultimap implementations are OK, though the TreeMap versions will entail instantiating them as Map<String,List<String>> and managing the lists yourself.
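
To illustrate what "managing the lists" could involve, here is a hedged sketch of a small wrapper around a `TreeMap` whose values are lists. The class `SortedListMultimap` and its methods are hypothetical names, not part of any library. It keeps each list sorted and duplicate-free via `Collections.binarySearch`, which is the kind of bookkeeping a multimap class would otherwise do for you.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SortedListMultimap {
    private final Map<String, List<String>> map = new TreeMap<>();

    // Adds value under key, keeping the key's list sorted and free of duplicates.
    void put(String key, String value) {
        List<String> values = map.get(key);
        if (values == null) {
            values = new ArrayList<>();
            map.put(key, values);
        }
        int pos = Collections.binarySearch(values, value);
        if (pos < 0) {
            // A negative result encodes the insertion point as -(point) - 1
            values.add(-pos - 1, value);
        }
    }

    // Returns a read-only view of the values for key, or an empty list if absent.
    List<String> get(String key) {
        List<String> values = map.get(key);
        return values == null
                ? Collections.<String>emptyList()
                : Collections.unmodifiableList(values);
    }

    // True if the key-value pairing is present.
    boolean contains(String key, String value) {
        return Collections.binarySearch(get(key), value) >= 0;
    }
}
```

For example, after `put("take", "notes")` and `put("take", "care")`, `get("take")` returns the values in the order care, notes, and `contains("make", "sense")` is false because the key is absent.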

Tree versus Trie is a bit difficult. I suspect that a well designed / implemented Trie would give faster lookup, but I also suspect that it would take more memory. (I'm making some assumptions. In reality, the complexity analysis would depend on details of the trie implementation.)

Stephen C
  • 1
    We're talking about 200-300 entries so, unless this algorithm is intended to run on an Amiga it shouldn't matter at all – akappa Mar 27 '11 at 11:46
  • If performance (and space utilization) are not an issue, then it really doesn't matter whether you use TreeMap, TreeMultiMap or a trie-based alternative ... so long as the respective implementation classes work. The best solution is the simplest one. – Stephen C Mar 28 '11 at 05:44
  • Yes, I could really have used any of the DST alternatives, but the reason I asked this question was that I wasn't aware that a List / Set could be nested inside a Map. I am a newbie with these DSTs and this is really my first time implementing them. @jmg gave me this nesting option. – Navin Israni Mar 28 '11 at 13:10

FYI: The Google Collections project has been discontinued and is now part of Google's Guava.

Guava's ListMultimap will ensure that the values within a particular key remain in the same order as they appeared in the file. It won't, however, keep the keys in the same order as they appeared in the file.

Adam Paynter