5

I'm doing some heavy processing (building inverse indices) using ints/longs in Java.

I've determined that (un)boxing in the standard java.util collection maps takes up a big portion of the total processing time (compared to a similar implementation using arrays, which I can't use due to memory constraints).

I'm looking for a fast 3rd-party implementation (or any implementation at all for that matter) that could support the following structure:

Map with characteristics:

- keys in the map are sparse (roughly 10,000,000 keys in the range [0, 2^64])
- values are always appended to the end of the list
- fast insert (amortized O(1) if possible)
- fast iteration in key order

I've looked at Trove, fastutil, etc., but couldn't find a multimap implementation using primitives (only normal maps).

Any help is appreciated.

Thanks, Geert-Jan

Geert-Jan
  • structure got lost: Map – Geert-Jan Nov 12 '09 at 13:21
  • What kind of API would you expect to have to the values of any given key? Or would you be merely doing contains(key, value) queries? – Tuure Laurinolli Nov 12 '09 at 13:24
  • I think amortized O(1) insertion and fast iteration in key-order contradict each other, or you need a hash that maintains key order, which would be bad if your hash table is smaller than key range. – Tuure Laurinolli Nov 12 '09 at 13:32
  • Although if keys are appended in-order, a link structure could be used for fast in-order iteration. – Tuure Laurinolli Nov 12 '09 at 13:33
  • Tuure: I would not even need contains(key, value). Basically, I'm building a structure for streaming values to disk. I would iterate over the keys in key order, grab all values stored under each key, and stream them out to disk in order. – Geert-Jan Nov 12 '09 at 14:15
  • Tuure: keys are not guaranteed to be appended in order. I was thinking of keeping a separate list of keys, sorting that list so the keys are in order, and iterating over that list to look up the keys in the map. Of course, this will be a bit slower, but that way I can lose the need for iterating the map in key order. – Geert-Jan Nov 12 '09 at 14:18
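
The "separate sorted list of keys" idea from the last comment could be sketched as below. This is only an illustration: the class and method names are hypothetical, and it assumes the keys are meant as unsigned 64-bit values (the question gives the range as [0, 2^64]) stored in Java's signed `long`. Flipping the sign bit before and after `Arrays.sort` makes the signed sort produce unsigned order.

```java
import java.util.Arrays;

// Hypothetical helper, not part of any library mentioned in the thread.
final class UnsignedKeyOrder {
    /** Returns a copy of the keys sorted as unsigned 64-bit values. */
    static long[] sortedUnsigned(long[] keys) {
        long[] sorted = keys.clone();
        // Flipping the sign bit maps unsigned order onto signed order.
        for (int i = 0; i < sorted.length; i++) sorted[i] ^= Long.MIN_VALUE;
        Arrays.sort(sorted);
        // Flip back to recover the original key values.
        for (int i = 0; i < sorted.length; i++) sorted[i] ^= Long.MIN_VALUE;
        return sorted;
    }

    public static void main(String[] args) {
        long[] keys = {42L, -1L, 7L, Long.MIN_VALUE};
        for (long k : sortedUnsigned(keys)) {
            // A real pipeline would look up the values for k and stream them to disk here.
            System.out.println(Long.toUnsignedString(k));
        }
    }
}
```

If the keys never exceed `Long.MAX_VALUE`, a plain `Arrays.sort` on the key array is enough.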

3 Answers

1

Have you considered implementing the multi-part yourself, using a primitive long -> Object map with a primitive int set as the value?
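
A minimal sketch of what such a hand-rolled structure might look like, using only the JDK. Everything here is an assumption made for illustration: the class name, the open-addressing layout, and the use of a growable `int[]` per key instead of an int set (the question asks for values appended to a list). A library map such as the ones in Trove or fastutil could replace the hand-written hash table.

```java
import java.util.Arrays;

/** Hypothetical sketch: long keys mapped to growable int lists, with no boxing. */
final class LongIntMultiMap {
    private long[] keys = new long[16];
    private int[][] values = new int[16][];   // null marks an empty slot
    private int[] sizes = new int[16];
    private int used = 0;

    /** Appends value to the list for key; amortized O(1). */
    void add(long key, int value) {
        if ((used + 1) * 2 > keys.length) resize();   // keep the table at most half full
        int slot = findSlot(keys, values, key);
        if (values[slot] == null) {
            keys[slot] = key;
            values[slot] = new int[2];
            used++;
        } else if (sizes[slot] == values[slot].length) {
            values[slot] = Arrays.copyOf(values[slot], sizes[slot] * 2);
        }
        values[slot][sizes[slot]++] = value;
    }

    /** Returns a copy of the values for key, in insertion order. */
    int[] get(long key) {
        int slot = findSlot(keys, values, key);
        return values[slot] == null ? new int[0] : Arrays.copyOf(values[slot], sizes[slot]);
    }

    /** Returns all keys in ascending (signed) order, for key-ordered iteration. */
    long[] sortedKeys() {
        long[] out = new long[used];
        int n = 0;
        for (int i = 0; i < keys.length; i++) {
            if (values[i] != null) out[n++] = keys[i];
        }
        Arrays.sort(out);
        return out;
    }

    // Linear probing; the table length is always a power of two.
    private static int findSlot(long[] ks, int[][] vs, long key) {
        int mask = ks.length - 1;
        int slot = (int) mix(key) & mask;
        while (vs[slot] != null && ks[slot] != key) slot = (slot + 1) & mask;
        return slot;
    }

    // Simple multiplicative bit mixer so clustered keys spread over the table.
    private static long mix(long k) {
        k *= 0x9E3779B97F4A7C15L;
        return k ^ (k >>> 32);
    }

    private void resize() {
        long[] oldKeys = keys;
        int[][] oldValues = values;
        int[] oldSizes = sizes;
        keys = new long[oldKeys.length * 2];
        values = new int[keys.length][];
        sizes = new int[keys.length];
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldValues[i] == null) continue;
            int slot = findSlot(keys, values, oldKeys[i]);
            keys[slot] = oldKeys[i];
            values[slot] = oldValues[i];
            sizes[slot] = oldSizes[i];
        }
    }
}
```

Insertion stays amortized O(1) because both the table and each value array grow by doubling; key-ordered iteration pays for one sort of the key array at the end.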

Tuure Laurinolli
  • Yeah, I was thinking about that. I've never done something like this. What would be the overhead compared to having a primitive long --> long map? I'm asking because I'm considering the alternative of encoding {int} as 1 long (using some bit manipulation), thereby eliminating the 'multi' part. This alternative would, however, require me to look up the value each time I insert/append a new value so the new encoded value can be calculated (hope that makes sense). Off the top of your head, which would you say is more performant? – Geert-Jan Nov 12 '09 at 14:24
  • I honestly don't know. Perhaps you should do a quick test? – Tuure Laurinolli Nov 12 '09 at 15:58
  • Yeah. I've gone with my 2nd approach, which turns out to be lightning fast (haven't compared, though). I'm flagging your answer since (in combination with the comments) it was the most helpful. Thanks. – Geert-Jan Nov 12 '09 at 16:15
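
The comments do not say how the ints were packed into one long. Purely as an illustration of the read-modify-write pattern described there, the sketch below assumes every value lies in the range 0..63 and stores the set of values as a bitmask. The class and method names are hypothetical, and this encoding keeps neither duplicates nor insertion order.

```java
// Hypothetical codec; the actual encoding used in the thread is not specified.
final class SmallIntSetCodec {
    /** Returns the encoded set with value (0..63) added. */
    static long add(long encoded, int value) {
        if (value < 0 || value > 63) {
            throw new IllegalArgumentException("value out of range: " + value);
        }
        return encoded | (1L << value);
    }

    /** Decodes the set into its int values, in ascending order. */
    static int[] decode(long encoded) {
        int[] out = new int[Long.bitCount(encoded)];
        for (int i = 0; encoded != 0; i++) {
            out[i] = Long.numberOfTrailingZeros(encoded);
            encoded &= encoded - 1; // clear the lowest set bit
        }
        return out;
    }
}
```

With a primitive long --> long map, an append would then be a lookup of the current encoded value, a call to `add`, and a store of the result.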
0

What about the Google Collections library? http://code.google.com/p/google-collections/

Dmitry
0

Depending on the cardinality, you can use a primitive int/long-to-Object map where the type of the value depends on the number of values per key:

  • if (size == 1) => Long (can be deduplicated if there is a huge number of duplicates);

  • if (size <= 13) => LongSet (16 elements in the array);

  • if (size > 13) => SparseLongBitSet, using e.g. 16 longs as the payload per block (the array can even be reused).

For int, consider 26 as the decision point. If performance is very important, do benchmarking, e.g. of SparseLongBitSet with specific sharding/block sizes. For memory locality, consider reusing the same memory blocks (e.g. arrays of 2M).

One last thing: instead of an Object, consider using an index to the payload (e.g. an off-heap pointer) and using static methods (Flyweight-like) to operate on the payload.
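
The answer does not show what SparseLongBitSet looks like, so the following is only a guess at the block idea: values are grouped into blocks of 16 longs (1024 bits), and only blocks that contain at least one value are allocated. The class name and API are hypothetical, and the boxed `HashMap` key is used for brevity; a primitive long-keyed map would take its place in practice.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a sparse bit set over long values, 16 longs per block. */
final class SparseLongBitSetSketch {
    private static final int WORDS_PER_BLOCK = 16;   // 16 longs = 1024 bits
    private static final int BLOCK_SHIFT = 10;       // log2(1024)
    private static final int BLOCK_MASK = (1 << BLOCK_SHIFT) - 1;

    // Block index -> payload. Boxed keys here are an assumption made for brevity.
    private final Map<Long, long[]> blocks = new HashMap<>();

    /** Marks value as present. */
    void add(long value) {
        // Unsigned shift so negative longs are treated as large unsigned values.
        long[] block = blocks.computeIfAbsent(value >>> BLOCK_SHIFT, k -> new long[WORDS_PER_BLOCK]);
        int bit = (int) (value & BLOCK_MASK);
        block[bit >>> 6] |= 1L << (bit & 63);
    }

    /** Returns true if value was added before. */
    boolean contains(long value) {
        long[] block = blocks.get(value >>> BLOCK_SHIFT);
        if (block == null) return false;
        int bit = (int) (value & BLOCK_MASK);
        return (block[bit >>> 6] & (1L << (bit & 63))) != 0;
    }

    /** Counts the distinct values stored. */
    long cardinality() {
        long count = 0;
        for (long[] block : blocks.values()) {
            for (long word : block) count += Long.bitCount(word);
        }
        return count;
    }
}
```

Whether 16 longs is the right block size depends on how clustered the values are, which is why the answer suggests benchmarking with specific block sizes.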