0

I am writing a program in MapReduce. I need to store a big value for each key. In detail, for each id (key), I want to store a value that consists of many large numbers. I used numbers from 1 to 100000000. For example:

id       value
1        1,3,9,23,56,345,.......,10000000000
2        6,8,45,321,876,.........,98760000876
.
.
.
100000000   1,2,6,83,90,126,567,.......,7632786765643

In each iteration, the amount of numbers in each value increases. First, I chose the Text type for the value, but the shuffle size became very big and I couldn't get a result. Then I chose the BitSet type, but processing the BitSet was very slow. I don't know which data structure I can use that gives me both a small size and fast processing. Can anyone help? Thanks.

Ch Faizan Mustansar
  • 354
  • 3
  • 6
  • 18
ali abdoli
  • 33
  • 6
  • 1
  • You can use [BigInteger](http://docs.oracle.com/javase/6/docs/api/java/math/BigInteger.html) – AurA Dec 03 '13 at 10:15
  • @AurA: Can BigInteger store a lot of numbers? How much space does it need for each number? Is it fast? – ali abdoli Dec 03 '13 at 10:27

2 Answers

1

I think that you can associate a List with each key. So you can use a Map which associates an ID with a List of numbers: Map<Integer, List<Long>>
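As a minimal sketch of that idea (plain in-memory Java, not Hadoop code), the map could be filled as below. The class name `IdValues` and the helper `add` are hypothetical names chosen for illustration, and `computeIfAbsent` assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IdValues {
    // Hypothetical helper: append one number to the list stored under an id,
    // creating the list the first time the id is seen.
    static void add(Map<Integer, List<Long>> values, int id, long number) {
        values.computeIfAbsent(id, k -> new ArrayList<>()).add(number);
    }

    public static void main(String[] args) {
        Map<Integer, List<Long>> values = new HashMap<>();
        add(values, 1, 1L);
        add(values, 1, 3L);
        add(values, 1, 10000000000L); // too large for an int, fits in a long
        add(values, 2, 6L);
        System.out.println(values.get(1)); // prints [1, 3, 10000000000]
    }
}
```

Note that a `List<Long>` stores boxed `Long` objects, so it uses noticeably more memory per number than a primitive `long[]`; whether that matters depends on how many numbers each id holds.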

Patrick
  • 831
  • 11
  • 25
  • Which one is better: an array of longs or a list of longs? – ali abdoli Dec 03 '13 at 15:00
  • Using an array in Java is a bad idea. You can use an `ArrayList` instead (random access in O(1)); it also keeps the insertion order of your numbers. A `LinkedList` keeps the order too, but it only pays off if you often insert or remove elements in the middle. – Patrick Dec 03 '13 at 17:07
  • Is a list better, or a set? – ali abdoli Dec 03 '13 at 18:34
  • Why is an array a bad idea? Which advantages does an ArrayList have that an array doesn't? – ali abdoli Dec 03 '13 at 21:12
  • An array is a static structure: you can't easily change its size "on the fly". An `ArrayList` is an object whose size grows automatically as you add elements. Moreover, an `ArrayList` has many methods to manipulate it, like `contains()`, `remove()`, ... – Patrick Dec 04 '13 at 08:27
0

In Java, the `int` data type is a 32-bit signed integer. It has a range of -2,147,483,648 to 2,147,483,647, which is not enough in your case. You can use the `long` type instead: it is a 64-bit signed integer on every JVM, whether the machine is 32-bit or 64-bit.

If your numbers can exceed the range of `long`, you can use a BigInteger.

For me, the appropriate data structure is:

Map<Integer, List<BigInteger>>
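
To check which type is actually needed, the ranges can be compared directly. The sketch below is only an illustration: the class `LongRange` and the helper `fitsInLong` are hypothetical names, and the sample value is the largest number shown in the question's example.

```java
import java.math.BigInteger;

public class LongRange {
    // Returns true when the value can be stored in a 64-bit signed long.
    // bitLength() excludes the sign bit, so at most 63 bits means it fits.
    static boolean fitsInLong(BigInteger value) {
        return value.bitLength() <= 63;
    }

    public static void main(String[] args) {
        long largest = 7632786765643L; // largest value in the question's example
        System.out.println(Integer.MAX_VALUE); // 2147483647
        System.out.println(Long.MAX_VALUE);    // 9223372036854775807
        System.out.println(fitsInLong(BigInteger.valueOf(largest))); // true
        BigInteger tooBig = BigInteger.valueOf(Long.MAX_VALUE).add(BigInteger.ONE);
        System.out.println(fitsInLong(tooBig)); // false
    }
}
```

If every number passes such a check, a `List<Long>` (or a `long[]`) would be enough and `BigInteger` is not required.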
Lahniep
  • 719
  • 1
  • 11
  • 30
  • Is BigInteger better than BitSet? Is it fast? – ali abdoli Dec 03 '13 at 11:17
  • Regular operations on BigInteger (add, subtract, multiply, divide) are of course slower than the corresponding operations on primitive types, because of the larger size it uses in memory. BigInteger is also immutable, so every operation creates a new instance, which is costly given the number of operations you have to do. – Lahniep Dec 03 '13 at 12:31
  • Given the size of the numbers you want to manipulate, I would consider simply using `long`, which represents numbers from -9223372036854775808 to 9223372036854775807. It is 64 bits wide on any JVM, so you do not need to move from a 32-bit to a 64-bit machine for that. – Lahniep Dec 03 '13 at 12:36
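
The immutability point from the comments can be illustrated with a small sketch. The class `ImmutableDemo` and its methods are hypothetical names used only for this example; it shows the allocation pattern, not a benchmark.

```java
import java.math.BigInteger;

public class ImmutableDemo {
    // Sums 1..n with BigInteger: each add() returns a new object,
    // so the loop allocates a fresh instance on every iteration.
    static BigInteger sumBig(int n) {
        BigInteger total = BigInteger.ZERO;
        for (int i = 1; i <= n; i++) {
            total = total.add(BigInteger.valueOf(i));
        }
        return total;
    }

    // Same sum with a primitive long: the variable is updated in place.
    static long sumLong(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += i;
        }
        return total;
    }

    public static void main(String[] args) {
        BigInteger a = BigInteger.valueOf(5);
        BigInteger b = a.add(BigInteger.ONE); // a is left unchanged
        System.out.println(a + " " + b);      // prints 5 6
        System.out.println(sumBig(1000));     // prints 500500
        System.out.println(sumLong(1000));    // prints 500500
    }
}
```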