
Here's a strange behavior I ran into, and I can't find any hint as to why it happens. This example uses the estimate method of Spark's SizeEstimator. I haven't found any glitch in its code, so, assuming it gives a good estimate of memory use, why do I get this:

import scala.collection.mutable.ArrayBuffer
import org.apache.spark.util.SizeEstimator

val buf1 = new ArrayBuffer[(Int,Double)]
var i = 0
while (i < 3) {
   buf1 += ((i,i.toDouble))
   i += 1
}
System.out.println(s"Raw size with doubles: ${SizeEstimator.estimate(buf1)}")
val ite1 = buf1.toIterator
var size1: Long = 0L
while (ite1.hasNext) {
   val cur = ite1.next()
   size1 += SizeEstimator.estimate(cur)
}
System.out.println(s"Size with doubles: $size1")

val buf2 = new ArrayBuffer[(Int,Float)]
i = 0
while (i < 3) {
   buf2 += ((i,i.toFloat))
   i += 1
}
System.out.println(s"Raw size with floats: ${SizeEstimator.estimate(buf2)}")
val ite2 = buf2.toIterator
var size2: Long = 0L
while (ite2.hasNext) {
   val cur = ite2.next()
   size2 += SizeEstimator.estimate(cur)
}
System.out.println(s"Size with floats: $size2")

The console output prints:

Raw size with doubles: 200
Size with doubles: 96
Raw size with floats: 272
Size with floats: 168

So my question is quite naive: why do floats take more memory than doubles in this case? And why does it get even worse when I go through an iterator and sum the per-element sizes? (For the whole buffer, the doubles take about 74% of the floats' size, 200/272; summed per element, that drops to about 57%, 96/168.)

(For more context, I ran into this while trying to "optimize" a Spark application by changing Double to Float, and found that the floats actually took more memory than the doubles...)

P.S.: It's not due to the small size of the buffers (3 here); with 100 elements instead I get:

Raw size with doubles: 3752
Size with doubles: 3200
Raw size with floats: 6152
Size with floats: 5600

and floats still consume more memory... But the ratio has stabilized, so I guess the different ratios I saw when going through the iterator must be due to some fixed overhead.

EDIT: It seems that Product2 is actually only specialized on Int, Long and Double:

trait Product2[@specialized(Int, Long, Double) +T1, @specialized(Int, Long, Double) +T2] extends Any with Product

Does anyone know why Float is not included? Neither is Short, which leads to weird behavior...

Vince.Bdn
  • Sorry, I didn't see the update before posting the answer. If you wish, I can delete it – Odomontois Feb 24 '16 at 09:42
  • No, your answer's great because you provided a link that explains why it's not specialized on all primitives! It's due to the combinatorial number of variants it would lead to... which actually makes sense =) It's good to know, though, before trying to optimize stupidly like I did! – Vince.Bdn Feb 24 '16 at 09:45

1 Answer


This is because Tuple2 is @specialized for Double but not specialized for Float.

That means (Int,Double) is represented as a structure with two fields of the primitive Java types int and double, while (Int,Float) falls back to the generic Tuple2, whose fields are object references, so its values are stored boxed (the Float as java.lang.Float, and the Int as java.lang.Integer).
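
One way to check this yourself is to print the runtime class of each tuple. A minimal sketch, assuming Scala 2.x on the JVM (the object name `TupleClassCheck` is only for illustration):

```scala
object TupleClassCheck {
  // Runtime class names of the two tuple shapes from the question.
  val intDoubleClass: String = (1, 1.0).getClass.getName
  val intFloatClass: String  = (1, 1.0f).getClass.getName

  def main(args: Array[String]): Unit = {
    // On Scala 2.x the first is a specialized subclass of Tuple2
    // (its name carries an $mc...$sp suffix); the second is plain scala.Tuple2.
    println(intDoubleClass)
    println(intFloatClass)
  }
}
```

If the two names differ, the (Int, Double) tuple is using a specialized class while the (Int, Float) tuple is not.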

More discussion here
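
The numbers in the question look consistent with boxing. Here is a back-of-the-envelope sketch, assuming a 64-bit HotSpot JVM with compressed oops (12-byte object header, 16-byte array header, 4-byte references, 8-byte alignment), assuming the unspecialized tuple boxes both of its fields, and assuming the ArrayBuffer holds one reference and two ints. None of these figures is measured here, and `TupleSizeSketch` is an illustrative name:

```scala
object TupleSizeSketch {
  // Assumed layout constants (64-bit HotSpot, compressed oops); not measured.
  val ObjectHeader = 12L
  val ArrayHeader  = 16L
  val Ref          = 4L

  // Round up to the next multiple of 8 bytes.
  def align(n: Long): Long = (n + 7) / 8 * 8

  // Specialized (Int, Double): two inherited reference fields plus an int and a double.
  val intDoubleTuple: Long = align(ObjectHeader + 2 * Ref + 4 + 8)

  // Generic (Int, Float): two references, each pointing to a boxed value.
  val boxedInt: Long      = align(ObjectHeader + 4)
  val boxedFloat: Long    = align(ObjectHeader + 4)
  val intFloatTuple: Long = align(ObjectHeader + 2 * Ref) + boxedInt + boxedFloat

  // ArrayBuffer shell (assumed: one reference, two ints) plus a backing
  // array holding `capacity` references.
  def bufferOverhead(capacity: Int): Long =
    align(ObjectHeader + Ref + 2 * 4) + align(ArrayHeader + capacity * Ref)

  def main(args: Array[String]): Unit = {
    println(s"(Int, Double) tuple: $intDoubleTuple bytes")
    println(s"(Int, Float) tuple:  $intFloatTuple bytes")
    // Assumed capacities: 16 slots for 3 elements, 128 slots for 100 elements.
    println(s"3 doubles:   ${bufferOverhead(16) + 3 * intDoubleTuple}")
    println(s"3 floats:    ${bufferOverhead(16) + 3 * intFloatTuple}")
    println(s"100 doubles: ${bufferOverhead(128) + 100 * intDoubleTuple}")
    println(s"100 floats:  ${bufferOverhead(128) + 100 * intFloatTuple}")
  }
}
```

Under these assumptions each (Int, Double) tuple costs 32 bytes and each (Int, Float) tuple 56 bytes, and the buffer adds a fixed overhead that is the same for both element types. That fixed overhead would also explain why the ratio changes when the per-element sizes are summed without the buffer.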

Odomontois
  • There's something weird in your link: they say it's because they don't want too many specializations. But when you look at the code, Product3 is not even specialized... So it's just Product1 and Product2... They could easily have added a few specializations for common types such as floats and shorts! – Vince.Bdn Feb 24 '16 at 09:54
  • @Vince.Bdn `Tuple2` is used far more often than `Tuple3`, so I guess they decided further specializations aren't worth the extra library jar size. You can just use case classes for efficient storage, [miniboxing](http://scala-miniboxing.org) for efficient access and [shapeless](https://github.com/milessabin/shapeless) for efficient generic transformations – Odomontois Feb 24 '16 at 10:11
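
To illustrate the case-class suggestion: a case class declares its fields with their primitive JVM types, so no boxing is involved. A minimal sketch, where `IntFloat` and `CaseClassFieldCheck` are hypothetical names introduced only for this example:

```scala
// Hypothetical replacement for (Int, Float): both fields stay primitive.
final case class IntFloat(i: Int, f: Float)

object CaseClassFieldCheck {
  // JVM types of the fields declared on the case class, keyed by field name.
  val fieldTypes: Map[String, String] =
    classOf[IntFloat].getDeclaredFields
      .map(field => field.getName -> field.getType.getName)
      .toMap

  def main(args: Array[String]): Unit =
    fieldTypes.foreach { case (name, tpe) => println(s"$name: $tpe") }
}
```

Using `ArrayBuffer[IntFloat]` instead of `ArrayBuffer[(Int, Float)]` should then avoid the per-element wrapper objects, though it is worth confirming with SizeEstimator on your own data.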