The class DenseMatrix has a parameter data of type Array[V]. Why not use some other mutable collection that can grow dynamically, such as Vector?
-
Just guessing: probably because communication with native libraries (JNI) has to be done with Arrays. AFAIK you cannot pass arbitrary collections to native code. – Raphael Roth Sep 13 '16 at 14:14
-
Probably also because Arrays are fastest. And why would a matrix need to grow dynamically? Doesn't a matrix always have fixed dimensions? – Jasper-M Sep 13 '16 at 14:43
-
@Raphael Roth Thank you for the reply, can you point to some method of DenseMatrix that communicates with JNI? – Hari Bage Sep 13 '16 at 15:21
-
@Jasper-M You cannot say Arrays are faster for all operations. For example, how do you append a column to a matrix that is represented by an Array? You must create a new one, which takes time linear in the total number of elements. If it were a vector, the append would take constant time (that is, linear only in the number of elements in one column). – Hari Bage Sep 13 '16 at 15:27
1 Answer
The comments (from Raphael Roth and Jasper-M) both make good points and they're both part of the reason. Breeze uses netlib-java for handling its interface to native BLAS via JNI, and it uses arrays. (It could have been implemented in terms of Java Buffers, but they didn't.) Dynamically resizing a DenseMatrix doesn't make a lot of sense, and the implementations of DenseMatrix and DenseVector are deliberately similar.
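To make the flat-array layout concrete, here is a minimal sketch in plain Scala. `FlatMatrix` and `appendColumn` are hypothetical names invented for illustration, not Breeze's API; the sketch only assumes column-major storage, which is what BLAS routines expect. It also shows why growing such a matrix means allocating a new array and copying.

```scala
// Hypothetical illustration, not Breeze's DenseMatrix: a column-major
// matrix stored in one flat Array[Double].
final case class FlatMatrix(rows: Int, cols: Int, data: Array[Double]) {
  require(data.length == rows * cols, "data must hold rows * cols elements")

  // In column-major order, element (r, c) lives at index r + c * rows.
  def apply(r: Int, c: Int): Double = data(r + c * rows)

  // Arrays have a fixed size, so appending a column allocates a new
  // array and copies every existing element (linear in rows * cols).
  def appendColumn(col: Array[Double]): FlatMatrix = {
    require(col.length == rows, "column must have one entry per row")
    val grown = new Array[Double](data.length + rows)
    System.arraycopy(data, 0, grown, 0, data.length)
    System.arraycopy(col, 0, grown, data.length, rows)
    FlatMatrix(rows, cols + 1, grown)
  }
}

object FlatMatrixDemo {
  // Two columns: (1.0, 2.0) and (3.0, 4.0).
  val m: FlatMatrix = FlatMatrix(2, 2, Array(1.0, 2.0, 3.0, 4.0))
  // A third column (5.0, 6.0) is added by copying into a larger array.
  val m2: FlatMatrix = m.appendColumn(Array(5.0, 6.0))
}
```

The upside of this layout is that the backing array can be handed to native code as-is, with no per-element conversion.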
Arrays also have way better performance characteristics than any other built-in collection-like thing in Java and in Scala, and since Breeze cares about being fast, it's the best choice. All generic collections in both Scala and Java box primitive elements, which is totally unacceptable in performance-sensitive environments. (I could have rolled my own specialized ArrayBuffer-like thing using Scala's @specialized, but I didn't.) Also, Java's Vector synchronizes all accesses, so it is particularly unacceptable unless you actually need the locking.
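A rough sketch of the boxing point, using only the Scala standard library. The object and method names (`BoxingSketch`, `sumArray`, `sumBuffer`, `storesPrimitives`) are made up for illustration, and this is not a benchmark: it only shows the two access patterns side by side and checks that an `Array[Double]` really is a primitive `double[]` on the JVM.

```scala
import scala.collection.mutable.ArrayBuffer

object BoxingSketch {
  // Loop over a primitive array: elements are read directly as doubles.
  def sumArray(xs: Array[Double]): Double = {
    var total = 0.0
    var i = 0
    while (i < xs.length) { total += xs(i); i += 1 }
    total
  }

  // Same loop over a generic buffer: elements are stored as boxed
  // java.lang.Double objects, so every read has to unbox.
  def sumBuffer(xs: ArrayBuffer[Double]): Double = {
    var total = 0.0
    var i = 0
    while (i < xs.length) { total += xs(i); i += 1 }
    total
  }

  // Array[Double] is a JVM double[], so its component type is primitive.
  def storesPrimitives(xs: Array[Double]): Boolean =
    xs.getClass.getComponentType.isPrimitive
}
```

Both sums give the same result; the difference is in memory layout and per-element overhead, which is what matters in numeric inner loops.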
You can use VectorBuilder (which has a settable length parameter; setting it to -1 turns off bounds checking) if you're unsure of the dimensionality of your data set.

-
Thank you for the detailed answer. I checked what you were explaining, and now I understand the reason for choosing Array (though I'm not very familiar with netlib-java). The library appears to be even more awesome than I thought. – Hari Bage Sep 13 '16 at 16:48