14

It's interesting that Data.Array.Repa is actually faster than hmatrix, which is unexpected since hmatrix is implemented using LAPACK. Is this because Repa uses the unboxed type?

import Data.Array.Repa
import Data.Array.Repa.Algorithms.Matrix

main = do
    let
        a = fromListUnboxed (Z:.1000:.1000::DIM2) $ replicate (1000*1000) 1.0 :: Array U DIM2 Double
        b = fromListUnboxed (Z:.1000:.1000::DIM2) $ replicate (1000*1000) 1.0 :: Array U DIM2 Double
    m <- (a `mmultP` b)
    print $ m!(Z:.900:.900)

running time with 1 core: 7.011s
running time with 2 core: 3.975s

import Numeric.LinearAlgebra
import Numeric.LinearAlgebra.LAPACK

main = do
    let
        a = (1000><1000) $ replicate (1000*1000) 1.0
        b = (1000><1000) $ replicate (1000*1000) 1.0
    print $ (a `multiplyR` b) @@> (900,900)

Running time: 20.714s

kai
  • 235
  • 1
  • 8
  • 9
    Have you tried building a proper microbenchmark with Criterion? It seems like measuring construction and multiplication and IO for printing all in one go is bound to get lots of noise and results that don't connect to any real use case. – Thomas M. DuBuisson Nov 01 '13 at 19:52
  • Also, Reps is parallel and LAPACK might not be in this case – MFlamer Nov 01 '13 at 21:13
  • @ThomasM.DuBuisson: Good suggestion! However, I don't think construction would take too much time in this case, and since I only printed one element in the matrix, IO is not a big deal either. I'll try Criterion when I have time. – kai Nov 01 '13 at 22:43
  • I'm not too familiar with `repa`, but is it possible that `repa` is being lazy and not computing the entire matrix? – Gabriella Gonzalez Nov 02 '13 at 14:57
  • @GabrielGonzalez I don't think so here, because the benchmark uses _unboxed_ repa arrays (`fromListUnboxed` and also the `U` parameter to Array). I think laziness would have kicked in with `Delayed` arrays (`D` parameter). – Alp Mestanogullari Nov 03 '13 at 19:26

1 Answers1

7

Perhaps you are using a non-optimized LAPACK library. In my computer, using libatlas-base, the running time is ~0.4s.

$ cat matrixproduct.hs

import Numeric.LinearAlgebra

main = do
    let a = (1000><1000) $ replicate (1000*1000) (1::Double)
        b = konst 1 (1000,1000)
    print $ a@@>(100,100)
    print $ b@@>(100,100)
    print $ (a <> b) @@> (900,900)

$ ghc matrixproduct.hs -O

$ time ./matrixproduct

1.0
1.0
1000.0

real    0m0.331s
user    0m0.512s
sys     0m0.016s
Alberto Ruiz
  • 411
  • 2
  • 2
  • You are right! That solves the problem. But I can't find libatlas-base on Centos, while there is one for Ubuntu. – kai Nov 04 '13 at 21:47
  • Do you need to do anything to tell hmatrix to use the optimized libraries? I've installed libatlas-base and rebuilt hmatrix and my code, and see no difference in perf. – JP Moresmau May 30 '15 at 13:39