I have the following Haskell code, implementing a simple version of the "cat" unix command-line utility. Testing performance with "time" on a 400MB file, it's about 3x slower. (the exact script I am using to test it is below the code).
My questions are:
- Is this a valid test of performance?
- How can I make this program run faster?
- How can I identify performance bottlenecks in Haskell programs in general?
Regarding questions 2 and 3: I have used GHC -prof, then running with +RTS -p, but I'm finding the output a bit uninformative here.
Source (Main.hs)
module Main where
import System.IO
import System.Environment
import Data.ByteString as BS
import Control.Monad
-- Copied from cat source code
bufsize = 1024*128
go handle buf = do
hPut stdout buf
eof <- hIsEOF handle
unless eof $ do
buf <- hGetSome handle bufsize
go handle buf
main = do
file <- fmap Prelude.head getArgs
handle <- openFile file ReadMode
buf <- hGetSome handle bufsize
hSetBuffering stdin $ BlockBuffering (Just bufsize)
hSetBuffering stdout $ BlockBuffering (Just bufsize)
go handle buf
Timing script (run.sh):
#!/usr/bin/env bash
# Generate 10M lines of silly test data
yes aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa | head -n 10000000 > huge
# Compile with optimisation
ghc -O2 Main.hs
# Run haskell
echo "timing Haskell"
time ./Main huge > /dev/null
echo ""
echo ""
# Run cat
echo "timing 'cat'"
time cat huge > /dev/null
My results:
timing Haskell
real 0m0.980s
user 0m0.296s
sys 0m0.684s
timing 'cat'
real 0m0.304s
user 0m0.001s
sys 0m0.302s
The profiling report when compiling with -prof and running with +RTS -p is below:
Sat Dec 13 21:26 2014 Time and Allocation Profiling Report (Final)
Main +RTS -p -RTS huge
total time = 0.92 secs (922 ticks @ 1000 us, 1 processor)
total alloc = 7,258,596,176 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
MAIN MAIN 100.0 100.0
individual inherited
COST CENTRE MODULE no. entries %time %alloc %time %alloc
MAIN MAIN 46 0 100.0 100.0 100.0 100.0
CAF GHC.Conc.Signal 84 0 0.0 0.0 0.0 0.0
CAF GHC.IO.FD 82 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Handle.FD 81 0 0.0 0.0 0.0 0.0
CAF System.Posix.Internals 76 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding 70 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding.Iconv 69 0 0.0 0.0 0.0 0.0