4

I'd like to write one code that could be run in two "modes":

  • either in logging mode, i.e. it should log some informations (in my case I want to log the number of calls done on some particular functions at a given time)
  • or in efficient mode, i.e. it does not log anything but just runs as fast as possible

I tried to write the following code, which creates two Writers, one normal one (for the logging mode) and one stupid one (that does not record anything, for the efficient mode). I then define a new class LogFunctionCalls that allows me to run my function in one of these two Writers.

However, I tried to compare the speed of the code using the Stupid writer, and it's significantly slower than the normal code without writer: here is the profiling informations:

  • code without writer: total time = 0.27s, total alloc = 55,800 bytes
  • code with stupid writer StupidLogEntry: total time = 0.74 s, total alloc = 600,060,408 bytes (NB: the real time is much bigger than 0.74s...)
  • code with real writer LogEntry: total time = 5.03 s, total alloc = 1,920,060,624 bytes

Here is the code (you can comment depending on which run you want to use):

{-# LANGUAGE ScopedTypeVariables #-}
module Main where

--- It depends on the transformers, containers, and base packages.

--- You can profile it with:
--- $ cabal v2-run --enable-profiling debug -- +RTS -p
--- and a file debug.prof will be created.

import qualified Data.Map.Strict as MapStrict
import qualified Data.Map.Merge.Strict as MapMerge

import qualified Control.Monad as CM
import Control.Monad.Trans.Writer.Strict (Writer)
import qualified Control.Monad.Trans.Writer.Strict as Wr
import qualified Data.Time as Time

-- Test using writer monad

-- The actual LogEntry, that should associate a number
-- to each name
newtype LogEntry = LogEntry { logMap:: MapStrict.Map String Int }
  deriving (Eq, Show)

-- A logentry that does not record anything, always empty
newtype StupidLogEntry = StupidLogEntry { stupidLogMap:: MapStrict.Map String Int }
  deriving (Eq, Show)

-- Create the Monoid instances
instance Semigroup LogEntry where
  (LogEntry m1) <> (LogEntry m2) =
    LogEntry $ MapStrict.unionWith (+) m1 m2
instance Monoid LogEntry where
  mempty = LogEntry MapStrict.empty

instance Semigroup StupidLogEntry where
  (StupidLogEntry m1) <> (StupidLogEntry m2) =
    StupidLogEntry $ m1
instance Monoid StupidLogEntry where
  mempty = StupidLogEntry MapStrict.empty

-- Create a class that allows me to use the function "myTell"
-- that adds a number in the writer (either the LogEntry
-- or StupidLogEntry one)
class (Monoid r) => LogFunctionCalls r where
  myTell :: String -> Int -> Writer r ()

instance LogFunctionCalls LogEntry where
  myTell namefunction n = do
    Wr.tell $ LogEntry $ MapStrict.singleton namefunction n

instance LogFunctionCalls StupidLogEntry where
  myTell namefunction n = do
    -- Wr.tell $ StupidLogEntry $ Map.singleton namefunction n
    return ()

-- Function in itself, with writers
countNumberCalls :: (LogFunctionCalls r) => Int -> Writer r Int
countNumberCalls 0 = return 0
countNumberCalls n = do
  myTell "countNumberCalls" 1
  x <- countNumberCalls $ n - 1
  return $ 1 + x

--- Without any writer, pretty efficient
countNumberCallsNoWriter :: Int -> Int
countNumberCallsNoWriter 0 = 0
countNumberCallsNoWriter n = 1 + countNumberCallsNoWriter (n-1)

main :: IO ()
main = do
  putStrLn $ "Hello"
  -- Version without any writter
  print =<< Time.getZonedTime
  let n = countNumberCallsNoWriter 15000000
  putStrLn $ "Without any writer, the result is " ++ (show n)
  -- Version with Logger
  print =<< Time.getZonedTime
  let (n, log :: LogEntry) = Wr.runWriter $ countNumberCalls 15000000
  putStrLn $ "The result is " ++ (show n)
  putStrLn $ "With the logger, the number of calls is " ++ (show $ (logMap log))
  -- Version with the stupid logger
  print =<< Time.getZonedTime
  let (n, log :: StupidLogEntry) = Wr.runWriter $ countNumberCalls 15000000
  putStrLn $ "The result is " ++ (show n)
  putStrLn $ "With the stupid logger, the number of calls is " ++ (show $ (stupidLogMap log))
  print =<< Time.getZonedTime  
tobiasBora
  • 1,542
  • 14
  • 23
  • 4
    Have you tried benchmarking with [`criterion`](http://www.serpentine.com/criterion/)? That’s usually considered to be the most accurate way of timing a Haskell program. – bradrn May 06 '20 at 13:10
  • 2
    I suspect your test case is too simple. If GHC has managed to prove that `countNumberCallsNoWriter ≡ id` then it's obvious why this one runs so fast. – leftaroundabout May 06 '20 at 13:12
  • ...but no, GHC doesn't seem to simplify it to _O_ (1). – leftaroundabout May 06 '20 at 13:18
  • Also, it helps neither to pepper the code with `{-# INLINE #-}` and `{-# SPECIALIZE #-}`, nor to remove the Map altogether from `StupidLogEntry`. – leftaroundabout May 06 '20 at 13:31
  • `Writer` is terrible for logging any computation of more than a few steps. The accumulator is lazy, it's added to in each bind operation, and unevaluated thunks accumulate in memory. Better use a strict `State` monad with the strict `modify'` function . – danidiaz May 06 '20 at 13:40
  • 1
    Relating to what @danidiaz points out, you might want to (1) use `return $! 1 + x` instead of `return $ 1 + x` in `countNumberCalls`, so that thunks don't accumulate behind the `return`; and (2) Use the `WriterT` from `Control.Monad.Trans.Writer.CPS`, whose accumulator is actually strict (note that *mtl* doesn't have a `MonadWriter` instance for it yet, though [that will be fixed in the next *mtl* release](https://github.com/haskell/mtl/issues/69)). If nothing else, those changes should make the full logging version run in constant memory. – duplode May 06 '20 at 13:53
  • You should probably turn profiling *off*, since profiling can significantly harm optimization, making the measurements inaccurate. Without profiling, you still have `+RTS -s`, which still produces the stats you posted, except more realistic. – HTNW May 06 '20 at 13:54
  • 4
    Perhaps the typeclass should be defined in the monad (say, `CounterMonad`) and not in the accumulator. The do-nothing version would simply be an instance for `Identity` that threw away the log messages. – danidiaz May 06 '20 at 14:12
  • Thanks everybody! So I tried indeed lot's of options, the `Trans.Writer.Strict` is less efficient than the `Trans.Writer.CPS`, which is itself less efficient than the `State.Lazy` (even with `modify'`), itself less efficient than the `State.Strict`. Disabling profiling makes a huge difference indeed (it seems that with profiling the optimization does not happens). But then how am I supposed to detect "lazy leaks"? And I just realized that indeed, turning `$ 1 + x` into `$! 1 + x` makes a huge difference. How come Haskell do not optimize it? And how can I detect such leaks? – tobiasBora May 07 '20 at 19:51
  • @bradrn Criterion looks great, but can you use it to detect "lazy memory leaks"? It seems that it's basically doing a benchmark based on time right? If you have more info I asked a related question here https://stackoverflow.com/questions/61666819/haskell-how-to-detect-lazy-memory-leaks – tobiasBora May 07 '20 at 20:09
  • @tobiasBora As far as I’m aware, Criterion is only for timing — I wouldn’t know how to do other forms of benchmarks. – bradrn May 08 '20 at 00:26

1 Answers1

6

The Writer monad is the bottleneck. A better way to generalize your code so it can run in those two "modes" is to change the interface, i.e., the LogFunctionCalls class, to be parameterized by the monad:

class Monad m => LogFunctionCalls m where
  myTell :: String -> Int -> m ()

Then we can use an identity monad (or monad transformer) to implement it trivially:

newtype NoLog a = NoLog a
  deriving (Functor, Applicative, Monad) via Identity

instance LogFunctionCalls NoLog where
  myTell _ _ = pure ()

Note also that the function to test has a different type now, that no longer refers to Writer explicitly:

countNumberCalls :: (LogFunctionCalls m) => Int -> m Int

Let's stick it in a benchmark, which has all kinds of methodological issues as pointed out in the comments, but still, something interesting happens if we compile it with ghc -O:

main :: IO ()
main = do
  let iternumber = 1500000
  putStrLn $ "Hello"
  t0 <- Time.getCurrentTime

  -- Non-monadic version
  let n = countNumberCallsNoWriter iternumber
  putStrLn $ "Without any writer, the result is " ++ (show n)
  t1 <- Time.getCurrentTime
  print (Time.diffUTCTime t1 t0)

  -- NoLog version
  let n = unNoLog $ countNumberCalls iternumber
  putStrLn $ "The result is " ++ (show n)
  t2 <- Time.getCurrentTime
  print (Time.diffUTCTime t2 t1)

The output:

Hello
Without any writer, the result is 1500000
0.022030957s
The result is 1500000
0.000081533s

As we can see, the second version (the one we care about) took zero time. If we remove the first version from the benchmark, then the remaining one will take the 0.022s of the former.

So GHC actually optimized one of the two benchmarks away because it saw that they are the same, which achieves what we originally wanted: the "logging" code runs as fast as specialized code without logging because they're literally the same, and the benchmark numbers don't matter.

This can also be confirmed by looking at the generated Core; run ghc -O -ddump-simpl -ddump-to-file -dsuppres-all and make sense of the file Main.dump-simpl. Or use inspection-testing.

Compilable gist: https://gist.github.com/Lysxia/2f98c4a8a61034dcc614de5e95d7d5f8

Li-yao Xia
  • 31,896
  • 2
  • 33
  • 56
  • 1
    Checking that both are optimized to the same code sounds like a great application for http://hackage.haskell.org/package/inspection-testing. But I’d be cautious: GHC may specialize `countNumberCalls` to `NoLog` in this example (small code, all in one module); that does not mean it will happen in a big code base (without careful use of `{-# SPECIALIZE … #-}`). – Joachim Breitner May 06 '20 at 18:08
  • Thanks a lot, this works great! The content of `dump-simpl` is pretty hard to read, but thanks for the trick! – tobiasBora May 07 '20 at 19:44