Efficient output of numbers

Question

I want to print a list of integers separated with spaces to stdout. The list generation is fast, so I tried to solve this problem with the sequence [1..200000].

In C, I can implement it like this:

#include <stdio.h>
int main()
{
  int i;
  for(i = 0; i <= 200000; ++i)
    printf("%d ", i);
  return 0;
}

The fastest solution in Haskell I could implement is about three times slower:

import Data.List (intercalate)
main = putStr . intercalate " " . map (show) $ [1..(200000)]

I tried ByteStrings in some ways, but with them it got even slower. The big problem seems to be the conversion of the integers to strings with show (or the conversion to ByteStrings).

Any suggestions on how to speed this up without interfacing to C? It should not become too complicated (as short and beautiful as possible, using other Haskell modules is fine).

Have you checked the speed of writing each element at a time? Writing to console is generally slow so this might be a lot worse, but it may be worth a try. You could try building up a chunk of a certain number of elements rather than a huge string, however you might need to use append (++) or a Hughes list (DList) to do this which is adding extra work. That's why I'm guessing that writing each element could still be competitive. — stephen tetley, Nov 26 '10 at 16:17
I tried a version that wrote a number at a time (as I thought the same), but it was slower. — Neil Brown, Nov 26 '10 at 16:24
Please remember that GHC is doing conversion to whatever coding your console is using by default. That may and will cause some additional overhead. — Tener, Nov 26 '10 at 22:18
Ok, turns out that turning off encoding stuff doesn't help much: in my case it was 0.80s with vs. 0.76 without. — Tener, Nov 26 '10 at 22:26

score 4 · Answer 1 · answered Nov 26 '10 at 15:09

Well, you could rewrite the code a bit:

import Data.List (intercalate)
main = output
output = putStr one_string
one_string = intercalate " " strings
strings = map show $ [1..2000000]

Then you could profile it using "ghc -O2 -prof -auto-all .hs":

COST CENTRE                    MODULE               %time %alloc

one_string                     Main                  42.2   55.9
strings                        Main                  39.2   43.1
output                         Main                  18.6    1.0

You can see that intercalate takes a good half of the runtime. I don't think that you could make the whole thing go faster, though, without resorting to some low-level trickery. If you switch to faster intercalate (from Data.ByteString.Lazy.Char8, for example), you would have to use a slower variant of Int -> String conversion.
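
For illustration, the ByteString variant mentioned above might look like the sketch below. It assumes the bytestring package is available; `render` is a name made up for this example. Note that `B.pack . show` still goes through String, which is the slower conversion referred to above. No timing is claimed here.

```haskell
import qualified Data.ByteString.Lazy.Char8 as B

-- Hypothetical helper (not from the answer): join the decimal forms of
-- the numbers with single spaces. Each number is still converted with
-- show and then packed, so the String conversion cost remains.
render :: [Int] -> B.ByteString
render = B.intercalate (B.pack " ") . map (B.pack . show)

main :: IO ()
main = B.putStr (render [1 .. 200000])
```

Only the joining step changes compared to the String version, so any speedup would have to come from intercalate and the output call.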

I'm not sure I'd trust that intercalate is actually taking half the runtime here. The values from `strings` are just thunks until they're forced by intercalate, and I think that means the cost of all the `show` s will be charged to the intercalate call. Possibly. — John L, Nov 26 '10 at 17:34

John L · Answer 2 · 2010-11-26T17:31:09.833

This program runs much faster if I use ghc-6.10.4 instead of ghc-6.12.1. IIRC the 6.12 line introduced unicode-aware IO, which I think accounts for a lot of the slowdown.

My system:

C  (gcc -O2):        0.141s
HS (ghc-6.10.4 -O2): 0.191s (ave.)
HS (ghc-6.12.1 -O2): 0.303s (ave.)

When using ghc-6.10 the result is pretty comparable to C; I think the difference there is due to Haskell's use of strings (and probably runtime overhead too).

I think it's possible to bypass the unicode conversion in ghc-6.12's I/O if you want to get better performance from that compiler.
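
One way to try this, sketched under the assumption that `hSetBinaryMode` and `hSetBuffering` from System.IO are the relevant knobs: put stdout into binary mode, which skips the encoding step, and ask for block buffering. The helper name `line` is made up for this example, and the effect is not measured here; a comment on the question reports only a small gain from turning the encoding off.

```haskell
import System.IO

-- Hypothetical helper: the space-separated decimal rendering of a list.
line :: [Int] -> String
line = unwords . map show

main :: IO ()
main = do
  -- Binary mode: no character encoding is applied on output.
  -- This is safe here because digits and spaces are plain ASCII.
  hSetBinaryMode stdout True
  -- Block buffering with an implementation-chosen buffer size.
  hSetBuffering stdout (BlockBuffering Nothing)
  putStr (line [1 .. 200000])
```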

fuz · Answer 3 · 2010-11-26T14:52:30.217


First question:

Post some code!!!

I guess (according to delnan :) that it's slow because the following happens (skip step 4 if you don't use bytestring):

1. All the numbers are converted one by one. The output is a list.
2. The list of outputs has to be traversed again because you add elements (the spaces!).
3. The list has to be traversed again because you concat it.
4. The list has to be traversed again because it is converted to bytestring (pack).
5. The whole thing is printed out.

It could be faster with bytestring, but you should probably implement your own show that works with bytestrings. Then be smart and avoid multiple traversals: insert the spaces while the list is being created.

Maybe like this:

import qualified Data.ByteString.Lazy.Char8 as B

showB :: Int -> B.ByteString -- Left as an exercise to the reader
showB = B.pack . show        -- placeholder so this compiles; replace with a hand-written version

main = B.putStr $ pipeline [0..20000] where
  pipeline = B.tail . B.concat . map (B.cons' ' ') . map showB

This is untested, so profile!!! You see, the two maps can be fused, so the list will be traversed maybe two times.
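
As one possible take on the showB exercise, here is a minimal sketch that peels off decimal digits with `quotRem` instead of calling show. It is an assumption about what such a function could look like, not the answerer's code, and it is not benchmarked. It still builds a short String of digits before packing, and it does not handle `minBound`, whose negation overflows.

```haskell
import qualified Data.ByteString.Lazy.Char8 as B
import Data.Char (intToDigit)

-- Hypothetical showB: decimal rendering of an Int as a lazy ByteString.
-- Digits are produced from the least significant end and consed onto an
-- accumulator, so they come out in the right order without a reverse.
-- Caveat: minBound is not supported (negate minBound overflows).
showB :: Int -> B.ByteString
showB n
  | n < 0     = B.cons '-' (showB (negate n))
  | otherwise = B.pack (digits n "")
  where
    digits k acc
      | k < 10    = intToDigit k : acc
      | otherwise = let (q, r) = k `quotRem` 10
                    in digits q (intToDigit r : acc)

main :: IO ()
main = B.putStr (B.unwords (map showB [0 .. 20000]))
```

`B.unwords` is used here in place of the cons-and-tail pipeline; both put a single space between the numbers.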

edited Nov 26 '10 at 14:52

answered Nov 26 '10 at 14:25

I know, that there will happen some magic because of the laziness, but what I mean is, that the operations on the list can't be fused in his case. But maybe there's another reason. – fuz Nov 26 '10 at 14:29
main doesn't type-check. I think you meant B.putStr $ pipeline [0..20000]. Also there is no need for two calls to map in pipeline, one suffices. – edon Nov 26 '10 at 14:48
@ednedn: Fixed. I had no time to test this. GHC has a rule which turns two consecutive maps into one. It's easier to read like it is now. – fuz Nov 26 '10 at 14:58
This is wrong. These list processing functions will fuse (ghc uses build/foldr fusion), so the loop isn't traversed multiple times. Check the core by compiling with -ddump-simpl and you'll see this. – John L Nov 27 '10 at 00:26

score 0 · Answer 4 · answered Nov 26 '10 at 16:33

Here is a different approach to the same problem, that attempts to exploit the sharing in the string suffixes. It went about 1/3rd faster on my machine than the original Haskell, although admittedly still a way off the C version. Doing numbers other than 1 to 999999 is left as an exercise:

basic :: [Char]
basic = ['0'..'9']

strip :: String -> String
strip = (' ' :) . dropWhile (== '0')

numbers :: Int -> [String]
numbers 0 = [""]
numbers n = [x : xs | x <- basic, xs <- rest]
  where
    rest = numbers (n - 1)

main = mapM_ (putStr . strip) (tail $ numbers 6)

score 0 · Answer 5 · answered Nov 27 '10 at 22:59

This version does a bit better than yours. I guess it's one way to improve on it.

showWithSpaces        :: (Show a) => [a] -> ShowS
showWithSpaces []     = showString ""
showWithSpaces (x:xs) = shows x . showChar ' ' . showWithSpaces xs

main = putStrLn $ showWithSpaces [1..2000000] $ ""
