I am implementing an algorithm in Haskell. The algorithm needs some generated data as input.

I have a function for the algorithm that takes a generation function as a parameter. For example, suppose the algorithm just multiplies the input data by n:

 algo :: a -> ??? -> [a]
 algo n dgf = map (\x -> x * n) $ dgf

dgf is used to generate the data. How do I write the type signature correctly, given that dgf can be any function with any number of parameters?

Another variant is to accept already generated data instead of the generation function.

algo :: Num a => a -> [a] -> [a]
algo n d = map (\x -> n * x) d

Now suppose I'm generating the data with StdGen, which uses IO. How can I make the function more generic, so that it accepts both an IO action and plain values like [1,2,3]? This also applies to the variant that takes a function, since that function can also produce IO.

All in all, which solution is better: taking a generation function or taking pre-generated data?

Thanks in advance.

Rahul
  • What prevents you from using `algo` on `IO [a]`, e.g. `algo n <$> generateDgf`? – Zeta Sep 06 '17 at 16:17
  • Writing the signature as `algo :: _` will get GHC to tell you the type it infers for your function. – Li-yao Xia Sep 06 '17 at 16:48
  • `StdGen` is, in fact, pure, as are most of the functions in [System.Random](https://hackage.haskell.org/package/random-1.1/docs/System-Random.html). It's true that `getStdGen` is impure, but most of the random functions just use the `RandomGen` typeclass, which is pure. Since you can call pure functions from impure code, you just need to kick off your algorithm from impure code, like @Zeta implies. Since the entry point (`main`) is always impure, you can call `getStdGen` from there, and then pass the `StdGen` value to your pure function(s). – Mark Seemann Sep 06 '17 at 17:17

2 Answers

One option is to take a stream rather than a list. If generating the values involves performing IO, and there may be a great many values, this is often the best approach. Several packages offer streams of some sort; I'll use the streaming package in this example.

import qualified Streaming.Prelude as S
import Streaming

algo :: (Monad m, Num a) => a -> Stream (Of a) m r -> Stream (Of a) m r
algo a = S.map (a +)

You can read Stream (Of a) m r as "a way to use operations in m to produce successive values of type a and finally a result of type r". This algo function doesn't commit to any particular way of generating the data; the values can be created purely:

algo a (S.each [these, are, my, elements])

or within IO,

algo a $ S.takeWhile (> 3) (S.readLn :: Stream (Of Int) IO ())

or using a randomness monad, or whatever you like.
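
As a minimal sketch of that flexibility, assuming the streaming package is installed, the same algo can be run over a pure stream and over an IO stream. The names pureResult and ioResult are illustrative only and are not part of the package:

```haskell
import Data.Functor.Identity (runIdentity)
import Streaming (Of, Stream)
import qualified Streaming.Prelude as S

-- Same function as above: adds a to every element, in any monad m.
algo :: (Monad m, Num a) => a -> Stream (Of a) m r -> Stream (Of a) m r
algo a = S.map (a +)

-- Hypothetical pure use: the stream runs in Identity, so no IO is involved.
pureResult :: [Int]
pureResult = runIdentity (S.toList_ (algo 10 (S.each [1, 2, 3])))

-- Hypothetical effectful use: each element is produced by an IO action.
-- (pure 1) stands in for something that really performs IO.
ioResult :: IO [Int]
ioResult = S.toList_ (algo 10 (S.replicateM 3 (pure 1)))
```

In GHCi, pureResult evaluates to [11,12,13] and ioResult yields [11,11,11]; algo itself is unchanged between the two.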

dfeuer

For contrast, I'm going to take the opposite approach to dfeuer's answer.

Just use lists.

Consider your first example:

algo :: a -> ??? -> [a]
algo n dgf = map (\x -> x * n) $ dgf

You ask how to write the type signature correctly, given that dgf can be any function with any number of parameters.

Well, one way is to use uncurrying.

Normally, Haskell functions are curried. If we have a function like

add :: Int -> Int -> Int
add x y = x + y

and we want a function that adds two to its input, we can just use add 2:

>>> map (add 2) [1..10]
[3,4,5,6,7,8,9,10,11,12]

This works because add is not actually a function that takes two arguments; it's a function of one argument that returns a function of one argument.

We could have added parentheses to the type of add above to make this clearer:

add :: Int -> (Int -> Int)

In Haskell, all functions are functions of one argument.

However, we can also go the other way - uncurry a function that returns a function to get a function that takes a pair:

>>> :t uncurry
uncurry :: (a -> b -> c) -> (a, b) -> c
>>> :t uncurry add
uncurry add :: (Int, Int) -> Int

This can also be useful, say if we want to find the sum of each pair in a list:

>>> map (uncurry add) [ (1,2), (3,4), (5,6), (7,8), (9,10) ]
[3,7,11,15,19]

In general, we can uncurry any function of type a0 -> a1 -> ... -> aN -> b into a function (a0, a1, ..., aN) -> b, though there might not be a cute library function to do it for us.
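
For instance, the three-argument case is easy to write by hand. This is only a sketch; uncurry3 and volume are hypothetical names introduced here, not functions from base:

```haskell
-- Hypothetical helper: the Prelude only provides uncurry for pairs.
uncurry3 :: (a -> b -> c -> d) -> (a, b, c) -> d
uncurry3 f (x, y, z) = f x y z

-- Example curried function of three arguments.
volume :: Int -> Int -> Int -> Int
volume l w h = l * w * h

-- Apply the uncurried function to a list of triples.
volumes :: [Int]
volumes = map (uncurry3 volume) [(1, 2, 3), (2, 2, 2)]
```

Here volumes evaluates to [6,8].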

With that in mind, we could implement algo by passing it an uncurried function and a tuple of values:

algo :: Num a => a -> (t -> [a]) -> t -> [a]
algo n f t = map (\x -> x * n) $ f t

And then use anonymous functions to uncurry our argument functions:

>>> algo 2 (\(lo,hi) -> enumFromTo lo hi) (5, 10)
[10,12,14,16,18,20]
>>> algo 3 (\(a,b,c,d) -> zipWith (+) [a..b] [c..d]) (1, 5, 10, 14)
[33,39,45,51,57]

Now we could do it this way, but we don't need to. As implemented above, algo is only using f and t once. So why not pass it the list directly?

algo' :: Num a => a -> [a] -> [a]
algo' n ns = map (\x -> x * n) ns

It calculates the same results:

>>> algo' 2 $ (\(lo,hi) -> enumFromTo lo hi) (5, 10)
[10,12,14,16,18,20]
>>> algo' 2 $ enumFromTo 5 10
[10,12,14,16,18,20]
>>> algo' 3 $ (\(a,b,c,d) -> zipWith (+) [a..b] [c..d]) (1, 5, 10, 14)
[33,39,45,51,57]
>>> algo' 3 $ zipWith (+) [1..5] [10..14]
[33,39,45,51,57]

Furthermore, since Haskell is non-strict, the argument to algo' isn't evaluated until it's actually used, so we don't have to worry about "wasting" time computing arguments that won't actually be used:

algo'' :: Num a => a -> [a] -> [a]
algo'' n ns = [n,n,n,n]

algo'' doesn't use the list passed to it, so it's never forced, so whatever computation is used to calculate it never runs:

>>> let isPrime n = n > 1 && null [ i | i <- [2..n-1], n `rem` i == 0 ]
>>> :set +s
>>> isPrime 10000019
True
(6.18 secs, 2,000,067,648 bytes)
>>> algo'' 5 (filter isPrime [1..999999999999999])
[5,5,5,5]
(0.01 secs, 68,936 bytes)
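
The same behaviour can be checked without timing anything. The sketch below repeats algo' and algo'' from above; firstThree and ignoresInput are illustrative names only:

```haskell
algo' :: Num a => a -> [a] -> [a]
algo' n ns = map (\x -> x * n) ns

algo'' :: Num a => a -> [a] -> [a]
algo'' n _ = [n, n, n, n]

-- Only the first three elements of the infinite list are ever computed.
firstThree :: [Int]
firstThree = take 3 (algo' 2 [1 ..])

-- The list argument is never inspected, so even undefined is harmless.
ignoresInput :: [Int]
ignoresInput = algo'' 5 undefined
```

firstThree evaluates to [2,4,6] even though the input list is infinite, and ignoresInput evaluates to [5,5,5,5] without ever touching undefined.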

Now to the second part of your question - what if your data is being generated within some monad?

Rather than convince algo to operate on monadic values, you could take the stream-based approach that dfeuer explains. Or you could just use a list.
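
If the list does come out of IO, the pure function can still be reused unchanged by mapping it over the action, as Zeta's comment on the question suggests. A minimal sketch, where algoIn and readNumbers are hypothetical names introduced for illustration:

```haskell
import Data.Functor.Identity (Identity (..))

algo' :: Num a => a -> [a] -> [a]
algo' n ns = map (\x -> x * n) ns

-- Hypothetical wrapper: run the pure algorithm inside any functor.
algoIn :: (Functor f, Num a) => a -> f [a] -> f [a]
algoIn n = fmap (algo' n)

-- Hypothetical generator standing in for real input or random data.
readNumbers :: IO [Int]
readNumbers = pure [1, 2, 3]
```

algoIn 2 readNumbers is an IO [Int] that yields [2,4,6], and runIdentity (algoIn 2 (Identity [1,2,3])) gives the same list purely. In practice the wrapper is rarely needed, since algo' n <$> readNumbers says the same thing.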

Just because you're in a monad doesn't mean that your values suddenly become strict.

For example, want an infinite list of random numbers? No problem.

import Data.List (unfoldr)
import System.Random (Random, newStdGen, random)
newRandoms :: Random a => IO [a]
newRandoms = unfoldr (\g -> Just (random g)) <$> newStdGen

Now I can just pass those to some algorithm:

>>> rints <- newRandoms :: IO [Int]
(0.00 secs, 60,624 bytes)
>>> algo'' 5 rints
[5,5,5,5]
(0.00 secs, 68,920 bytes)

For a small program which is just reading input from a file or two, there's no problem with just using readFile and lazy I/O to get a list to operate on.

For example:

>>> let grep pat lines = [ line | line <- lines, pat `isInfixOf` line ]
>>> :set +s
>>> dict <- lines <$> readFile "/usr/share/dict/words"
(0.01 secs, 81,504 bytes)
>>> grep "poop" dict
["apoop","epoophoron","nincompoop","nincompoopery","nincompoophood","nincompoopish","poop","pooped","poophyte","poophytic","whisterpoop"]
(0.72 secs, 423,650,152 bytes)
rampion
  • Here's what I don't like about the infinite list of random numbers: it forces me to commit to a pure RNG. If I later (hypothetically) decide to do some sort of Fancy Science with a [hardware random number generator](https://en.wikipedia.org/wiki/Hardware_random_number_generator), I have to rewrite my code, or play games with lazy `IO`. Using a stream, I can put that decision off as long as I want. I can develop and test my code using a pure generator and then just swap it out for a real one once the grant is approved or whatever. – dfeuer Sep 07 '17 at 02:28
  • One other thing: with a *list* of random numbers, it's exceedingly easy to accidentally use the same one twice. That's possible with a stream too, but I think it's a bit harder; the stream APIs don't generally encourage that sort of thing. – dfeuer Sep 07 '17 at 02:31