In Haskell, how can I represent infinite data of a newly defined type?

Question

I defined a new data type in Haskell in the following way:

data Pro = P Int Pro | Idle
              deriving Show

Then I defined an operator that works on this new data type:

(>*>) :: Pro -> Pro -> Pro
Idle        >*>   ps  = ps
P i ps      >*>   qs  = P i (ps >*> qs)

So infi r = P r Idle >*> infi (r+1) represents infinite data, and if I type infi 1 in the terminal, it prints indefinitely.

Later, I realized that this new data type definition could be modified to:

data Try = T [Int] 
     deriving Show

They are quite similar, and the latter is essentially a list, which seems easier. But when I defined the following infinite data:

(>/>) :: Try -> Try -> Try
T []    >/>     i       = i 
T ts    >/>     T qs    = T (ts ++ qs )

infi2 r = (T [r]) >/> (infi2 r)

and tried to print it in the terminal, it showed:

Exception: stack overflow

It seems that Haskell's laziness does not work for this new data type. Could anyone tell me the reason, and how to print the infinite data using the second data type?

kosmikus · Answer 1 · 2015-02-06T19:12:44.337

The other answer is correct in saying that one proper fix is to use a ~-pattern (also called irrefutable pattern or lazy pattern) or a newtype, but let's look at the why ...

Here are your definitions again for reference:

(>/>) :: Try -> Try -> Try
T []    >/>     i       = i 
T ts    >/>     T qs    = T (ts ++ qs)

infi2 r = T [r] >/> infi2 r -- I've omitted unnecessary parentheses

Now, if you call infi2 1, the following reduction happens:

   infi2 1

=    { expanding the definition of infi2 }

   T [1] >/> infi2 1

Now this is the important point. We want to reduce the outermost function, which is >/>. But we have to decide which of the cases applies. It's easy to see that the first one does not, because T [1] does not match T []. The second case, however, requires the second argument of >/> to be of shape T qs, and we have infi2 1. Even though Try has only one constructor, GHC/Haskell will not make such a leap of faith. Instead, it will evaluate infi2 1 further until it learns its outermost constructor. So the next reduction step is

   T [1] >/> infi2 1

=    { expanding the definition of infi2 }

   T [1] >/> (T [1] >/> infi2 1)

Now we're in exactly the same situation again. We still cannot reduce the outermost >/> because we don't know the constructor of its right argument, so we have to reduce that further. But there, again, we need to reduce the right argument further to learn the constructor of the right argument of the inner >/>:

   T [1] >/> (T [1] >/> infi2 1)

=    { expanding the definition of infi2 }

   T [1] >/> (T [1] >/> (T [1] >/> infi2 1))

=    { expanding the definition of infi2 }

   T [1] >/> (T [1] >/> (T [1] >/> (T [1] >/> infi2 1)))

=    ...

This will continue indefinitely until memory fills up. We can never make any real progress.

Looking back at the original definitions once more, it's (with a bit of practice) actually possible to see this without doing the entire expansion:

(>/>) :: Try -> Try -> Try
T []    >/>     i       = i 
T ts    >/>     T qs    = T (ts ++ qs)

infi2 r = T [r] >/> infi2 r

In the second case of the definition of >/>, we produce a T only after we know that both arguments are Ts. So in infi2 r, we can only reduce the outer >/> after infi2 r returns, but that's a recursive call ...
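One way to observe this strictness directly is to pass undefined as the second argument: if >/> has to see the constructor of its right argument before it can produce anything, forcing the result must fail. The following is a minimal sketch under that assumption; the use of try and evaluate from Control.Exception is only a device for this demonstration and does not appear in the question.

```haskell
import Control.Exception (SomeException, evaluate, try)

data Try = T [Int]
  deriving Show

(>/>) :: Try -> Try -> Try
T [] >/> i    = i
T ts >/> T qs = T (ts ++ qs)

main :: IO ()
main = do
  -- evaluate forces the result to weak head normal form only,
  -- i.e. just far enough to see the outermost T.
  r <- try (evaluate (T [1] >/> undefined)) :: IO (Either SomeException Try)
  putStrLn $ case r of
    Left _  -> "the second argument was forced"
    Right _ -> "the second argument was not forced"
```

Running this prints "the second argument was forced": the pattern T qs demands the right operand even though only the outermost constructor of the result was requested.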

Now about the solutions that address this:

Use a newtype

With

newtype Try = T [Int]
  deriving (Show)

instead of data, pattern matching on T becomes a no-op. A newtype is guaranteed to have the same runtime representation as the underlying type (here [Int]), so applying the constructor T or pattern matching on it only converts between the types for the type checker and has no effect at runtime.

Therefore, once we have

T [1] >/> infi2 1

in order to make a decision for one of the cases, we now only see that the first list is nonempty, so the first case cannot apply. The second case has the left-hand side

T ts >/> T qs = ...

which, under the assumption that pattern matching on T is a no-op, matches trivially, so the expression can be reduced immediately.
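Put together, the newtype variant can be tried as a small program. This is a sketch that keeps the question's definitions unchanged apart from the newtype keyword; takeTry is a hypothetical helper introduced here only to inspect a finite prefix.

```haskell
newtype Try = T [Int]
  deriving Show

(>/>) :: Try -> Try -> Try
T [] >/> i    = i
T ts >/> T qs = T (ts ++ qs)

infi2 :: Int -> Try
infi2 r = T [r] >/> infi2 r

-- Hypothetical helper (not in the question): the first n elements.
takeTry :: Int -> Try -> [Int]
takeTry n (T xs) = take n xs

main :: IO ()
main = print (takeTry 5 (infi2 1))
```

This prints [1,1,1,1,1] and terminates, because matching on T no longer forces the recursive call.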

Using a ~-pattern

Similarly, if we keep using data, but write

T ts >/> ~(T qs) = ...

we change the behaviour of GHC/Haskell to actually make the "leap of faith" I talked about above. An irrefutable pattern match succeeds automatically, so it never causes further evaluation. In the case of single-constructor datatypes such as Try, this is essentially safe to do. However, if you do such a lazy pattern match on a multi-constructor datatype and it turns out that the value you're matching against is not of the constructor appearing in your pattern, the match will still succeed, and you'll get a runtime exception once you try to use values from inside the pattern.
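The caveat about multi-constructor datatypes can be illustrated with a small sketch. Shape and radius are hypothetical names made up for this example, and try/evaluate from Control.Exception are used only to make the delayed failure visible.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- Hypothetical two-constructor type, used only for this illustration.
data Shape = Circle Int | Square Int

-- The lazy pattern always matches, whatever the argument is.
radius :: Shape -> Int
radius ~(Circle r) = r

main :: IO ()
main = do
  print (radius (Circle 3))
  -- The match "succeeds" even for Square; nothing fails while r is unused.
  print (length [radius (Square 2)])
  -- Using r forces the deferred match, which fails only now.
  r <- try (evaluate (radius (Square 2))) :: IO (Either SomeException Int)
  putStrLn (either (const "exception when r is used") show r)
```

The program prints 3, then 1, then "exception when r is used": the mismatch goes unnoticed until the bound variable is demanded.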

Explicitly extracting

A third option is to write an extraction function

unT :: Try -> [Int]
unT (T ts) = ts

and then say

(>/>) :: Try -> Try -> Try
T []    >/>     i       = i 
T ts    >/>     qs      = T (ts ++ unT qs)

This makes it obvious that we don't expect anything of the second argument at the time of the pattern match. This version very much corresponds to what the ~-pattern-version will compile to.

To conclude, let's look at the reduction now:

   infi2 1

=    { expanding the definition of infi2 }

   T [1] >/> infi2 1

=    { expanding the definition of >/> }

   T ([1] ++ unT (infi2 1))

Assuming we want to print the result, which demands a full reduction, let's continue from here for a bit:

   T ([1] ++ unT (infi2 1))

=    { expanding the definition of ++ }

   T (1 : unT (infi2 1))

=    { expanding the definition of infi2 }

   T (1 : unT (T [1] >/> infi2 1))

=    { expanding the definition of >/> }

   T (1 : unT (T ([1] ++ unT (infi2 1))))

=    { expanding the definition of the outer unT }

   T (1 : ([1] ++ unT (infi2 1)))

At this point, it should be obvious that we indeed get the infinite list incrementally.
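The reduction above can be checked with a short program. This is a sketch that uses the unT-based definition exactly as written; taking only the first five elements is an assumption made so that the program terminates.

```haskell
data Try = T [Int]
  deriving Show

unT :: Try -> [Int]
unT (T ts) = ts

(>/>) :: Try -> Try -> Try
T [] >/> i  = i
T ts >/> qs = T (ts ++ unT qs)

infi2 :: Int -> Try
infi2 r = T [r] >/> infi2 r

main :: IO ()
main = print (take 5 (unT (infi2 1)))
```

This prints [1,1,1,1,1]. Printing infi2 1 itself would also start producing output immediately and never stop.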

effectfully · Accepted Answer · 2015-02-04T20:33:02.397

You need a tilde (a lazy pattern):

(>/>) :: Try -> Try -> Try
T []    >/>     i       = i 
T ts    >/>     ~(T qs)    = T (ts ++ qs )

Also you don't need the first clause, so (>/>) can be defined as

(>/>) :: Try -> Try -> Try
~(T ts) >/> ~(T qs) = T (ts ++ qs)
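As a quick check of the single-clause version, here is a minimal sketch; the definition of infi2 is copied from the question, and the two test expressions in main are chosen only for illustration.

```haskell
data Try = T [Int]
  deriving Show

(>/>) :: Try -> Try -> Try
~(T ts) >/> ~(T qs) = T (ts ++ qs)

infi2 :: Int -> Try
infi2 r = T [r] >/> infi2 r

main :: IO ()
main = do
  -- An empty left operand still gives back the right one,
  -- since [] ++ qs reduces to qs.
  print (T [] >/> T [2, 3])
  -- The outer T is available without forcing the recursive call.
  case infi2 1 of
    T xs -> print (take 3 xs)
```

This prints T [2,3] followed by [1,1,1].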

The definition of (>*>) is lazy, because there is no pattern-matching on the second argument.
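To see that laziness at work without printing forever, the following sketch adds a hypothetical helper takePro (not part of the question) that collects a finite prefix of a Pro value.

```haskell
data Pro = P Int Pro | Idle
  deriving Show

(>*>) :: Pro -> Pro -> Pro
Idle   >*> ps = ps
P i ps >*> qs = P i (ps >*> qs)

infi :: Int -> Pro
infi r = P r Idle >*> infi (r + 1)

-- Hypothetical helper: the first n Ints stored in a Pro.
takePro :: Int -> Pro -> [Int]
takePro n (P i ps) | n > 0 = i : takePro (n - 1) ps
takePro _ _ = []

main :: IO ()
main = print (takePro 5 (infi 1))
```

This prints [1,2,3,4,5]: each P constructor is produced before the second argument of >*> is ever inspected.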

UPDATE

As suggested by @MigMit, you can just use newtype, and your original definition of (>/>) will work. Have a look at section 2, "The messy bits", of https://wiki.haskell.org/Newtype

edited Feb 04 '15 at 20:33

answered Feb 04 '15 at 19:25

I would suggest using `newtype` instead of `data`. – MigMit Feb 04 '15 at 20:20
