8

For example, I am writing some function for lists and I want to use length function

foo :: [a] -> Bool
foo xs = length xs == 100

How can someone understand could this function be used with infinite lists or not?

Or should I always think about infinite lists and use something like this

foo :: [a] -> Bool
foo xs = length (take 101 xs) == 100

instead of using length directly?

What if haskell would have FiniteList type, so length and foo would be

length :: FiniteList a -> Int
foo :: FiniteList a -> Bool
ais
  • 2,514
  • 2
  • 17
  • 24

5 Answers5

9

length traverses the entire list, but to determine if a list has a particular length n you only need to look at the first n elements.

Your idea of using take will work. Alternatively you can write a lengthIs function like this:

-- assume n >= 0
lengthIs 0 [] = True
lengthIs 0 _  = False
lengthIs n [] = False
lengthIs n (x:xs) = lengthIs (n-1) xs

You can use the same idea to write the lengthIsAtLeast and lengthIsAtMost variants.

ErikR
  • 51,541
  • 9
  • 73
  • 124
  • It just feels weird. Why does length function even exist if you cannot just use it? – ais Oct 08 '15 at 13:17
  • 3
    Well - what would you want `length` of an infinite list to return? – ErikR Oct 08 '15 at 13:21
  • I would prefer if length do not accept infinite lists and have type something like ```FiniteList a -> Int``` – ais Oct 08 '15 at 13:23
  • 1
    ais: you may be interested by [this discussion of data and codata](http://blog.sigfpe.com/2007/07/data-and-codata.html). Using this terminology `length` should only take data, but lists are potentially codata. – rampion Oct 08 '15 at 13:31
  • 2
    You can define a `FiniteList` type in Haskell - for instance `Vector a` might work well for your use cases. Indeed, the lists you see in languages like Python, Perl, Ruby, etc. are really best modeled by `Vector a` in Haskell. Lists would could be infinite are very useful in Haskell, and as you learn more about Haskell you'll understand when lists are appropriate and when you should use a different data structure. – ErikR Oct 08 '15 at 13:33
  • "what would you want length of an infinite list to return?" — an infinite number. – effectfully Oct 08 '15 at 19:05
  • 1
    @user3237465, that's the effect of the "lazy `Nat`" approach. But `length` produces `Int`, and there's no such thing as an infinite `Int`. There's also no such thing as an infinite natural number, but laziness effectively adjoins an "infinity", `S $ S $ S $ ...` to the naturals. – dfeuer Oct 08 '15 at 19:54
  • 1
    @dfeuer, I meant "length" — a more general thing than `length`. There is an infinite conatural number, and in Haskell `Nat` ≡ `Conat`. I don't propose to use `Nat`s everywhere, but an infinite list that have an infinite length is totally sensible construction to me. – effectfully Oct 08 '15 at 20:03
  • 1
    @user3237465, well, you could make inductive naturals if you wanted: `data Nat' = Z' | S' !Nat'`. – dfeuer Oct 08 '15 at 20:06
5

On edit: I am primaily responding to the question in your title rather than the specifics of your particular example, (for which ErikR's answer is excellent).

A great many functions (such as length itself) on lists only make sense for finite lists. If the function that you are writing only makes sense for finite lists, make that clear in the documentation (if it isn't obvious). There isn't any way to enforce the restriction since the Halting problem is unsolvable. There simply is no algorithm to determine ahead of time whether or not the comprehension

takeWhile f [1..]

(where f is a predicate on integers) produces a finite or an infinite list.

John Coleman
  • 51,337
  • 7
  • 54
  • 119
  • 2
    Good point! But there is a way to separate "definitely finite" and "possibly infinite" lists. – rampion Oct 08 '15 at 13:34
4

Nats and laziness strike again:

import Data.List

data Nat = S Nat | Z deriving (Eq)

instance Num Nat where
    fromInteger 0 = Z
    fromInteger n = S (fromInteger (n - 1))

    Z   + m = m
    S n + m = S (n + m)

lazyLength :: [a] -> Nat
lazyLength = genericLength

main = do
    print $ lazyLength [1..]    == 100 -- False
    print $ lazyLength [1..100] == 100 -- True
effectfully
  • 12,325
  • 2
  • 17
  • 40
  • how efficient is this? ;) – Michael Oct 08 '15 at 18:25
  • 1
    @Michael, slower by a small constant factor than the direct implementation, I guess. But you don't need to write `lengthIsAtLeast`, `lengthIsAtMost` and other fancy functions — a `Nat` module would give them for free. But, AFAIK, there is no such module, so it's probably better to just use `lengthIs`. – effectfully Oct 08 '15 at 19:01
  • 2
    well, using 5000000 instead of 100, I have 1.291s for the Nat-implementation and 0.185s for a naive `isLength` implementation, on my computer. Both are compiled with `ghc -O2`. That's a factor 6.9 ... the factor stays the same if I increase the max value to 10000000. – Michael Oct 08 '15 at 19:32
  • @Michael, defining `lazyLength` directly as `lazyLength = foldr (\_ -> S) Z` gives me `4.96s` for `lengthIs 100000000` and `5.74` for `lazyLength [1..100000000] == 100000000`. – effectfully Oct 08 '15 at 20:10
  • is `genericLength` implemented strictly or smth? or why is it slower than the equivalent `foldr` implementation? – Erik Kaplun Oct 09 '15 at 10:13
  • It makes sense to me that the `genericLength` implementation is much slower, since it involves conversion between numerals and `Nat`s. Furthermore, the evaluation of `1 + x` reduces to `S (Z + x)`. Eliminating the `Z` in this expression is one extra step. – is7s Oct 09 '15 at 14:16
2

ErikR and John Coleman have already answered the main parts of your question, however I'd like to point out something in addition:

It's best to write your functions in a way that they simply don't depend on the finiteness or infinity of their inputs — sometimes it's impossible but a lot of the time it's just a matter of redesign. For example instead of computing the average of the entire list, you can compute a running average, which is itself a list; and this list will itself be infinite if the input list is infinite, and finite otherwise.

avg :: [Double] -> [Double]
avg = drop 1 . scanl f 0.0 . zip [0..]
  where f avg (n, i) = avg * (dbl n / dbl n') +
                       i            / dbl n'      where n'  = n+1
                                                        dbl = fromInteger

in which case you could average an infinite list, not having to take its length:

*Main> take 10 $ avg [1..]
[1.0,1.5,2.0,2.5,3.0,3.5,4.0,4.5,5.0]

In other words, one option is to design as much of your functions to simply not care about the infinity aspect, and delay the (full) evaluation of lists, and other (potentially infinite) data structures, to as late a phase in your program as possible.

This way they will also be more reusable and composable — anything with fewer or more general assumptions about its inputs tends to be more composable; conversely, anything with more or more specific assumptions tends to be less composable and therefore less reusable.

Erik Kaplun
  • 37,128
  • 15
  • 99
  • 111
  • If your functions don't depend of finiteness, then function like length is useless. You can use them but then you realize you need to rewrite all your code. – ais Oct 08 '15 at 13:21
  • What I was saying is that you can _redesign_ your code in a way that it doesn't depend on `length` — of course, that's to an extent, but if you can do that, it'll make your code more flexible and functions more agnostic of some aspects of their input. – Erik Kaplun Oct 08 '15 at 13:27
  • My code is simple, I want to know if length of list is 100. So I have written ```length xs == 100``` and then I realize I cannot do this because of infinite lists. – ais Oct 08 '15 at 13:31
  • 1
    My advice was general. Maybe your current project is just a toy for learning, and `length xs == 100` is your ultimate goal, but this might not be the case in the future. – Erik Kaplun Oct 08 '15 at 13:45
1

There are a couple different ways to make a finite list type. The first is simply to make lists strict in their spines:

data FList a = Nil | Cons a !(FList a)

Unfortunately, this throws away all efficiency benefits of laziness. Some of these can be recovered by using length-indexed lists instead:

{-# LANGUAGE GADTs #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# OPTIONS_GHC -fwarn-incomplete-patterns #-}

data Nat = Z | S Nat deriving (Show, Read, Eq, Ord)

data Vec :: Nat -> * -> * where
  Nil :: Vec 'Z a
  Cons :: a -> Vec n a -> Vec ('S n) a

instance Functor (Vec n) where
  fmap _f Nil = Nil
  fmap f (Cons x xs) = Cons (f x) (fmap f xs)

data FList :: * -> * where
  FList :: Vec n a -> FList a

instance Functor FList where
  fmap f (FList xs) = FList (fmap f xs)

fcons :: a -> FList a -> FList a
fcons x (FList xs) = FList (Cons x xs)

funcons :: FList a -> Maybe (a, FList a)
funcons (FList Nil) = Nothing
funcons (FList (Cons x xs)) = Just (x, FList xs)

-- Foldable and Traversable instances are straightforward
-- as well, and in recent GHC versions, Foldable brings
-- along a definition of length.

GHC does not allow infinite types, so there's no way to build an infinite Vec and thus no way to build an infinite FList (1). However, an FList can be transformed and consumed somewhat lazily, with the cache and garbage collection benefits that entails.

(1) Note that the type system forces fcons to be strict in its FList argument, so any attempt to tie a knot with FList will bottom out.

dfeuer
  • 48,079
  • 5
  • 63
  • 167