24

I'm just starting to learn Haskell and keep seeing references to its powerful type system. I see many instances in which the inference is much more powerful than Java's, but also the implication that it can catch more errors at compile time because of its superior type system. So, I'm wondering if it would be possible to explain what types of errors Haskell can catch at compile time that Java cannot.

pondermatic
  • 1
    I believe part of the difference is from the fact that Haskell is strictly typed, but in many cases you can delete all type signatures and it'll still work. You can't do that in Java. Not every variable has to have an explicit type signature, and it isn't an `object` by default, the type system can figure it out. For example, in the function `f :: Int -> Int`; `f x = sum $ map (\y -> y * x) [1..x]`, there is no type attached to `y`, and yet GHC can infer that it is an `Int`. This is obviously a simple case, but it applies with much more complex types as well. – bheklilr Aug 19 '14 at 02:18
  • Haskell has no direct equivalent to NullPointerException. The closest would be a pattern match failure or an `Exception: Prelude.(!!): index too large` – Jeremy List Aug 19 '14 at 10:50
  • see my answer to this question, it contains an example of using type-level programming to statically guarantee invariants of a RedBlack tree. http://stackoverflow.com/questions/24481113/what-are-some-examples-of-type-level-programming/24481742#24481742 – cdk Aug 19 '14 at 13:31

4 Answers

26

Saying that Haskell's type system can catch more errors than Java's is a little bit misleading. Let's unpack this a little bit.

Statically Typed

Java and Haskell are both statically typed languages. By this I mean that the type of a given expression in the language is known at compile time. This has a number of advantages, for both Java and Haskell; namely, it allows the compiler to check that the expressions are "sane", for some reasonable definition of sane.

Yes, Java allows certain "mixed type" expressions, like "abc" + 2, which some may argue is unsafe or bad, but that is a subjective choice. In the end it is just a feature that the Java language offers, for better or worse.

Immutability

To see how Haskell code could be argued to be less error prone than Java (or C, C++, etc.) code, you must consider the type system with respect to the immutability of the language. In pure (normal) Haskell code, there are no side effects. That is to say, no value in the program, once created, may ever change. When we compute something we are creating a new result from the old result, but we don't modify the old value. This, as it turns out, has some really convenient consequences from a safety perspective. When we write code, we can be sure nothing else anywhere in the program is going to affect our function. Side effects, as it turns out, are the cause of many programming errors. An example would be a shared pointer in C that is freed in one function and then accessed in another, causing a crash. Or a variable that is set to null in Java,

String foo = "bar";
foo = null;
char c = foo.charAt(0); // Error: NullPointerException at run time

This could not happen in normal Haskell code, because foo, once defined, cannot change. This means it cannot be set to null.
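Where a value may legitimately be absent, Haskell code usually says so in the type with Maybe, and the caller has to handle both cases before the value can be used. The following is only a minimal sketch; the names firstChar and describe are made up for illustration.

```haskell
-- Hypothetical helper: the Maybe in the result type says "there may be no character".
firstChar :: String -> Maybe Char
firstChar []      = Nothing
firstChar (c : _) = Just c

-- The caller must pattern match; a Maybe Char cannot be used where a Char is expected.
describe :: String -> String
describe s = case firstChar s of
  Just c  -> "starts with " ++ [c]
  Nothing -> "empty string"

main :: IO ()
main = do
  putStrLn (describe "bar")  -- prints: starts with b
  putStrLn (describe "")     -- prints: empty string
```

Using the result of firstChar directly as a Char would be rejected by the compiler, which is the compile-time counterpart of the NullPointerException above.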

Enter the Type System

Now, you are probably wondering how the type system plays into all of this, that is what you asked about after all. Well, as nice as immutability is, it turns out there is very little interesting work that you can do without any mutation. Reading from a file? Mutation. Writing to disk? Mutation. Talking to a web server? Mutation. So what do we do? In order to solve this issue, Haskell uses its type system to encapsulate mutation in a type, called the IO Monad. For instance to read from a file, this function may be used,

readFile :: FilePath -> IO String

The IO Monad

Notice that the type of the result is not a String, it is an IO String. What this means, in layman's terms, is that the result introduces IO (side effects) to the program. In a well-formed program IO will only take place inside the IO monad, thus allowing us to see very clearly where side effects can occur. This property is enforced by the type system. Further, values of type IO a can only produce their results, which are side effects, inside the main function of the program. So now we have very neatly and nicely isolated the dangerous side effects to a controlled part of the program. When you get the result of the IO String, anything could happen, but at least this can't happen anywhere, only in the main function and only as the result of IO a types.
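To make the separation concrete, here is a small sketch in which one function is pure and the other carries IO in its type. The names wordCount and countWordsFromStdin are invented for this example.

```haskell
-- Pure: the type String -> Int rules out any I/O inside this function.
wordCount :: String -> Int
wordCount = length . words

-- Effectful: the IO in the result type marks that running this reads standard input.
countWordsFromStdin :: IO Int
countWordsFromStdin = wordCount <$> getContents

main :: IO ()
main = do
  n <- countWordsFromStdin
  print n
```

Passing countWordsFromStdin to something that expects a plain Int does not type check; the result has to be bound inside another IO action first, as main does here.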

Now to be clear, you can create IO a values anywhere in your code. You can even manipulate them outside the main function, but none of that manipulation will actually take place until the result is demanded in the body of the main function. For instance,

strReplicate :: IO String
strReplicate =
  readFile "somefile that doesn't exist" >>= return . concat . replicate 2

This function reads input from a file, duplicates that input and appends the duplicated input onto the end of the original input. So if the file had the characters abc this would create a String with the contents abcabc. You can call this function anywhere in your code, but Haskell will only actually try to read the file when the expression is found in the main function, because it is a value in the IO Monad. Like so,

main :: IO ()
main =
  strReplicate >>=
  putStrLn

This will almost surely fail, as the file you requested probably doesn't exist, but it will only fail here. You only have to worry about side effects here, not everywhere in your code, as you do in many other languages.
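The point that IO values are ordinary values until they are run can be sketched with a counter. This is an illustration under my own assumptions, not part of the original answer; bump is a hypothetical helper.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Building this action has no effect; it only describes "add one to the counter".
bump :: IORef Int -> IO ()
bump ref = modifyIORef' ref (+ 1)

main :: IO ()
main = do
  ref <- newIORef (0 :: Int)
  let pending = replicate 3 (bump ref)  -- a plain list of three actions, none run yet
  before <- readIORef ref
  sequence_ pending                     -- the effects happen here, in order
  after <- readIORef ref
  print (before, after)                 -- prints (0,3)
```

The list pending can be passed around, filtered or reversed like any other list; the counter only changes once the actions are sequenced into main.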

There is a lot more to both IO and Monads in general than I have covered here, but that is probably beyond the scope of your question.

Type Inference

Now there is one more aspect to this: type inference.

Haskell uses a very advanced Type Inference System, that allows for you to write code that is statically typed without having to write the type annotation, such as String foo in Java. GHC can infer the type of almost any expression, even very complex ones.

What this means for our safety discussion is that everywhere an instance of IO a is used in the program, the type system will make sure that it can't be used to produce an unexpected side effect. You can't cast it to a String, and just get the result out where/when ever you want. You must explicitly introduce the side effect in the main function.

The Safety of Static Typing with the Ease of Dynamic Typing

The type inference system has some other nice properties as well. Often people enjoy scripting languages because they don't have to write all that boilerplate for the types like they would have to do in Java or C. This is because scripting languages are dynamically typed, meaning the type of an expression is only computed as the expression is being run by the interpreter. This makes these languages arguably more prone to errors, because you won't know if you have a bad expression until you run the code. For example, you might say something like this in Python.

def foo(x,y):
  return x + y

The problem with this is that x and y can be anything. So this would be fine,

foo(1,2) -> 3

But this would cause an error,

foo(1,[]) -> Error

And we have no way of checking that this is invalid until it is run.

It is very important to understand that no statically typed language has this problem, Java included. Haskell is not safer than Java in this sense. Haskell and Java both keep you safe from this type of error, but in Haskell you don't have to write all the types in order to be safe; the type system can infer them. In general, it is considered good practice to annotate the types for your functions in Haskell, even though you don't have to. In the body of the function however, you rarely have to specify types (there are some strange edge cases where you will).
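For comparison with the Python foo above, here is a sketch of the same function in Haskell with no annotation on it at all. The name add is arbitrary.

```haskell
-- No signature given: GHC infers  add :: Num a => a -> a -> a
add x y = x + y

main :: IO ()
main = do
  print (add 1 2)         -- prints 3
  print (add 1.5 2.25)    -- prints 3.75
  -- print (add 1 [])     -- rejected at compile time: lists are not numbers
```

Uncommenting the last line makes the program fail to compile, so the mistake that Python only reports when foo(1,[]) runs is reported before the program starts.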

Conclusion

Hopefully that helps illuminate how Haskell keeps you safe. And in regard to Java, you might say that in Java you have to work against the type system to write code, but in Haskell the type system works for you.

Community
isomarcte
  • `Yes, Java allows certain "mixed type" expressions, like "abc" + 2` well, Haskell does too: `instance Num Char where {fromInteger _ = 'a'; _ + _ = 'b'}; instance Num a => Num [a] where {fromInteger n = [fromInteger n]; (+) = (++)}; "abc" + 2 -> "abca"`. However this is done by overloading *both* the `+` operator *and* the `2` literal, while Java requires only overloading the `+`. – Bakuriu Aug 19 '14 at 09:34
  • How are UIs written if IO is only allowed in main? – Ian Ringrose Aug 19 '14 at 10:29
  • @IanRingrose In Haskell IO is allowed in any function with an `IO something` in its type. IO is only _executed_ in `main`. This makes IO programs first class values, so you can throw them around and combine them as easily as combining strings. – AndrewC Aug 19 '14 at 12:22
  • -1. This answer cites immutability as a strength of Haskell (it certainly is) but doesn't give a convincing argument besides a contrived NPE example. Then claims one "must explicitly introduce side effects in 'main'" which is misleading. Of course `IO` values can be defined anywhere, not just `main`. The conclusion "Haskell is _not_ safer than Java in this sense" is confusing. In the sense that both are statically typed, I suppose they are equal. But Haskell's type system can catch classes of bugs that Java cannot (which was the point of the OP's question) so they are not equal in this regard. – cdk Aug 19 '14 at 14:01
  • 1
    @cdk `Then claims one "must explicitly introduce side effects in 'main'" which is misleading.` How is it misleading? You can create `IO` values anywhere but `main` is the only place where they *execute* and have *effects*. – Doval Aug 19 '14 at 14:10
  • 2
    This is perhaps one fairly medium-sized chunk of technology that Haskell embeds in its type system for safety—effect typing. I voted to close because there are many other such things, but it's difficult to concisely list them all. So—to be clear—this answer does a good job highlighting *a* place where Haskell's choice of type system and use of it in standard functionality introduces added safety, but by no means all or even the most important chunk. – J. Abrahamson Aug 19 '14 at 17:49
  • @Doval: That is correct, of course. But the wording of the answer, which I quoted, does not make that clear. See the confusion it caused Ian Ringrose. A better worded explanation might be: "`IO` code can be defined anywhere in Haskell, but it must be explicitly labeled as such via the type system. This ensures that side-effecting and pure code cannot mix. All `IO` code must be called either directly or indirectly from `main`" – cdk Aug 19 '14 at 18:12
  • @J.Abrahamson What would you argue is "the most important chunk" of type system technology that makes Haskell a "safer" language than Java? – isomarcte Aug 19 '14 at 18:20
  • @cdk Thank you for the feedback. I have updated my answer in attempt to make it more clear that you can create `IO` values anywhere, but their effects only occur in the `main` function. If you have more suggestions for clarity, please let me know. – isomarcte Aug 19 '14 at 18:31
  • 2
    @isomarcte Oh, to be clear, I think effect typing is *very important*. I think adding parametricity, algebraic data types, and quantification is fairly vital as well. Probably bounded polymorphism just to give it that Haskell flavor, too. From there you could talk about typeclass Prolog if you like. Ultimately, I didn't mean to suggest one of these is better than the rest, but instead that the richness of Haskell's type system is quite a bit larger than effect types alone. – J. Abrahamson Aug 19 '14 at 19:25
  • I'd probably throw in typesafe cast restrictions and restrictions on subtyping, too, but those are more technical. – J. Abrahamson Aug 19 '14 at 19:27
5

Type Casts

One difference is that Java allows dynamic type casts such as (silly example follows):

class A { ... }
static String needsA(A a) { ... }

Object o = new A();
needsA((A) o);

Type casts can lead to runtime type errors, which can be regarded as a cause of type unsafety. Of course, any good Java programmer would regard casts as a last resort, and rely on the type system to ensure type safety instead.

In Haskell, there is (roughly) no subtyping, hence no type casts. The closest feature to casts is the (infrequently used) Data.Typeable library, as shown below

foo :: Typeable t => t -> String
foo x = case cast x :: Maybe A of    -- here x is of type t
        Just y  -> needsA y          -- here y is of type A
        Nothing -> "x was not an A"

which roughly corresponds to

String foo(Object x) {
   if (x instanceof A) {
      A y = (A) x;
      return needsA(y);
   } else {
      return "x was not an A";
   }
}

The main difference here between Haskell and Java is that in Java we have separate runtime type checking (instanceof) and cast ((A)). This might lead to runtime errors if checks do not ensure that casts will succeed.

I recall that casts were a big concern in Java before generics were introduced, since e.g. using collections forced you to perform a lot of casts. With generics the Java type system greatly improved, and casts should be far less common now in Java, since they are less frequently needed.

Casts and Generics

Recall that generic types are erased at run time in Java, hence code such as

if (x instanceof ArrayList<Integer>) {
  ArrayList<Integer> y = (ArrayList<Integer>) x;
}

does not work. The check can not be fully performed since we can not check the parameter of ArrayList. Also because of this erasure, if I remember correctly, the cast can succeed even if x is a different ArrayList<String>, only to cause runtime type errors later, even if casts do not appear in the code.

The Data.Typeable Haskell machinery does not erase types at runtime.
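As a rough illustration of that last point, a cast through Data.Typeable can tell a list of Int from a list of String, which is the distinction the erased Java check above cannot make. The function describeList is a made-up name for this sketch.

```haskell
import Data.Typeable (Typeable, cast)

-- cast returns Nothing unless the full type, element type included, matches.
describeList :: Typeable t => t -> String
describeList x = case cast x :: Maybe [Int] of
  Just ns -> "a list of Int with sum " ++ show (sum ns)
  Nothing -> case cast x :: Maybe [String] of
    Just ss -> "a list of String with " ++ show (length ss) ++ " elements"
    Nothing -> "something else"

main :: IO ()
main = do
  putStrLn (describeList [1, 2, 3 :: Int])  -- a list of Int with sum 6
  putStrLn (describeList ["a", "b"])        -- a list of String with 2 elements
  putStrLn (describeList 'c')               -- something else
```

Because the check and the conversion are a single operation returning a Maybe, there is no way to perform the conversion without also handling the failing case.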

More Powerful Types

Haskell GADTs and (Coq, Agda, ...) dependent types extend conventional static type checking to enforce even stronger properties on the code at compile time.

Consider e.g. a Haskell zip function that combines two lists using a given function (the Prelude calls this zipWith). Here's an example:

zip (+) [1,2,3] [10,20,30] = [1+10,2+20,3+30] = [11,22,33]

This applies (+) in a "pointwise" fashion on the two lists. Its definition is:

-- for the sake of illustration, let's use lists of integers here
zip :: (Int -> Int -> Int) -> [Int] -> [Int] -> [Int]
zip f []     _      = []
zip f _      []     = []
zip f (x:xs) (y:ys) = f x y : zip f xs ys

What happens, however, if we pass lists of different lengths?

zip (+) [1,2,3] [10,20,30,40,50,60,70] = [1+10,2+20,3+30] = [11,22,33]

The longer one gets silently truncated. This may be an unexpected behaviour. One could redefine zip as:

zip :: (Int -> Int -> Int) -> [Int] -> [Int] -> [Int]
zip f []     []     = []
zip f (x:xs) (y:ys) = f x y : zip f xs ys
zip f _      _      = error "zip: uneven lengths"

but raising a runtime error is only marginally better. What we need is to enforce, at compile time, that the lists are of the same lengths.

data Z       -- zero
data S n     -- successor 
-- the type S (S (S Z)) is used to represent the number 3 at the type level    

-- List n a is a list of a having exactly length n
data List n a where
   Nil :: List Z a
   Cons :: a -> List n a -> List (S n) a

-- The two arguments of zip are required to have the same length n.
-- The result has length n as well.
zip' :: (Int -> Int -> Int) -> List n Int -> List n Int -> List n Int
zip' f Nil         Nil         = Nil
zip' f (Cons x xs) (Cons y ys) = Cons (f x y) (zip' f xs ys)

Note that the compiler is able to infer that xs and ys are of the same length, so the recursive call is statically well-typed.
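The same length index can rule out other mistakes. As a sketch that goes slightly beyond the zip example, the block below repeats the List definition (with the GADTs extension it needs) and adds a head function that only accepts non-empty lists. The names vhead and toList are my own, chosen for illustration.

```haskell
{-# LANGUAGE GADTs #-}

data Z       -- zero
data S n     -- successor

-- List n a is a list of a having exactly length n
data List n a where
  Nil  :: List Z a
  Cons :: a -> List n a -> List (S n) a

-- Only lists whose length is S n (at least one element) are accepted,
-- so no Nil case is needed and none is possible.
vhead :: List (S n) a -> a
vhead (Cons x _) = x

-- Forget the length index and convert to an ordinary list.
toList :: List n a -> [a]
toList Nil         = []
toList (Cons x xs) = x : toList xs

main :: IO ()
main = do
  let xs = Cons 1 (Cons 2 Nil)
  print (vhead xs)     -- prints 1
  print (toList xs)    -- prints [1,2]
  -- print (vhead Nil) -- rejected at compile time: Z does not match S n
```

With this encoding the empty-list failure of the ordinary head function becomes a type error instead of a runtime exception.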

In Java you could encode the list lengths in the type using the same trick:

class Z {}
class S<N> {}
class List<N,A> { ... }

static <A> List<Z,A> nil() {...}
static <A,N> List<S<N>,A> cons(A x, List<N,A> list) {...}

static <N,A> List<N,A> zip(List<N,A> list1, List<N,A> list2) {
   ...
}

but, as far as I can see, the zip code can not access the tails of the two lists and have them available as two variables of the same type List<M,A>, where M is intuitively N-1. Intuitively, accessing the two tails loses type information, in that we no longer know they are of equal length. To perform a recursive call, a cast would be needed.

Of course, one can rework the code differently and use a more conventional approach, e.g. using an iterator over list1. Admittedly, above I am just trying to convert a Haskell function in Java in a direct way, which is the wrong approach to coding Java (as much as would be coding Haskell by directly translating Java code). Still, I used this silly example to show how Haskell GADTs can express, without unsafe casts, some code which would require casts in Java.

chi
4

There are several things about Haskell that make it "safer" than Java. The type system is one of the obvious ones.

No type-casts. Java and similar OO languages let you cast one type of object to another. If you can't convince the type system to let you do whatever it is you're trying to do, you can always just cast everything to Object (although most programmers would immediately recognise this as pure evil). The trouble is, now you're in the realm of run-time type-checking, just like in a dynamically-typed language. Haskell doesn't let you do such things. (Unless you explicitly go out of your way to get it; and almost nobody does.)

Usable generics. Generics are available in Java, C#, Eiffel and a few other OO languages. But in Haskell they actually work. In Java and C#, trying to write generic code almost always leads to obscure compiler messages about "oh, you can't use it that way". In Haskell, generic code is easy. You can write it by accident! And it works exactly the way you'd expect.

Convenience. You can do things in Haskell that would be way too much effort in Java. For example, set up different types for raw user input versus sanitised user input. You can totally do that in Java. But you won't. It's too much boilerplate. You will only bother doing this if it's absolutely critical for your application. But in Haskell, it's only a handful of lines of code. It's easy. People do it for fun!
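A sketch of what that handful of lines might look like, using newtype. All names here (RawInput, Sanitised, sanitise, render) are hypothetical, and the filter is a toy stand-in for real escaping.

```haskell
-- Two distinct types wrapping the same underlying String.
newtype RawInput  = RawInput String
newtype Sanitised = Sanitised String

-- The only way to obtain a Sanitised value in this sketch.
-- Toy rule: drop a few characters; a real application would escape properly.
sanitise :: RawInput -> Sanitised
sanitise (RawInput s) = Sanitised (filter (`notElem` "<>&\"'") s)

-- Accepts only sanitised text; passing a RawInput is a compile-time error.
render :: Sanitised -> String
render (Sanitised s) = "<p>" ++ s ++ "</p>"

main :: IO ()
main = putStrLn (render (sanitise (RawInput "<script>hi</script>")))
-- prints: <p>scripthi/script</p>
```

Since a newtype has no runtime cost, the distinction exists only for the type checker, which is why it is cheap enough to use routinely.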

Magic. [I don't have a more technical term for this.] Sometimes, the type signature of a function lets you know what the function does. I don't mean you can figure out what the function does, I mean there is only one possible thing a function with that type could be doing or it wouldn't compile. That's an extremely powerful property. Haskell programmers sometimes say "when it compiles, it's usually bug-free", and that's probably a direct result of this.
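Two small cases of this, sketched with made-up names: for the fully polymorphic signatures below, and setting aside non-terminating or crashing definitions such as undefined, the bodies shown are the only ones that type check.

```haskell
-- Nothing is known about a, so the only value of type a available is the argument.
identity :: a -> a
identity x = x

-- The result must be an a, and the only a in scope is the first component.
first :: (a, b) -> a
first (x, _) = x

main :: IO ()
main = do
  print (identity (3 :: Int))      -- prints 3
  putStrLn (first ("left", True))  -- prints left
```

Trying to return the second component from first, or a constant from identity, is rejected by the compiler because the types no longer line up.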

While not strictly properties of the type system, I might also mention:

  • Explicit I/O. The type signature of a function tells you whether it performs any I/O or not. Functions that perform no I/O are thread-safe and extremely easy to test.

  • Explicit null. Data cannot be null unless the type signature says so. You must explicitly check for null when you come to use the data. If you "forget", the type signatures won't match.

  • Results rather than exceptions. Haskell programmers tend to write functions that return a "result" object which contains either the result data or an explanation of why no result could be produced. As opposed to throwing an exception and hoping somebody remembers to catch it. Like a nullable value, a result object is different from the actual result data, and the type system will remind you if you forget to check for failure.
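The "result object" from the last bullet is commonly an Either value. Here is a minimal sketch, assuming a hypothetical parseAge function and ParseError type.

```haskell
import Data.Char (isDigit)

-- Hypothetical error type describing why no result could be produced.
data ParseError = EmptyInput | NotANumber String
  deriving (Show, Eq)

-- Left carries the explanation, Right carries the result data.
parseAge :: String -> Either ParseError Int
parseAge "" = Left EmptyInput
parseAge s
  | all isDigit s = Right (read s)
  | otherwise     = Left (NotANumber s)

main :: IO ()
main = mapM_ (print . parseAge) ["42", "", "abc"]
-- prints:
--   Right 42
--   Left EmptyInput
--   Left (NotANumber "abc")
```

An Either ParseError Int cannot be added to another Int directly, so a caller that forgets to check for Left gets a type error instead of an unhandled exception.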

Having said all of that, Java programs typically die with null pointer or array index exceptions; Haskell programs tend to die with exceptions like the infamous "head []".

MathematicalOrchid
0

For a very basic example, while this is allowable in Java:

public class HelloWorld {

    public static void main(String[] args) {
        int x = 4;
        String name = "four";

        String test = name + x;
        System.out.println(test);
    }

}

The same thing will produce a compile error in Haskell:

fourExample = "four" + 4

There is no implicit type casting in Haskell, which helps in preventing silly errors like "four" + 4. You have to tell it explicitly that you want to convert the number to a String:

fourExample = "four" ++ show 4
Sibi