
I've got a function that takes data and either returns the same data or a slightly modified version.

I want to have my program do one thing if it changed or another thing if it did not change.

Previously I was returning a pair (Bool,Object) and using fst to check if it changed. Lately it occurred to me that I could simplify the code by just returning the object and checking equality using ==.

But then I realized that Haskell doesn't differentiate between deep equality checking and "object identity" (i.e., pointer equality). So how can I know whether using == is going to be efficient or not? Should I avoid it for efficiency reasons, or are there cases where I can depend on the compiler figuring out that it doesn't need to do a deep equality check?

Normally I wouldn't be too worried about efficiency while writing an initial program, but this affects the interface to my module, so I want to get it right before writing too much code. It doesn't seem worth making the program much less efficient just to simplify a small piece of code. Moreover, I'd like to get a better idea of what kind of optimizations I can depend on GHC to help me with.

Don Stewart
Steve
  • Just as a note, this seems like a better case for either an `Either` value or something equivalent but more semantic (e.g. `data ChangeState a = Same a | Changed a`). – Chuck Dec 29 '09 at 18:58

3 Answers


It's always a bad idea to rely on uncertain compiler optimizations to provide such an important performance guarantee as constant-time equality vs linear-time deep equality. You're much better off with a new type that encapsulates a value plus information about whether the value is new. Depending on your application this can be either

data Changed a = Changed a | Unchanged a

or

data Changed a = Changed a | Unchanged

We actually use a similar type inside the Glasgow Haskell Compiler so we can keep running the optimizer until the code stops changing. We also run iterative dataflow analysis until the results stop changing.
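To make the "run until it stops changing" idea concrete, here is a minimal sketch using the first form of the type. The names `fixpoint` and `halveEven` are made up for illustration and this is not GHC's actual code; the point is only that the "did it change?" test is a constructor match, so it costs the same no matter how large the value is.

```haskell
-- Hypothetical names throughout; an illustration, not GHC's code.
data Changed a = Changed a | Unchanged a
  deriving Show

-- Keep applying a step until it reports that nothing changed.
-- The check is a pattern match on the constructor, so it is O(1)
-- and never inspects the value itself.
fixpoint :: (a -> Changed a) -> a -> a
fixpoint step x = case step x of
  Changed x'   -> fixpoint step x'
  Unchanged x' -> x'

-- Example step: halve non-zero even numbers, leave everything else alone.
halveEven :: Int -> Changed Int
halveEven n
  | even n && n /= 0 = Changed (n `div` 2)
  | otherwise        = Unchanged n
```

For example, `fixpoint halveEven 40` steps through 20 and 10 and stops at 5, the first value the step reports as unchanged.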

We found it useful to make this type a monad so that we can write some simple higher-order functions using do notation, but it's not necessary—just a convenience.
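The instance itself isn't shown here, so the following is only one plausible way to write it, assuming the two-payload form of the type. It behaves like a writer over a Boolean "something changed" flag: once any step reports `Changed`, the overall result is `Changed`. The helper `value` and the example passes `clampNonNegative`, `capAt100`, and `normalise` are hypothetical.

```haskell
-- One possible instance; an assumption, not the code used in GHC.
data Changed a = Changed a | Unchanged a
  deriving (Show, Eq)

-- Extract the payload, ignoring the changed/unchanged tag.
value :: Changed a -> a
value (Changed a)   = a
value (Unchanged a) = a

instance Functor Changed where
  fmap f (Changed a)   = Changed (f a)
  fmap f (Unchanged a) = Unchanged (f a)

instance Applicative Changed where
  pure = Unchanged
  cf <*> cx = case (cf, cx) of
    (Unchanged f, Unchanged x) -> Unchanged (f x)
    _                          -> Changed (value cf (value cx))

instance Monad Changed where
  Unchanged a >>= k = k a                     -- nothing changed so far
  Changed a   >>= k = Changed (value (k a))   -- a change is sticky

-- Two hypothetical passes and their composition in do notation.
clampNonNegative :: Int -> Changed Int
clampNonNegative n
  | n < 0     = Changed 0
  | otherwise = Unchanged n

capAt100 :: Int -> Changed Int
capAt100 n
  | n > 100   = Changed 100
  | otherwise = Unchanged n

normalise :: Int -> Changed Int
normalise n = do
  a <- clampNonNegative n
  capAt100 a
```

With these definitions, `normalise 50` is `Unchanged 50`, while `normalise (-3)` is `Changed 0` and `normalise 500` is `Changed 100`.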

Summary: If you want constant-time checking, code it yourself—don't rely on a possible compiler optimization which might not be there—or which might change in the next release.

Norman Ramsey
  • Thanks, that feels like a pretty concrete answer. – Steve Dec 30 '09 at 07:23
  • I wonder how this works as a monad? The second version seems like the `Maybe` monad. However, unlike `Maybe`, here you would want the last unchanged result, not just the knowledge that it was later `Unchanged`. – yairchu Jan 05 '10 at 15:57
  • @yairchu It's still vaguely similar to the Maybe monad. If there's a "Changed" anywhere along the monadic composition then the result is "Changed something", otherwise it's "Unchanged something" – Jeremy List Nov 19 '13 at 06:07

The derived (==) always performs a deep (structural) comparison. Your question has been discussed on haskell-cafe.
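As a rough illustration, the hand-written instance below mirrors what `deriving Eq` would produce for a small tree type: it compares constructors and then recurses into every field, with no shortcut for two references to the same heap object. The type `Tree` and the helper `build` are made up for this sketch.

```haskell
-- Hypothetical example type; the instance is written out by hand to
-- show the shape of what "deriving Eq" would generate for it.
data Tree = Leaf | Node Tree Int Tree

instance Eq Tree where
  Leaf          == Leaf          = True
  Node l1 x1 r1 == Node l2 x2 r2 = l1 == l2 && x1 == x2 && r1 == r2
  _             == _             = False

-- Build a complete tree of the given depth.
build :: Int -> Tree
build 0 = Leaf
build d = Node (build (d - 1)) d (build (d - 1))
```

Comparing two equal trees, even a tree with itself, walks the whole structure, so the cost grows with the size of the value.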

Wei Hu

I'm still a relative Haskell noob, so take my answer with a grain of salt, and please forgive me if my answer isn't as direct as it should be!

In Haskell, operators aren't special - they're just infix functions.

You can look at the definition of the equality operator yourself in the standard prelude.

Of course, it can be overloaded to work with whatever data type you've defined - but if you do the overloading, you'll know how efficient the implementation is.
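As a sketch of what writing the instance yourself could look like, the hypothetical `Versioned` wrapper below carries a revision counter and defines equality on that counter alone, so `==` is constant-time by construction. This rests on an assumption the compiler cannot check: every modification must go through `modify`, and only values descended from the same original should be compared.

```haskell
-- Hypothetical wrapper: equality looks only at the revision number.
data Versioned a = Versioned { revision :: Int, payload :: a }

instance Eq (Versioned a) where
  a == b = revision a == revision b   -- O(1), never inspects the payload

-- Wrap a fresh value at revision 0.
fresh :: a -> Versioned a
fresh = Versioned 0

-- Every change must go through here so that the revision is bumped.
modify :: (a -> a) -> Versioned a -> Versioned a
modify f (Versioned r x) = Versioned (r + 1) (f x)
```

The trade-off is that this `==` no longer means structural equality, which is part of why an explicit changed/unchanged result is usually the clearer interface.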

It might be helpful to know that you can use Hoogle to find the function definition you want. That's how I found the definition of the equality operator.

nont
  • worth looking at this post too: http://stackoverflow.com/questions/1717553/pointer-equality-in-haskell – nont Dec 29 '09 at 21:24