
I am not very familiar with the degree to which Haskell/GHC can optimize code. Below I have a pretty "brute-force" (in the declarative sense) implementation of the n-queens problem. I know it can be written more efficiently, but that's not my question; it's that writing it got me thinking about GHC's optimization capabilities and limits.

I have expressed it in what I consider a pretty straightforward declarative style: filter the permutations of [1..n] that fulfill the predicate "for all indices i, j with i < j, abs (v_i - v_j) /= j - i". I would hope this is the kind of thing that can be optimized, but it also kind of feels like asking a lot of a compiler.

import Data.List (permutations)

-- a board is valid if no two queens share a diagonal
-- (rows and columns are distinct by construction, since the board is a permutation)
validQueens x = and [abs (x!!i - x!!j) /= j-i | i <- [0..length x - 2], j <- [i+1..length x - 1]]

queens n = filter validQueens (permutations [1..n])

oneThru x = [1..x]
pointlessQueens = filter validQueens . permutations . oneThru

main = do
  n <- getLine
  print $ pointlessQueens $ (read :: String -> Int) n

This runs fairly slowly and grows quickly: n=10 takes about a second and n=12 takes forever. Even without optimization I can tell the growth is factorial (the number of permutations) multiplied by quadratic (the number of differences the predicate checks). Is there any way this code can perform better through intelligent compilation? I tried the basic GHC options such as -O2 and didn't notice a significant difference, but I don't know the finer points (I just added the flags).

My impression is that the function I call queens cannot be optimized and must generate all permutations before filtering. Does the point-free version have a better chance? On the one hand I feel like smart fusion between filter and the predicate might be able to knock off some obviously undesired elements before they are even fully generated, but on the other hand it kind of feels like a lot to ask.

Sorry if this seems rambling; I guess my questions are:

  1. Is the point-free version of the above function more capable of being optimized?
  2. What steps could I take at make/compile/link time to encourage optimization?
  3. Can you briefly describe some possible (and contrast with the impossible!) means of optimization for the above code? At what point in the process do these occur?
  4. Is there any particular part of the `ghc --make queensN -O2 -v` output I should be paying attention to? Nothing stands out to me. I don't even see much difference in the output due to the optimization flags.

I am not overly concerned with this code example, but writing it got me thinking, and it seems like a decent vehicle for discussing optimization.

PS - `permutations` is from `Data.List` and looks like this:

permutations            :: [a] -> [[a]]
permutations xs0        =  xs0 : perms xs0 []
  where
    perms []     _  = []
    perms (t:ts) is = foldr interleave (perms ts (t:is)) (permutations is)
      where interleave    xs     r = let (_,zs) = interleave' id xs r in zs
            interleave' _ []     r = (ts, r)
            interleave' f (y:ys) r = let (us,zs) = interleave' (f . (y:)) ys r
                                     in  (y:us, f (t:y:us) : zs)
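
A quick check in GHCi (illustrative; this is the standard Data.List ordering) shows that it enumerates lazily, beginning with the input itself:

take 3 (permutations [1,2,3])   -- [[1,2,3],[2,1,3],[3,2,1]]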
  • Pointless or not shouldn't make any difference. In general, a bad algorithm is one of the few things compilers can't fix (exceptions for relatively simple things, like turning a recursive factorial into a loop in especially smart compilers). – delnan Jun 24 '11 at 16:01
  • You've picked a bad algorithm and GHC won't fix that for you. – augustss Jun 24 '11 at 16:56
  • It may just be me, but I feel this question's premise may be a bit too broad to answer in any direct way. I'm currently interpreting the driving force behind the question as: **what kinds of general, cost-saving whole-program transformations can be made on pure, declarative functional code?** That seems to be an entire research field in and of itself, and one that relies a bit too much on the given problem domain at that. @delnan's point is incredibly poignant; even with modern *smart* compilers, computational complexity dominates. – Raeez Jun 24 '11 at 16:57
  • I think the premise of the question is good, but in my opinion it would be better to limit the questions to what optimizations the compiler actually does, and to provide a code example with an algorithm that is acceptably efficient. Again, just my opinion. – HaskellElephant Jun 24 '11 at 17:38
  • BTW, `oneThru = enumFromTo 1`. – augustss Jun 24 '11 at 18:11
  • Thanks for the answers, everyone. I know the question was a little broad, and there have still been some pretty informative answers/comments. I guess what I was really curious about was whether anything clever could happen under the hood with the `(.)` function composition, and it appears the answer is no. – jon_darkstar Jun 24 '11 at 18:41

4 Answers


At a more general level regarding "what kind of optimizations can GHC do", it may help to break the idea of an "optimization" apart a little bit. There are conceptual distinctions that can be drawn between aspects of a program that can be optimized. For instance, consider:

  • The intrinsic logical structure of the algorithm: You can safely assume in almost every case that this will never be optimized. Outside of experimental research, you're not likely to find a compiler that will replace a bubble sort with a merge sort, or even an insertion sort, and extremely unlikely to find one that would replace a bogosort with something sensible.

  • Non-essential logical structure of the algorithm: For instance, in the expression g (f x) (f x), how many times will f x be computed? What about an expression like g (f x 2) (f x 5)? These aren't intrinsic to the algorithm, and different variations can be interchanged without impacting anything other than performance. The difficulties in performing optimization here lie in recognizing when a substitution can in fact be done without changing the meaning, and in predicting which version will have the best results. A lot of manual optimizations fall into this category, along with a great deal of GHC's cleverness (see the sketch after this list).

    This is also the part that trips a lot of people up, because they see how clever GHC is and expect it to do even more. And because of the reasonable expectation that GHC should never make things worse, it's not uncommon to have potential optimizations that seem obvious (and are, to the programmer) that GHC can't apply because it's nontrivial to distinguish cases where the same transformation would significantly degrade performance. This is, for instance, why memoization and common subexpression elimination aren't always automatic.

    This is also the part where GHC has a huge advantage, because laziness and purity make a lot of things much easier; this, I suspect, is what leads people to make tongue-in-cheek remarks like "optimizing compilers are a myth (except perhaps in Haskell)", but also to unrealistic optimism about what even GHC can do.

  • Low-level details: Things like memory layout and other aspects of the final code. These tend to be somewhat arcane and highly dependent on implementation details of the runtime, the OS, and the processor. Optimizations of this sort are essentially why we have compilers, and usually not something you need to worry about unless you're writing code that is very computationally demanding (or are writing a compiler yourself).
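
To make the second point concrete, here is a minimal sketch (the names expensive, unshared, and shared are purely illustrative, not from the answer): GHC will not, in general, rewrite the first form into the second, because keeping the shared value alive between its two uses can increase memory residency.

expensive :: Int -> [Int]
expensive n = [1..n]            -- stand-in for some costly computation

unshared :: Int -> Int
unshared n = sum (expensive n) + length (expensive n)
-- the list may be built twice, but each traversal can run in constant space

shared :: Int -> Int
shared n = let xs = expensive n    -- built only once...
           in  sum xs + length xs  -- ...but retained in full between the two uses

main :: IO ()
main = print (unshared 1000000, shared 1000000)  -- both components are equal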

As far as your specific example here goes: GHC isn't going to significantly alter the intrinsic time complexity of your algorithm. It might be able to remove some constant factors. What it can't do is apply constant-factor improvements that it can't be sure are correct, particularly ones that technically change the meaning of the program in ways that you don't care about. Case in point here is @sclv's answer, which explains how your use of print creates unnecessary overhead; there's nothing GHC could do about that, and in fact the current form would possibly inhibit other optimizations.

C. A. McCann
  • Just a thought: as far as I know it is possible to produce C source code from Haskell, and I wonder whether it would make sense to do this and then compile it with GCC? GCC is constantly improving, and it also relatively recently got so-called *link-time optimizations*. – Hi-Angel Feb 25 '15 at 13:22

There's a conceptual problem here. `permutations` generates its permutations in a streaming fashion, and `filter` streams too. What's forcing everything prematurely is the `show` implicit in `print`. Change your last line to:

mapM_ print $ pointlessQueens $ (read :: String -> Int) n  -- mapM_, since the results are discarded

and you'll see that results are generated in a streaming fashion, much more rapidly. For large result sets this fixes a potential space leak; other than that it just lets things be printed as they are computed rather than all at once at the end.

However, you shouldn't expect any order-of-magnitude improvements from GHC optimizations (there are a few obvious ones that you do get, mostly having to do with strictness and folds, but it's irritating to rely on them). What you'll get are constant factors, generally.

Edit: As luqui points out below, `show` is also streaming (or at least `show` of `[Int]` is), but the line buffering nonetheless makes it harder to see the genuine speed of computation...
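
To see the streaming for yourself, here is a small self-contained check (a sketch reusing the question's validQueens; the choice of n = 8 is arbitrary). head forces only as much of the lazily generated stream as the first solution requires:

import Data.List (permutations)

validQueens :: [Int] -> Bool
validQueens x = and [abs (x!!i - x!!j) /= j-i | i <- [0..length x - 2], j <- [i+1..length x - 1]]

-- permutations and filter both stream, so the first solution appears
-- long before the full 8! search space has been generated
main :: IO ()
main = print (head (filter validQueens (permutations [1..8])))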

sclv

It should be noted, although you do express that it is not part of your question, that the big problem with your code is that you do not do any pruning.

In the case of your question, it feels foolish to talk about possible and impossible optimizations, compiler flags, how best to formulate the code, etc., when an improvement to the algorithm is staring us so blatantly in the face.

One of the first arrangements to be tried is the permutations starting with the first queen in position 1 and the second queen in position 2 ([1,2,...]). This is of course not a solution, and one of those queens has to move. However, in your implementation, all permutations beginning with this combination of the first two queens will still be tested! The search should stop there and instantly move on to the permutations beginning [1,3,...].

Here is a version that does this sort of pruning:

import Data.List
import Control.Monad

main = getLine >>= mapM_ print . queens . read

queens :: Int -> [[Int]]
queens n = queens' [] n

-- build up one queen per row, abandoning a partial solution xs
-- as soon as it becomes invalid
queens' xs n
  | length xs == n = return xs
  | otherwise = do
      x <- [1..n] \\ xs            -- candidate columns not already used
      guard (validQueens (x:xs))   -- prune: reject bad prefixes immediately
      queens' (x:xs) n

validQueens x =
  and [abs (x!!i - x!!j) /= j-i | i <- [0..length x - 2], j <- [i+1..length x - 1]]
HaskellElephant
  • I mostly wanted to know if a compiler optimization on function composition with `filter` could automatically achieve some of this pruning, and I was certainly avoiding any shred of imperative logic, thinking this would jeopardize any such chances. My thinking was that some combination of short-circuit logic with partially constructed permutations might reject those obviously bad cases before construction completed (though it probably could not learn to avoid all such cases). However, it's looking pretty clear that I was a little too hopeful. – jon_darkstar Jun 24 '11 at 18:53
  • I do really like how you implemented the pruning and lost the need for `permutations` altogether. I'll probably accept another answer because you didn't tell me a whole lot about compiler optimization, but if the question were "rewrite this code better" you nailed it. – jon_darkstar Jun 24 '11 at 19:00
  • Interesting how your `queens'` generates all those integer permutations with the guard condition serving as a filter at every step. Looks like a pattern worth remembering. – jon_darkstar Jun 24 '11 at 19:08
  • @jon_darkstar, I am aware that I didn't say much about compiler optimization even though that was the general theme, a shortcoming, but I still feel the answer contributes something. As to why `filter` will not do this pruning for you: as I am sure you have realized, although lazy evaluation avoids a lot of the checking, it still has to be done n! times. The compiler cannot shorten your permutation list, because the guarantee that a function call is only evaluated once inside each lambda expression could not be maintained. – HaskellElephant Jun 24 '11 at 20:10

I understand that your question was about compiler optimization, but as the discussion has shown, pruning is necessary.

The first paper that I know of about how to do this for the n-queens problem in a lazy functional language is Turner's "Recursion Equations as a Programming Language", which you can read in Google Books.

In terms of your comment about a pattern worth remembering, this problem introduces a very powerful pattern. A great paper on this idea is Philip Wadler's "How to Replace Failure by a List of Successes", which can also be read in Google Books.

Here is a pure, non-monadic implementation based on Turner's Miranda implementation. In the case of n = 12 (`queens 12 12`) it returns the first solution in 0.01 seconds and will compute all 14,200 solutions in under 6 seconds. Of course, printing those takes much longer.

queens :: Int -> Int -> [[Int]]
queens n boardsize = 
    queensi n 
        where
          -- given a safe arrangement of queens in the first n - 1 rows,
          -- "queensi n" returns a list of all the safe arrangements of queens
          -- in the first n rows
          queensi :: Int -> [[Int]]
          queensi 0  = [[]]
          queensi n  = [ x : y | y <- queensi (n-1) , x <- [1..boardsize], safe x y 1]

-- "safe x y n" tests whether a queen at column x would be safe from previous
-- queens in y where the first element of y is n rows away from x, the second
-- element is (n+1) rows away from x, etc.
safe :: Int -> [Int] -> Int -> Bool
safe _ [] _ = True
safe x (c:y) n = and [ x /= c , x /= c + n , x /= c - n , safe x y (n+1)]
-- we only need to check for queens in the same column, and the same diagonals;
-- queens in the same row are not possible by the fact that we only pick one
-- queen per row
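
A hypothetical driver to reproduce the timings quoted above (not part of the original answer):

main :: IO ()
main = do
    print (head (queens 12 12))     -- first solution: near-instant thanks to laziness
    print (length (queens 12 12))   -- count all 14,200 solutions without printing them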
George Co