I'm playing with a little formalisation in Idris and I'm seeing some strange behaviour: very high compilation time and CPU usage for one function.

The code is a regex pattern matching algorithm. First, the regex definition:

data RegExp : Type where
  Zero : RegExp
  Eps  : RegExp
  Chr  : Char -> RegExp
  Cat  : RegExp -> RegExp -> RegExp
  Alt  : RegExp -> RegExp -> RegExp
  Star : RegExp -> RegExp
  Comp : RegExp -> RegExp

Regex membership and non-membership are defined as the following mutually recursive data types:

mutual      
  data NotInRegExp : List Char -> RegExp -> Type where
    NotInZero : NotInRegExp xs Zero
    NotInEps  : Not (xs = []) -> NotInRegExp xs Eps
    NotInChr  : Not (xs = [ c ]) -> NotInRegExp xs (Chr c)
    NotInCat  : zs = xs ++ ys -> (Either (NotInRegExp xs l) 
                                         ((InRegExp xs l)
                                         ,(NotInRegExp ys r)))
                              -> NotInRegExp zs (Cat l r)
    NotInAlt  : NotInRegExp xs l -> NotInRegExp xs r -> NotInRegExp xs (Alt l r)   
    NotInStar : NotInRegExp xs Eps ->
                NotInRegExp xs (Cat e (Star e)) ->
                NotInRegExp xs (Star e)
    NotInComp : InRegExp xs e -> NotInRegExp xs (Comp e)                

  data InRegExp : List Char -> RegExp -> Type where
    InEps : InRegExp [] Eps
    InChr : InRegExp [ a ] (Chr a)
    InCat : InRegExp xs l ->
            InRegExp ys r ->
            zs = xs ++ ys ->
            InRegExp zs (Cat l r)
    InAltL : InRegExp xs l ->
             InRegExp xs (Alt l r)
    InAltR : InRegExp xs r ->
             InRegExp xs (Alt l r)
    InStar : InRegExp xs (Alt Eps (Cat e (Star e))) ->
             InRegExp xs (Star e)
    InComp : NotInRegExp xs e -> InRegExp xs (Comp e)

After these rather long definitions, I define a smart constructor for alternatives:

 infixl 4 .|.

 (.|.) : RegExp -> RegExp -> RegExp
 Zero .|. e = e
 e .|. Zero = e
 e .|. e'   = Alt e e'
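
To make the intended behaviour of the smart constructor concrete, here is a minimal self-contained sketch that restates the definitions above and checks three sample reductions by `Refl`. The names `altZeroLeft`, `altZeroRight` and `altNoZero` are hypothetical and only for illustration; I'm assuming the type checker reduces `.|.` on closed constructor arguments, which is what these checks rely on.

```idris
-- Regex syntax, restated so that this block stands on its own.
data RegExp : Type where
  Zero : RegExp
  Eps  : RegExp
  Chr  : Char -> RegExp
  Cat  : RegExp -> RegExp -> RegExp
  Alt  : RegExp -> RegExp -> RegExp
  Star : RegExp -> RegExp
  Comp : RegExp -> RegExp

infixl 4 .|.

-- Smart constructor: drops a Zero operand instead of building an Alt node.
(.|.) : RegExp -> RegExp -> RegExp
Zero .|. e = e
e .|. Zero = e
e .|. e'   = Alt e e'

-- Hypothetical sanity checks (names are illustrative only).
-- A Zero on the left is dropped.
altZeroLeft : (Zero .|. Chr 'a') = Chr 'a'
altZeroLeft = Refl

-- A Zero on the right is dropped.
altZeroRight : (Chr 'a' .|. Zero) = Chr 'a'
altZeroRight = Refl

-- With no Zero operand, an ordinary Alt node is built.
altNoZero : (Chr 'a' .|. Chr 'b') = Alt (Chr 'a') (Chr 'b')
altNoZero = Refl
```

Note that the second and third clauses only apply when the left operand is not `Zero`, so any proof about `l .|. r` has to inspect `l` first and then `r`.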

Now, I want to prove that this smart constructor is sound and complete with respect to the regex membership semantics. The proofs are mostly straightforward induction and case analysis, but one of them takes a very long time to compile and uses around 90% of the CPU on Mac OS X El Capitan.

The offending function is:

 altOptNotInComplete : NotInRegExp xs (Alt l r) -> NotInRegExp xs (l .|. r)
 altOptNotInComplete {l = Zero} (NotInAlt x y) = y
 altOptNotInComplete {l = Eps}{r = Zero} (NotInAlt x y) = x
 altOptNotInComplete {l = Eps}{r = Eps} pr = pr
 altOptNotInComplete {l = Eps}{r = (Chr x)} pr = pr
 altOptNotInComplete {l = Eps}{r = (Cat x y)} pr = pr
 altOptNotInComplete {l = Eps}{r = (Alt x y)} pr = pr
 altOptNotInComplete {l = Eps}{r = (Star x)} pr = pr
 altOptNotInComplete {l = Eps}{r = (Comp x)} pr = pr
 altOptNotInComplete {l = (Chr x)}{r = Zero} (NotInAlt y z) = y
 altOptNotInComplete {l = (Chr x)}{r = Eps} pr = pr
 altOptNotInComplete {l = (Chr x)}{r = (Chr y)} pr = pr
 altOptNotInComplete {l = (Chr x)}{r = (Cat y z)} pr = pr
 altOptNotInComplete {l = (Chr x)}{r = (Alt y z)} pr = pr
 altOptNotInComplete {l = (Chr x)}{r = (Star y)} pr = pr
 altOptNotInComplete {l = (Chr x)}{r = (Comp y)} pr = pr
 altOptNotInComplete {l = (Cat x y)}{r = Zero} (NotInAlt z w) = z
 altOptNotInComplete {l = (Cat x y)}{r = Eps} pr = pr
 altOptNotInComplete {l = (Cat x y)}{r = (Chr z)} pr = pr
 altOptNotInComplete {l = (Cat x y)}{r = (Cat z w)} pr = pr
 altOptNotInComplete {l = (Cat x y)}{r = (Alt z w)} pr = pr
 altOptNotInComplete {l = (Cat x y)}{r = (Star z)} pr = pr
 altOptNotInComplete {l = (Cat x y)}{r = (Comp z)} pr = pr
 altOptNotInComplete {l = (Alt x y)}{r = Zero} (NotInAlt z w) = z
 altOptNotInComplete {l = (Alt x y)}{r = Eps} pr = pr
 altOptNotInComplete {l = (Alt x y)}{r = (Chr z)} pr = pr
 altOptNotInComplete {l = (Alt x y)}{r = (Cat z w)} pr = pr
 altOptNotInComplete {l = (Alt x y)}{r = (Alt z w)} pr = pr
 altOptNotInComplete {l = (Alt x y)}{r = (Star z)} pr = pr
 altOptNotInComplete {l = (Alt x y)}{r = (Comp z)} pr = pr
 altOptNotInComplete {l = (Star x)}{r = Zero} (NotInAlt y z) = y
 altOptNotInComplete {l = (Star x)}{r = Eps} pr = pr
 altOptNotInComplete {l = (Star x)}{r = (Chr y)} pr = pr
 altOptNotInComplete {l = (Star x)}{r = (Cat y z)} pr = pr
 altOptNotInComplete {l = (Star x)}{r = (Alt y z)} pr = pr
 altOptNotInComplete {l = (Star x)}{r = (Star y)} pr = pr
 altOptNotInComplete {l = (Star x)}{r = (Comp y)} pr = pr
 altOptNotInComplete {l = (Comp x)}{r = Zero} (NotInAlt y z) = y
 altOptNotInComplete {l = (Comp x)}{r = Eps} pr = pr
 altOptNotInComplete {l = (Comp x)}{r = (Chr y)} pr = pr
 altOptNotInComplete {l = (Comp x)}{r = (Cat y z)} pr = pr
 altOptNotInComplete {l = (Comp x)}{r = (Alt y z)} pr = pr
 altOptNotInComplete {l = (Comp x)}{r = (Star y)} pr = pr
 altOptNotInComplete {l = (Comp x)}{r = (Comp y)} pr = pr

I can't understand why this function demands so much CPU. Is there a way to "optimize" this code so that compilation behaves normally?

The code above is available at the following gist. I'm using Idris 0.10 on Mac OS X El Capitan.

Any clue is highly welcome.

Rodrigo Ribeiro