strictly positive vs ill-formed regular expressions in Coq

Question

We are a few people learning Coq, and we are trying to define an Inductive predicate for the denotation of regular expressions, where a regular expression denotes a set of sequences. This runs into the strict positivity restriction, because we allow negation as an operator (expressed below through nor). Negation is not usually included in regular expressions, but it is part of Brzozowski's regular expressions, which are the ones we are looking at. When we instead try to define the denotation using a Fixpoint, we run into the "ill-formed recursive definition" error for the zero-or-more (star) operator. We can overcome both problems by defining the denotation as a mix of Inductive predicates and a Fixpoint, but this feels wrong.
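As a minimal illustration of the two restrictions mentioned above, both of the following definitions are rejected by Coq. The names bad and loop are hypothetical and not part of our development; this is only a sketch of the kind of error involved, not our actual code.

```coq
(* Rejected by the strict positivity check: `bad` occurs to the left of
   an arrow inside the argument of its own constructor. *)
Fail Inductive bad : Prop :=
  | mk_bad : (bad -> False) -> bad.

(* Rejected as ill-formed: the recursive call is on `S n`, which is not
   a structurally smaller subterm of `n`. *)
Fail Fixpoint loop (n : nat) : nat := loop (S n).
```

The Fail prefix makes each command succeed exactly when the definition is rejected, so the snippet can be checked as is.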

Is there any other way to define our regular expressions purely as an Inductive predicate?

Is there any problem with how we use a mix of Fixpoint and Inductive Predicate, or are we just being overly pure?

Here is the example code, with the explanations and expected errors in the comments:

Require Import List.
Import ListNotations.

(* We are defining our input alphabet for regular expressions as only two possible symbols *)
Inductive alphabet := a1 | a0.

Inductive regex :=
  (* emptyset matches absolutely no strings *)
  | emptyset : regex
  (* lambda matches only the empty string *)
  | lambda : regex
  (* symbol matches only strings of length 1 containing the exact alphabet symbol *)
  | symbol : alphabet -> regex
  (* concat is used to build up regular expressions that can match longer strings *)
  | concat : regex -> regex -> regex
  (* zero or more, as you are familiar with from regular expressions *)
  | star : regex -> regex
  (* `nor` is a boolean operator, here is the truth table
     P | Q | P `nor` Q
     -----------------
     T | T | F
     T | F | F
     F | T | F
     F | F | T
  *)
  | nor : regex -> regex -> regex
  .

(* We chose to include `nor`, since it can represent any possible boolean expression,
   which is one of the selling points of Brzozowski's derivatives for regular expressions.
*)

Definition complement (r: regex) : regex :=
  nor r r.

Definition and (r s: regex) : regex :=
  nor (nor r r) (nor s s).

Definition or (r s: regex) : regex :=
  nor (nor r s) (nor r s).

Definition xor (r s: regex) : regex :=
  or (and r (complement s)) (and (complement r) s).

(* I matches all strings *)
Definition I: regex :=
  complement (emptyset).

(*  A regular expression denotes a set of sequences. *)
Definition seq := (list alphabet).
Definition seqs := seq -> Prop.
Definition in_set_of_sequences (ss: seqs) (s: seq): Prop := ss s. 
Notation "p \in P" := (in_set_of_sequences P p) (at level 80).

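(* *Concatenation*. $(P.Q) = \{ s | s = p.q; p \in P, q \in Q \}$. *)
Inductive concat_seqs (P Q: seqs): seqs :=
  | mk_concat: forall (s: seq),
    (* A conjunction is needed here: with `p ++ q = s -> ...` the premise
       could be satisfied vacuously by picking p and q with p ++ q <> s. *)
    (exists p q, p ++ q = s /\
      (p \in P) /\
      (q \in Q)
    ) ->
    concat_seqs P Q s
  .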

(*
    *Star*. $P^{*} = \bigcup_{n=0}^{\infty} P^n$, where $P^2 = P.P$, etc.
    and $P^0 = \lambda$, the set consisting of the sequence of zero length.
*)
Inductive star_seqs (R: seqs): seqs :=
  | mk_star_zero : forall (s: seq),
    s = [] -> star_seqs R s
  | mk_star_more : forall (s: seq),
    s \in (concat_seqs R (star_seqs R)) ->
    star_seqs R s
  .

(*
    *Boolean function*. We shall denote any Boolean function of $P$ and $Q$ by $f(P, Q)$. 
    Of course, all the laws of Boolean algebra apply.
    `nor` is used to emulate `f`, since nor can be used to emulate all boolean functions.
*)
Inductive nor_seqs (P Q: seqs): seqs :=
  | mk_nor : forall s,
    ~(s \in P) /\ ~(s \in Q) ->
    nor_seqs P Q s
  .

(* Here we use a mix of Fixpoint and Inductive predicates to define the denotation of regular expressions.
   This works, but it would be nicer to define it purely as an Inductive predicate.
*)
Fixpoint denote_regex (r: regex): seqs :=
  match r with
  | emptyset => fun _ => False
  | lambda => fun xs => xs = []
  | symbol y => fun xs => xs = [y]
  | concat r1 r2 => concat_seqs (denote_regex r1) (denote_regex r2)
  | star r1 => star_seqs (denote_regex r1)
  | nor r1 r2 => nor_seqs (denote_regex r1) (denote_regex r2)
  end.

(* Here we try to rewrite the denotation of a regex using a pure inductive predicate, but we get an error:
   Non strictly positive occurrence of "ind_regex" in
    "forall (s : seq) (P Q : regex), 
    s \in nor_seqs (ind_regex P) (ind_regex Q) -> ind_regex (nor P Q) s".
*)
Inductive ind_regex: regex -> seqs :=
  | ind_emptyset (s: seq):
    False ->
    ind_regex emptyset s
  | ind_lambda (s: seq):
    s = [] ->
    ind_regex lambda s
  | ind_symbol (s: seq) (a: alphabet):
    s = [a] ->
    ind_regex (symbol a) s
  | ind_concat (s: seq) (P Q: regex):
    s \in (concat_seqs (ind_regex P) (ind_regex Q)) ->
    ind_regex (concat P Q) s
  | ind_star (s: seq) (R: regex):
    s \in (star_seqs (ind_regex R)) ->
    ind_regex (star R) s
  | ind_nor (s: seq) (P Q: regex):
    s \in (nor_seqs (ind_regex P) (ind_regex Q)) ->
    ind_regex (nor P Q) s
.


(*
    Here we try to define the denotation of a regex purely as a fixpoint, but we get an error:
    Recursive definition of fix_regex is ill-formed.
    In environment
    fix_regex : regex -> seqs
    r : regex
    s : regex
    xs : seq
    x : alphabet
    xs' : list alphabet
    ys : list alphabet
    zs : list alphabet
    Recursive call to fix_regex has principal argument equal to "star s" instead of "s".
    Recursive definition is:
    "fun r : regex =>
    match r with
    | emptyset => fun _ : seq => False
    | lambda => fun xs : seq => xs = []
    | symbol y => fun xs : seq => xs = [y]
    | concat s t => fun xs : seq => exists ys zs : list alphabet, xs = ys ++ zs /\ fix_regex s ys /\ fix_regex t zs
    | star s =>
        fun xs : seq =>
        match xs with
        | [] => True
        | x :: xs' => exists ys zs : list alphabet, xs' = ys ++ zs /\ fix_regex s (x :: ys) /\ fix_regex (star s) zs
        end
    | nor _ _ => fun _ : seq => True
    end".
*)
Fixpoint fix_regex (r: regex): seqs :=
  match r with
  | emptyset => fun _ => False
  | lambda => fun xs => xs = []
  | symbol y => fun xs => xs = [y]
  | concat s t => fun xs => exists ys zs, xs = ys ++ zs /\ fix_regex s ys /\ fix_regex t zs
  | star s => fun xs =>
    match xs with
    | [] => True
    | (x::xs') => exists ys zs, xs' = ys ++ zs /\ fix_regex s (x::ys) /\ fix_regex (star s) zs
    end
  | _ => fun _ => True
  end.

Kazuhiro Kobayashi · Accepted Answer · 2020-05-27T11:50:24.430

Is there any problem with how we use a mix of Fixpoint and Inductive Predicate

In my opinion, it's reasonable to mix inductive and fixpoint definitions. Your fix_regex depends on the /\ operator, which is notation for the type and; that type is itself defined as an inductive type in the standard library, with conj as its only constructor. The same goes for exists _, _, which is notation for ex. I think defining and using star_seqs is just as fair as using and.
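To make this concrete, here is a small sketch showing that a conjunction and an existential are proved with ordinary constructors from the standard library (conj and ex_intro). The goal itself is a made-up example and has nothing to do with regular expressions.

```coq
(* A made-up goal, only to show the constructors behind /\ and exists. *)
Goal 0 = 0 /\ exists n : nat, n = 0.
Proof.
  apply conj.               (* constructor of the inductive type `and` *)
  - reflexivity.
  - apply (ex_intro _ 0).   (* constructor of the inductive type `ex` *)
    reflexivity.
Qed.
```

So even a definition written "purely" as a Fixpoint already relies on inductive predicates as soon as it mentions /\ or exists.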

Is there any other way to define our regular expressions purely as an Inductive predicate?

Here I suggest some alternatives.

Mutually inductive types

You can define multiple inductive types that depend on each other.

Here is an (incomplete) example.

  Inductive match_regex : regex -> seq -> Prop  :=
  | match_lambda : match_regex lambda []
  | match_symbol : forall a, match_regex (symbol a) [a]
  | match_nor : forall r1 r2 s,
      unmatch_regex r1 s -> unmatch_regex r2 s -> match_regex (nor r1 r2) s
  with unmatch_regex : regex -> seq -> Prop :=
  | unmatch_lambda : forall x xs, unmatch_regex lambda (x :: xs)
  | unmatch_symbol : forall a b, a <> b -> unmatch_regex (symbol a) [b]
  | unmatch_nor_l : forall r1 r2 s,
      match_regex r1 s -> unmatch_regex (nor r1 r2) s
  | unmatch_nor_r : forall r1 r2 s,
      match_regex r2 s -> unmatch_regex (nor r1 r2) s
  .
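As a rough sketch of how the two predicates are used together, the example below proves that nor lambda (symbol a1) matches [a0]. It is self-contained only for convenience: the module name MutualSketch and the trimmed-down regex type (three constructors instead of six) are assumptions made for this illustration.

```coq
Require Import List.
Import ListNotations.

(* Sketch only: a trimmed-down copy of the definitions, wrapped in a
   module (hypothetical name) so it does not clash with the real ones. *)
Module MutualSketch.
  Inductive alphabet := a1 | a0.
  Inductive regex :=
    | lambda : regex
    | symbol : alphabet -> regex
    | nor : regex -> regex -> regex.
  Definition seq := list alphabet.

  Inductive match_regex : regex -> seq -> Prop :=
  | match_lambda : match_regex lambda []
  | match_symbol : forall a, match_regex (symbol a) [a]
  | match_nor : forall r1 r2 s,
      unmatch_regex r1 s -> unmatch_regex r2 s -> match_regex (nor r1 r2) s
  with unmatch_regex : regex -> seq -> Prop :=
  | unmatch_lambda : forall x xs, unmatch_regex lambda (x :: xs)
  | unmatch_symbol : forall a b, a <> b -> unmatch_regex (symbol a) [b]
  | unmatch_nor_l : forall r1 r2 s,
      match_regex r1 s -> unmatch_regex (nor r1 r2) s
  | unmatch_nor_r : forall r1 r2 s,
      match_regex r2 s -> unmatch_regex (nor r1 r2) s.

  (* [a0] is matched by neither lambda nor symbol a1, so nor matches it. *)
  Example nor_matches : match_regex (nor lambda (symbol a1)) [a0].
  Proof.
    apply match_nor.
    - apply unmatch_lambda.
    - apply unmatch_symbol. discriminate.
  Qed.
End MutualSketch.
```

Negation never appears as ~ in a constructor argument here; it is replaced by a positive occurrence of the other predicate, which is why the positivity check accepts the definition.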

Define a relationship between regex, seq, and bool.

When you use mutually inductive types, writing the complementary conditions (such as match_lambda and unmatch_lambda in the example above) can become complicated.

This can be alleviated by defining the proposition as a relation between regex, seq, and bool.

  Definition alpha_eq_dec : forall (x y : alphabet), {x = y} + {x <> y}.
    decide equality.
  Defined.
  Definition seq_eq_dec : forall (xs ys : seq), {xs = ys} + {xs <> ys} := list_eq_dec alpha_eq_dec.
  Definition seq_eqb (xs ys : seq) : bool :=
    if seq_eq_dec xs ys then true else false.

  Inductive bool_regex : regex -> seq -> bool -> Prop :=
  | bool_lambda : forall xs, bool_regex lambda xs (seq_eqb xs [])
  | bool_symbol : forall a xs, bool_regex (symbol a) xs (seq_eqb xs [a])
  | bool_nor : forall r1 r2 s b1 b2,
      bool_regex r1 s b1 -> bool_regex r2 s b2 -> bool_regex (nor r1 r2) s (negb (b1 || b2)).
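A possible way to use this relation is sketched below: the boolean index is first rewritten into the shape that bool_nor produces, and the remaining goals are closed by the base constructors. The module name BoolSketch, the trimmed-down regex type, and the example nor_true are assumptions made for this illustration; orb is written out explicitly instead of the || notation.

```coq
Require Import List.
Import ListNotations.

(* Sketch only: trimmed-down definitions inside a hypothetical module. *)
Module BoolSketch.
  Inductive alphabet := a1 | a0.
  Inductive regex :=
    | lambda : regex
    | symbol : alphabet -> regex
    | nor : regex -> regex -> regex.
  Definition seq := list alphabet.

  Definition alpha_eq_dec : forall (x y : alphabet), {x = y} + {x <> y}.
  Proof. decide equality. Defined.
  Definition seq_eq_dec : forall (xs ys : seq), {xs = ys} + {xs <> ys} :=
    list_eq_dec alpha_eq_dec.
  Definition seq_eqb (xs ys : seq) : bool :=
    if seq_eq_dec xs ys then true else false.

  Inductive bool_regex : regex -> seq -> bool -> Prop :=
  | bool_lambda : forall xs, bool_regex lambda xs (seq_eqb xs [])
  | bool_symbol : forall a xs, bool_regex (symbol a) xs (seq_eqb xs [a])
  | bool_nor : forall r1 r2 s b1 b2,
      bool_regex r1 s b1 -> bool_regex r2 s b2 ->
      bool_regex (nor r1 r2) s (negb (orb b1 b2)).

  Example nor_true : bool_regex (nor lambda (symbol a1)) [a0] true.
  Proof.
    (* Expose the index in the form produced by bool_nor; this relies on
       seq_eqb computing, i.e. on the decision procedures being transparent. *)
    change true with (negb (orb (seq_eqb [a0] []) (seq_eqb [a0] [a1]))).
    apply bool_nor; constructor.
  Qed.
End BoolSketch.
```

The change step is needed because true does not syntactically have the form negb (orb _ _), so applying bool_nor directly would not unify.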

Axiomatize the predicate

Defining the predicate as a function can be tricky, if not impossible.

Instead, state the requirements the predicate must satisfy, as follows.

  Definition matchp_axiom (matchp : regex -> seq -> Prop) : Prop :=
    forall r s,
      matchp r s <->
      match r with
      | emptyset => False
      | lambda =>  s = []
      | symbol a => s = [a]
      (* and so on *)
      end.

Then parametrize your statements over any predicate that satisfies these requirements.

  Section Facts.
    Variable matchp : regex -> seq -> Prop.
    (* Hypothesis rather than Axiom, so the assumption is discharged when the section closes. *)
    Hypothesis matchp_spec : matchp_axiom matchp.

    Lemma star_repeat : forall a n, matchp (star (symbol a)) (repeat a n).
    ...
    Qed.
  End Facts.

You can't use simpl to reduce the predicate, but you can use rewrite matchp_spec to similar effect.

This can be combined with the other methods by proving matchp_axiom match_regex or matchp_axiom (fun r s => bool_regex r s true).
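Here is a minimal, self-contained sketch of the parametrized style. The module name AxiomSketch, the three-constructor regex type, and the lemma symbol_matches are assumptions made for this illustration. The sketch declares the specification with Hypothesis so that it becomes an explicit premise of the lemma once the section is closed, and it uses the specification through destruct rather than rewrite.

```coq
Require Import List.
Import ListNotations.

(* Sketch only: trimmed-down definitions inside a hypothetical module. *)
Module AxiomSketch.
  Inductive alphabet := a1 | a0.
  Inductive regex :=
    | emptyset : regex
    | lambda : regex
    | symbol : alphabet -> regex.
  Definition seq := list alphabet.

  Definition matchp_axiom (matchp : regex -> seq -> Prop) : Prop :=
    forall r s,
      matchp r s <->
      match r with
      | emptyset => False
      | lambda => s = []
      | symbol a => s = [a]
      end.

  Section Facts.
    Variable matchp : regex -> seq -> Prop.
    Hypothesis matchp_spec : matchp_axiom matchp.

    Lemma symbol_matches : forall a, matchp (symbol a) [a].
    Proof.
      intros a.
      (* Use the right-to-left direction of the specification. *)
      destruct (matchp_spec (symbol a) [a]) as [_ H].
      apply H. simpl. reflexivity.
    Qed.
  End Facts.

  (* After the section closes, the lemma takes matchp and a proof of
     matchp_axiom matchp as explicit arguments. *)
  Check symbol_matches.
End AxiomSketch.
```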

Thank you so much, this is VERY helpful. Going to reply to various alternatives one by one. — Walter Schulze, May 27 '20 at 12:30
We have tried the mutually inductive definition over here https://github.com/awalterschulze/regex-reexamined-coq/blob/2ebcb8c5545af108e1428aeef4fa1748e231760e/src/original_brzozowski/original_brzozowski.v#L78 but it seems to require a lot of duplicate definitions for lots of proofs to be possible and when we tried to prove `{is_member r s} + {not_member r s}.` we got stuck. — Walter Schulze, May 27 '20 at 12:33
Your answer to the use a mix of Fixpoint and Inductive Predicate seems quite positive :) like we might be on the right direction. — Walter Schulze, May 27 '20 at 12:34
I was also thinking of defining `Inductive bool_regex : regex -> seq -> bool -> Prop :=`, but I avoided it, since there is this stigma to avoid `bool` in favour of `Prop`. — Walter Schulze, May 27 '20 at 12:36
I completed the definition of matchp_axiom. I am just putting it here for reference while thinking.
```
Definition matchp_axiom (matchp : regex -> seq -> Prop) : Prop :=
  forall r xs,
    matchp r xs <->
    match r with
    | emptyset => False
    | lambda => xs = []
    | symbol a => xs = [a]
    | concat s t => exists ys zs, xs = ys ++ zs /\ matchp s ys /\ matchp t zs
    | star s =>
      match xs with
      | [] => True
      | (x::xs') => exists ys zs, xs' = ys ++ zs /\ matchp s (x::ys) /\ matchp (star s) zs
      end
    | nor s t => ~(matchp s xs) /\ ~(matchp t xs)
    end.
```
– Walter Schulze, May 27 '20 at 13:08
Definitely still thinking about the axiom approach. You have given me a lot to think about. I think your point about /\, exists, etc already being inductive predicates is also very interesting. I want to thank you again for the very thorough answer. It is still very educational. — Walter Schulze, May 28 '20 at 11:33

score 1 · Answer 2 · answered May 27 '20 at 13:04

Actually, it is possible to define matching using a Fixpoint:

Fixpoint match_regex (re : regex) (s : list alphabet) : Prop :=
  match re with
  | emptyset       => False
  | lambda         => s = []
  | symbol x       => s = [x]
  | concat re1 re2 =>
    exists s1 s2, s = s1 ++ s2 /\ match_regex re1 s1 /\ match_regex re2 s2
  | star re' =>
    exists ss, s = List.concat ss /\ Forall (match_regex re') ss
  | nor re1 re2 => ~ (match_regex re1 s \/ match_regex re2 s)
  end.
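As a quick sanity check of this definition, the sketch below proves that star (symbol a1) matches [a1; a1] by choosing the decomposition [[a1]; [a1]]. The module name FixpointSketch and the example star_twice are assumptions made for this illustration; the types are copied from the question so that the snippet stands on its own.

```coq
Require Import List.
Import ListNotations.

(* Sketch only: definitions repeated inside a hypothetical module. *)
Module FixpointSketch.
  Inductive alphabet := a1 | a0.
  Inductive regex :=
    | emptyset : regex
    | lambda : regex
    | symbol : alphabet -> regex
    | concat : regex -> regex -> regex
    | star : regex -> regex
    | nor : regex -> regex -> regex.

  Fixpoint match_regex (re : regex) (s : list alphabet) : Prop :=
    match re with
    | emptyset       => False
    | lambda         => s = []
    | symbol x       => s = [x]
    | concat re1 re2 =>
      exists s1 s2, s = s1 ++ s2 /\ match_regex re1 s1 /\ match_regex re2 s2
    | star re' =>
      exists ss, s = List.concat ss /\ Forall (match_regex re') ss
    | nor re1 re2 => ~ (match_regex re1 s \/ match_regex re2 s)
    end.

  Example star_twice : match_regex (star (symbol a1)) [a1; a1].
  Proof.
    simpl.
    exists [[a1]; [a1]].      (* split the input into two chunks *)
    split.
    - reflexivity.            (* List.concat [[a1]; [a1]] computes to [a1; a1] *)
    - constructor.            (* Forall_cons for the first chunk *)
      + simpl. reflexivity.
      + constructor.          (* Forall_cons for the second chunk *)
        * simpl. reflexivity.
        * constructor.        (* Forall_nil *)
  Qed.
End FixpointSketch.
```

The star case only makes a recursive call on re', which is a strict subterm of star re', and the iteration is delegated to List.concat and Forall, so the recursion is structural in the regex.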

answered May 27 '20 at 13:04

Arthur Azevedo De Amorim


Thank you so much. This is also very interesting. Kazuhiro Kobayashi made some interesting points that /\, exists and Forall are already inductive predicates. But I never thought of using List.concat in the definition. That is also very interesting, hmmm. – Walter Schulze May 28 '20 at 11:31
