I haven't found any nice debugging utility for PCRE regexps. The more I could find was how to dump the bytecode:
$ pcretest -b
PCRE version 7.6 2008-01-28
re> /^((.)(?1)\2|.?)$/x
------------------------------------------------------------------
0 39 Bra
3 ^
4 26 CBra 1
9 6 CBra 2
14 Any
15 6 Ket
18 6 Once
21 4 Recurse
24 6 Ket
27 \2
30 5 Alt
33 Any?
35 31 Ket
38 $
39 39 Ket
42 End
------------------------------------------------------------------
Perl has better debugging tools than PCRE, try echo 123454321 | perl -Mre=debug -ne '/^((.)(?1)\2|.?)$/x'
. This gives not only some bytecode that is similar to PCRE's one, but it also shows each step, and the consumed and remaining parts of the input at each step:
Compiling REx "^((.)(?1)\2|.?)$"
Final program:
1: BOL (2)
2: OPEN1 (4)
4: BRANCH (15)
5: OPEN2 (7)
7: REG_ANY (8)
8: CLOSE2 (10)
10: GOSUB1[-8] (13)
13: REF2 (19)
15: BRANCH (FAIL)
16: CURLY {0,1} (19)
18: REG_ANY (0)
19: CLOSE1 (21)
21: EOL (22)
22: END (0)
floating ""$ at 0..2147483647 (checking floating) anchored(BOL) minlen 0
Guessing start of match in sv for REx "^((.)(?1)\2|.?)$" against "12321"
Found floating substr ""$ at offset 5...
Guessed: match at offset 0
Matching REx "^((.)(?1)\2|.?)$" against "12321"
0 <> <12321> | 1:BOL(2)
0 <> <12321> | 2:OPEN1(4)
0 <> <12321> | 4:BRANCH(15)
0 <> <12321> | 5: OPEN2(7)
0 <> <12321> | 7: REG_ANY(8)
1 <1> <2321> | 8: CLOSE2(10)
1 <1> <2321> | 10: GOSUB1[-8](13)
1 <1> <2321> | 2: OPEN1(4)
1 <1> <2321> | 4: BRANCH(15)
1 <1> <2321> | 5: OPEN2(7)
1 <1> <2321> | 7: REG_ANY(8)
2 <12> <321> | 8: CLOSE2(10)
2 <12> <321> | 10: GOSUB1[-8](13)
2 <12> <321> | 2: OPEN1(4)
2 <12> <321> | 4: BRANCH(15)
2 <12> <321> | 5: OPEN2(7)
2 <12> <321> | 7: REG_ANY(8)
3 <123> <21> | 8: CLOSE2(10)
3 <123> <21> | 10: GOSUB1[-8](13)
3 <123> <21> | 2: OPEN1(4)
3 <123> <21> | 4: BRANCH(15)
3 <123> <21> | 5: OPEN2(7)
3 <123> <21> | 7: REG_ANY(8)
4 <1232> <1> | 8: CLOSE2(10)
4 <1232> <1> | 10: GOSUB1[-8](13)
4 <1232> <1> | 2: OPEN1(4)
4 <1232> <1> | 4: BRANCH(15)
4 <1232> <1> | 5: OPEN2(7)
4 <1232> <1> | 7: REG_ANY(8)
5 <12321> <> | 8: CLOSE2(10)
5 <12321> <> | 10: GOSUB1[-8](13)
5 <12321> <> | 2: OPEN1(4)
5 <12321> <> | 4: BRANCH(15)
5 <12321> <> | 5: OPEN2(7)
5 <12321> <> | 7: REG_ANY(8)
failed...
5 <12321> <> | 15: BRANCH(19)
5 <12321> <> | 16: CURLY {0,1}(19)
REG_ANY can match 0 times out of 1...
5 <12321> <> | 19: CLOSE1(21)
EVAL trying tail ... 9d86dd8
5 <12321> <> | 13: REF2(19)
failed...
failed...
BRANCH failed...
4 <1232> <1> | 15: BRANCH(19)
4 <1232> <1> | 16: CURLY {0,1}(19)
REG_ANY can match 1 times out of 1...
5 <12321> <> | 19: CLOSE1(21)
EVAL trying tail ... 9d86d70
5 <12321> <> | 13: REF2(19)
failed...
4 <1232> <1> | 19: CLOSE1(21)
EVAL trying tail ... 9d86d70
4 <1232> <1> | 13: REF2(19)
failed...
failed...
BRANCH failed...
3 <123> <21> | 15: BRANCH(19)
3 <123> <21> | 16: CURLY {0,1}(19)
REG_ANY can match 1 times out of 1...
4 <1232> <1> | 19: CLOSE1(21)
EVAL trying tail ... 9d86d08
4 <1232> <1> | 13: REF2(19)
failed...
3 <123> <21> | 19: CLOSE1(21)
EVAL trying tail ... 9d86d08
3 <123> <21> | 13: REF2(19)
failed...
failed...
BRANCH failed...
2 <12> <321> | 15: BRANCH(19)
2 <12> <321> | 16: CURLY {0,1}(19)
REG_ANY can match 1 times out of 1...
3 <123> <21> | 19: CLOSE1(21)
EVAL trying tail ... 9d86ca0
3 <123> <21> | 13: REF2(19)
4 <1232> <1> | 19: CLOSE1(21)
EVAL trying tail ... 0
4 <1232> <1> | 13: REF2(19)
5 <12321> <> | 19: CLOSE1(21)
5 <12321> <> | 21: EOL(22)
5 <12321> <> | 22: END(0)
Match successful!
Freeing REx: "^((.)(?1)\2|.?)$"
As you can see, Perl first consumes all input recursing until (.)
fails, then starts backtracking and trying the second branch from the alternation .?
and the remainder of the first part \2
, when that fails it backtracks, until it finally succeeds.