Are .NET's regular expressions Turing complete?

Question

Regular expressions are often pointed to as the classical example of a language that is not Turning complete. For example "regular expressions" is given in as the answer to this SO question looking for languages that are not Turing complete.

In my, perhaps somewhat basic, understanding of the notion of Turning completeness, this means that regular expressions cannot be used check for patterns that are "balanced". Balanced meaning have an equal number of opening characters as closing characters. This is because to do this would require you to have some kind of state, to allow you to match the opening and closing characters.

However the .NET implementation of regular expressions introduces the notion of a balanced group. This construct is designed to let you backtrack and see if a previous group was matched. This means that a .NET regular expressions:

^(?<p>a)*(?<-p>b)*(?(p)(?!))$

Could match a pattern that:

ab
aabb
aaabbb
aaaabbbb
... etc. ...

Does this means .NET's regular expressions are Turing complete? Or are there other things that are missing that would be required for the language to be Turing complete?

Yes, the definition of [Turing Completeness](http://en.wikipedia.org/wiki/Turing_completeness) ;-) — , Jan 29 '11 at 07:35
Okay, I got a bit over excited about this one, what it actually means they can match languages that are non-regular, which according to Wikipedia is not that uncommon for regular expression syntax. "'Regular expressions' [...] are only marginally related to real regular expression": http://en.wikipedia.org/wiki/Regular_expression#Patterns_for_non-regular_languages — Robert, Jan 29 '11 at 07:47
.NET is very powerful indeed. It even has _variable length_ lookbehind. — ridgerunner, Mar 16 '11 at 04:11
Variable length lookbehind and lookahead add no power to regular expressions. — porges, Mar 16 '11 at 04:21
@Porges - Care to explain? I can't prove there's anything you can't do without variable-length lookarounds, but it certainly seems that way. For example, testing other parts of the input string without causing matches to overlap. — Justin Morgan - On strike, Jun 14 '12 at 17:02
@JustinMorgan: talking about the *power* of a language usually means considering what kind of an automaton you need to match the language against a string, and a regex with lookahead/lookbehind doesn't require anything more than what a normal regex requires - a DFA or NFA. You can always rewrite a regex with lookarounds into one without lookarounds, but it might grow a lot -- regexes with lookaround are more *expressive* than normal regexes, in that they allow you to say a lot more with a lot less. (Match extraction is another thing - you can't get the same results as "(?<=a)b" without them.) — porges, Jun 15 '12 at 02:13
@Porges - Great info, thanks for going into more detail. I've often wondered whether there are really patterns that are theoretically possible to describe with lookarounds, but impossible without them. — Justin Morgan - On strike, Jun 15 '12 at 03:44

ThomasMcLeod · Accepted Answer · 2013-08-12T04:25:46.407

In computation theory, a regular expression describes a regular language. The class of regular languages is precisely those languages that can be recognized by some finite state machine, or generated by a regular grammar. However, the example you have described (balanced phrases) is not a regular language and cannot be recognized by a finite state machine or generated by a regular grammar. In fact, this a textbook example of what is called a context-free language. These require a push-down automata for recognition. The class of context-free languages is a superset of regular languages but a proper subset of turing complete languages. The syntax (as opposed to semantics) of most programming languages is a context-free language. If you are interested in learning more about this topic, you can start with the Chomsky hierarchy

score 5 · Answer 2 · answered Dec 17 '11 at 09:50

5

Regex's in .NET are not Turing complete because they always halt. This cannot be said of a general turing machine.

answered Dec 17 '11 at 09:50

usr

168,620
35
240
369

Hmmmmm… really? I did encounter some, which were executing forever. – Display Name Dec 15 '14 at 18:26
I believe a reasonably stupid engine could even go infinite with `a*+`. – CalculatorFeline Jun 20 '17 at 00:14
@SargeBorsch that sounds like a regex that has exponential cost in the size of the input. There are some well-known and rather simple patterns to cause this. The .NET regex engine is not able to deal with that (some other engines are). The required optimization for that was not implemented. I once looked into the engine internals and it seems like a screw-up to me based on the approach taken and the code. – usr Jun 23 '17 at 12:28

score 4 · Answer 3 · edited Jan 29 '11 at 07:20

4

You pretty much miss the definition of turing complete.

Turing completeness, named after Alan Turing, is significant in that every plausible design for a computing device so far advanced can be emulated by a universal Turing machine — an observation that has become known as the Church-Turing thesis. Thus, a machine that can act as a universal Turing machine can, in principle, perform any calculation that any other programmable computer is capable of. However, this has nothing to do with the effort required to write a program for the machine, the time it may take for the machine to perform the calculation, or any abilities the machine may possess that are unrelated to computation.

Now, you can not do certain things in regular expressions, so the langauge is not turing complete.

You really have to use the same definition like everyone else, you know. Limited understanding should trigger finding out the truth.

edited Jan 29 '11 at 07:20

Jim Burger

4,399
1
24
27

answered Jan 29 '11 at 07:17

TomTom

61,059
10
88
148

Right. Not even remotely "turing complete". You can't even calculate 1+1 using a regex. – Jan 29 '11 at 07:19
4

Thank you kindly for your answer, your right I was confusing the counter example of why regex's aren't Turing complete with the actual definition of turning completeness. In my defence asking questions is the way you learn stuff. – Robert Jan 29 '11 at 07:28
2

@Robert, right. The only stupid question is the one that gets you divorced or thown in the desert naked and with no water. – Jan 29 '11 at 22:53
@user47589 Actually, you can do addition if you're using a unitary system. You'd just have to remove `/(1+)-\1/`. Here's an interesting article that describes how to detect a prime number using regular expressions: https://iluxonchik.github.io/regular-expression-check-if-number-is-prime/ – ElliotThomas Jun 17 '22 at 16:12

score 3 · Answer 4 · answered Mar 16 '11 at 03:57

@Inuyasha: Actually you can do addition with a regular expression. Well at least check if the computation is done correctly. Only thing is you have to give the input to the regex in a strange order (you can't reverse a string (or check if it is reversed) with a regex).

The pattern is:

abc
def
---
ghi

=> cfi beh adg

Suppose you want to add 1011 an 0110 in binary:

01011
00110
-----
10001


=> 101 110 010 100 001

If you give this input in the order of lease significant bits to greatest, interspersing the first operand, second operand, and the output, you would get the string 101110010100001. This can be matched by

((000|011|101)|(110(010|100|111)*001))*

Which is a garden variety regex. You could extend this to decimal addition but the regex would get crazy complicated.

Are .NET's regular expressions Turing complete?

4 Answers4