The typical response to any "why isn't this regex working on HTML!?!" question is "because HTML isn't a regular language".

So I was curious whether anyone had a list of common programming languages that are regular languages, and thus appropriate for regular expression matching. I know there are ways to determine whether a language is regular (case by case), but for complicated languages the proof can also be quite complicated. I thought a quick checklist of languages could be useful.

I also know that you can use regular expressions on non-regular languages, but that the results aren't always reliable (the HTML example).

Nick Bartlett
  • Any programming language that is expressive enough to do "real" programming (i.e. Turing-complete) would be non-regular. – Lars Kotthoff Mar 04 '13 at 14:53
  • I'm not sure if this is a good question, as it's a list-of-X, but I don't see how it's not constructive. Whether a language is regular is a very well-defined property, so I can't imagine how "this question will likely solicit debate, arguments, polling, or extended discussion". –  Mar 04 '13 at 14:54
  • Are you looking for programming languages or for markup languages? – Bergi Mar 04 '13 at 14:54
  • Sorry, I understand HTML is a markup language, not a programming language. I suppose I meant more broadly: which languages, if any, are regular (programming languages like C/Java/Perl/etc., query languages like SQL/SPARQL/etc., or even markup languages like HTML or stylesheet languages like CSS)? – Nick Bartlett Mar 04 '13 at 15:00
  • Possible duplicate of [Which programming languages have a regular grammar?](http://stackoverflow.com/questions/5621325/which-programming-languages-have-a-regular-grammar) – Oak Mar 04 '13 at 15:02
  • @LarsKotthoff, do not confuse the class of languages for which a recogniser can be built using a programming language with the class in which that language itself belongs. It is quite easy (though not particularly useful) to design a Turing-complete programming language that is regular. – ibid Mar 04 '13 at 15:27
  • I'm in particular thinking of nested structures (i.e. recursion) that cannot be recognised with a regular language. – Lars Kotthoff Mar 04 '13 at 16:10
  • @LarsKotthoff You don't need recursion for Turing completeness. BF doesn't have functions; it just has a very primitive and restricted version of a while loop. And then there are [OISCs](http://esolangs.org/wiki/OISC), which don't even have explicit loops, just the same instruction with the same number of scalar arguments again and again. –  Mar 04 '13 at 17:05

1 Answer


Disregarding any arbitrary limits on nesting depth or program length, I doubt any common programming language is regular. Even simple (infix) arithmetic expressions form a non-regular language once parentheses are allowed, and it is an unusual programming language that does not support them. More generally, if a language allows nesting of any construct without limiting its depth, it is not a regular language.
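To make the nesting point concrete, here is a minimal Python sketch, not a proof. The function names (`is_flat_expr`, `depth_limited_pattern`, `parens_balanced`) are hypothetical and exist only for illustration. It shows that flat infix arithmetic is matched by a plain regular expression, that nesting can be handled by a regex only up to a depth fixed in advance, and that unbounded balancing needs a counter, which a finite automaton does not have.

```python
import re

# Flat infix arithmetic (no parentheses) is a regular language.
FLAT_EXPR = re.compile(r"\d+(?:[*/+-]\d+)*")


def is_flat_expr(text: str) -> bool:
    """True if text is digits joined by binary operators, with no nesting."""
    return FLAT_EXPR.fullmatch(text) is not None


def depth_limited_pattern(max_depth: int) -> str:
    """Build a regex for strings whose parentheses balance with nesting
    depth at most max_depth. The pattern grows with every extra level,
    so no single pattern built this way covers all depths."""
    pattern = r"[^()]*"  # depth 0: no parentheses at all
    for _ in range(max_depth):
        pattern = r"(?:[^()]|\(" + pattern + r"\))*"
    return pattern


def parens_balanced(text: str) -> bool:
    """Check balancing at any depth with an unbounded counter."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a closing parenthesis with nothing to close
                return False
    return depth == 0
```

For instance, `is_flat_expr("1+2*3")` succeeds while `is_flat_expr("(1+2)*3")` does not, and a pattern built with `depth_limited_pattern(1)` rejects `"((1))"` even though `parens_balanced("((1))")` accepts it. The depth-limited pattern checks only that the parentheses balance, not that the input is a valid expression.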

ibid
  • How are infix operators non-regular? `\d+([*/+-]\d+)*` is a regular language, isn't it? – Bergi Mar 05 '13 at 13:39
  • I was thinking of parentheses. While they are not strictly required, they are usually admitted. Any language requiring parenthesis balancing is non-regular. – ibid Mar 06 '13 at 14:45