Questions tagged [regular-language]

A regular language is a language that can be described by a regular expression; equivalently, membership of any string in the language can be decided by a corresponding deterministic finite automaton. Note: regular languages should not be confused with regular expressions as implemented by programming tools. For questions regarding pattern matching within strings, use the [regex] tag instead.

Given an alphabet (a finite set of symbols) Σ, a language is a set of finite sequences (strings) of symbols from that alphabet. A language is regular exactly when it can be described by a (formal) regular expression or, equivalently, when the membership of any string can be decided by a finite-state machine.
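As a minimal sketch of deciding membership with a finite-state machine, the Python snippet below simulates a deterministic finite automaton for the binary strings that begin with 0 and end with 1. The state names, the DELTA table, and the accepts helper are illustrative assumptions, not part of any library.

```python
# Hypothetical DFA over the alphabet {0, 1} for strings that
# begin with 0 and end with 1. State names are illustrative.
START, LAST0, LAST1, DEAD = "start", "last0", "last1", "dead"

# Transition function delta: (state, symbol) -> state
DELTA = {
    (START, "0"): LAST0,
    (START, "1"): DEAD,   # a string starting with 1 can never be accepted
    (LAST0, "0"): LAST0,
    (LAST0, "1"): LAST1,
    (LAST1, "0"): LAST0,
    (LAST1, "1"): LAST1,
    (DEAD, "0"): DEAD,
    (DEAD, "1"): DEAD,
}


def accepts(word: str) -> bool:
    """Run the DFA on word and report whether it ends in the accepting state."""
    state = START
    for symbol in word:
        if (state, symbol) not in DELTA:
            return False  # symbol is outside the alphabet
        state = DELTA[(state, symbol)]
    return state == LAST1
```

The machine reads each symbol once and keeps only its current state, which is why a fixed, finite amount of memory is enough for any regular language.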

Regular languages form the most restrictive class of the Chomsky hierarchy and are generated by Type-3 (regular) grammars. They are a proper subset of the Type-2 context-free languages, which are recognized by pushdown automata; these are in turn a proper subset of the Type-1 context-sensitive languages, recognized by linear bounded automata, which are a proper subset of the Type-0 recursively enumerable languages, recognized by Turing machines. All regular languages are therefore context-free, context-sensitive, and recursively enumerable. Formal regular expressions can be converted to deterministic and to nondeterministic finite-state machines that still represent the same regular language.
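To illustrate that nondeterministic and deterministic machines describe the same languages, here is a hedged sketch of the standard subset construction in Python. It assumes an NFA without ε-moves, encoded as a dictionary from (state, symbol) to a set of states; the function names and the example NFA (strings whose second-to-last symbol is 1) are made up for this illustration.

```python
def nfa_to_dfa(nfa_delta, nfa_start, nfa_accepting, alphabet):
    """Subset construction for an NFA without epsilon moves.

    Each DFA state is a frozenset of NFA states.
    """
    start = frozenset([nfa_start])
    dfa_delta = {}
    seen = {start}
    todo = [start]
    while todo:
        current = todo.pop()
        for symbol in alphabet:
            # Collect every NFA state reachable from any state in `current`.
            target = frozenset(
                nxt
                for state in current
                for nxt in nfa_delta.get((state, symbol), ())
            )
            dfa_delta[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    # A subset is accepting if it contains at least one accepting NFA state.
    accepting = {subset for subset in seen if subset & nfa_accepting}
    return dfa_delta, start, accepting


def dfa_accepts(dfa, word):
    """Run the constructed DFA on a word over the same alphabet."""
    delta, state, accepting = dfa
    for symbol in word:
        state = delta[(state, symbol)]
    return state in accepting


# Example NFA (assumption): binary strings whose second-to-last symbol is 1.
NFA_DELTA = {
    ("q0", "0"): {"q0"},
    ("q0", "1"): {"q0", "q1"},
    ("q1", "0"): {"q2"},
    ("q1", "1"): {"q2"},
}
DFA = nfa_to_dfa(NFA_DELTA, "q0", {"q2"}, "01")
```

Only the reachable subsets are built, so the resulting DFA here has four states rather than all eight subsets of {q0, q1, q2}.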

Please do not confuse this with regex. Most regex engines are far more expressive than formal regular expressions and finite-state machines, and can represent non-regular languages.

Construction of a Regular Language

The regular languages over a given alphabet Σ are exactly the languages produced by the following rules:

  • The empty language {}, which contains no strings, is regular.
  • The language {ε}, containing only the empty string ε, is regular.
  • Every language {s} containing only a single one-symbol string, with s ∈ Σ, is regular.
  • Every language created by the union, concatenation, or Kleene star of regular languages is regular. Suppose v and w are regular expressions for the regular languages A and B respectively:
    • The union (v|w) is also regular. It accepts the strings that are in A or in B.
    • The concatenation vw is also regular. It accepts every string formed by a string in A followed by a string in B.
    • The Kleene star v* is also regular. It accepts any number of strings in A concatenated together, including zero (which gives ε).
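The construction rules can be sketched in Python by representing a language as a set of strings. Because the Kleene star usually yields an infinite language, this sketch assumes a length bound max_len and only enumerates strings up to that length; the function names are illustrative.

```python
def union(a, b):
    """Strings that are in A or in B."""
    return a | b


def concat(a, b, max_len):
    """Strings vw with v in A and w in B, truncated to length <= max_len."""
    return {v + w for v in a for w in b if len(v + w) <= max_len}


def star(a, max_len):
    """Zero or more strings of A concatenated, truncated to length <= max_len."""
    result = {""}          # zero copies gives the empty string
    frontier = {""}
    while frontier:
        # Append one more string of A; keep only strings not seen before.
        frontier = concat(frontier, a, max_len) - result
        result |= frontier
    return result


# Base cases of the construction over the alphabet {0, 1}.
EMPTY = set()        # the empty language
EPSILON = {""}       # the language containing only the empty string
ZERO, ONE = {"0"}, {"1"}

# All strings of length <= 4 described by 0(0|1)*1.
BOUND = 4
ANY = star(union(ZERO, ONE), BOUND)
ZERO_ANY_ONE = concat(concat(ZERO, ANY, BOUND), ONE, BOUND)
```

With the bound of 4, ZERO_ANY_ONE holds the seven strings of length two to four that begin with 0 and end with 1.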

Examples and Nonexamples of Regular Languages

  • Given a simple alphabet Σ = {0, 1}, where | represents union and * represents the Kleene star, these formal regular expressions all represent regular languages:

    • The regular expressions "0", "1", "(0|1)", "01", "11", and "0*" all represent regular languages.
    • The regular expression "(0(0|1)*1)", representing all binary strings beginning with 0 and ending with 1, represents a regular language.
    • Given a regular expression R, the expressions "R+" and "R?" both represent regular languages, where + means one or more and ? means zero or one. Namely, "R+" is equivalent to "RR*", and "R?" is equivalent to "(R|ε)".
    • Given a regular expression R, the expression "R{m,n}" represents a regular language for all natural numbers m ≤ n, where {m,n} means "from m copies to n copies". This is because it only involves union and concatenation: "R{1,3}" expands to "(R|RR|RRR)".
  • Given an alphabet used by regex engines, usually an ASCII or Unicode alphabet containing all ASCII or Unicode characters respectively:

    • The regex /^.+$/ represents a regular language. It includes all non-empty sequences of any character (in most engines, . excludes line terminators by default).
    • The regex /^#[A-Za-z]{1,3}[0-9]{2,4}$/ represents a regular language, consisting of all strings which begin with a hash sign, then one to three ASCII letters, followed by two to four decimal digits.
    • The regex /^([\d][\w])*$/ represents a regular language. It consists of all strings made of zero or more pairs of a digit character followed by a word character. The shorthands \d and \w are examples of union, since each stands for a set of alternative characters.
  • Many regex engines are much more expressive than formal regular expressions. Backreferences can cause a regex to represent a non-regular language, which consequently cannot be decided by a finite-state machine.

    • The regex "(.+)\1", matched against the whole string, represents a non-regular language. Using a backreference to the first capture group .+, it accepts every non-empty sequence of characters repeated exactly twice. Such strings are called squares in formal language theory.
      • "ABCABC", "1234.1234." are accepted
      • "ABCAB", "1234567891234567890" are rejected.
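The regex examples above can be tried with Python's re module, which supports backreferences. This is a small sketch, not a statement about every engine: re.fullmatch is used so that the whole string must match, and the helper names is_square and is_tag are made up for this illustration.

```python
import re

# Backreference: describes the non-regular language of squares.
SQUARE = re.compile(r"(.+)\1")

# No backreference: hash sign, 1-3 ASCII letters, 2-4 decimal digits.
TAG = re.compile(r"#[A-Za-z]{1,3}[0-9]{2,4}")


def is_square(text: str) -> bool:
    """True if the whole text is some non-empty string repeated twice."""
    return SQUARE.fullmatch(text) is not None


def is_tag(text: str) -> bool:
    """True if the whole text matches the hash/letters/digits pattern."""
    return TAG.fullmatch(text) is not None


for sample in ["ABCABC", "1234.1234.", "ABCAB", "1234567891234567890"]:
    print(sample, is_square(sample))
```

Anchoring matters here: an unanchored search for (.+)\1 would find a repeated substring inside "1234567891234567890" even though the whole string is not a square.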

914 questions
8 votes, 3 answers

Combining deterministic finite automata

I'm really new to this stuff so I apologize for the noobishness here. construct a Deterministic Finite Automaton DFA recognizing the following language: L= { w : w has at least two a's and an odd number of b's}. The automate for each part of…
Haskell
7 votes, 2 answers

Generative regular expressions

Typically in our work we use regular expressions in capture or match operations. However, regular expressions can be used - manually at least - to generate legal sentences that match the regular expression. Of course, some regular expressions can…
Paul Nathan
7 votes, 4 answers

How does "δ:Q×Σ→Q" read in the definition of a DFA (deterministic finite automaton)?

How do you say δ: Q × Σ → Q in English? Describing what × and → mean would also help.
trusktr
7 votes, 2 answers

Pumping lemma for regular language

I have a little confusion in checking whether the given language is regular or not using pumping lemma. Suppose we have to check whether: L. The language accepting even number of 0's is regular or not? We know that it is regular because we can…
7 votes, 3 answers

Why L={wxw^R| w, x belongs to {a,b}^+ } is a regular language

Using pumping lemma, we can easily prove that the language L1 = {WcW^R|W ∈ {a,b}*} is not a regular language. (the alphabet is {a,b,c}; W^R represents the reverse string W) However, If we replace character c with "x"(x ∈ {a,b}+), say, L2 = {WxW^R|…
henry
6 votes, 3 answers

How do I find the language from a regular expression?

How would I find the language for the following regular expressions over the alphabet {a, b}? aUb* (ab*Uc) ab*Ubc* a*bc*Uac EDIT: Before i get downvoted like crazy, i'd appreciate it if someone could show me the steps towards solving these…
user814447
6 votes, 1 answer

Light regexp optimization

I have a regular expression that was the output of a computer program. It has things like (((2)|(9)))* which a human would undoubtedly write as [29]* So I'd like a program that can make simple transformations that make the regular expression more…
Charles
6 votes, 2 answers

Is the union of two non-regular languages regular?

Given two non-regular languages, is their union regular? Also, why is L = L1 ∪ L2 = {a^i b^j | i,j >= 0} the union of L1 = {a^i b^j | i >= j} and L2 = {a^i b^j | i < j}? Then, what is the union of L1 = {a^i b^j | i > j} and L2 = {a^i b^j | i < j}?
Belle
6 votes, 2 answers

Proving a Language to be regular

Pumping Lemma is used to prove a language to be not regular. But How a language can be proved to be regular ? In particular, Let L be a language. Define half(L) to be { x | for some y such that |x| = |y|, xy is in L}. Prove for each regular L…
Happy Mittal
6 votes, 0 answers

Why regular expressions with backreferences are not regular expressions?

This article say: A backreference like \1 or \2 matches the string matched by a previous parenthesized expression, and only that string: (cat|dog)\1 matches catcat and dogdog but not catdog nor dogcat. As far as the theoretical term is…
macabeus
6 votes, 2 answers

Are regular expressions (regex) really regular?

I understand how regular expressions got their name, and have read the related question (Why are regular expressions called "regular" expressions?), but am still wondering whether regular expressions are always regular. For example, how can…
6 votes, 3 answers

Is it possible to have regexp that matches all valid regular expressions?

Is it possible to detect if a given string is valid regular expression, using just regular expressions? Say I have some strings, that may or may not be a valid regular expressions. I'd like to have a regular expression matches those string that…
Juha Syrjälä
6 votes, 1 answer

The Definition of Regular Languages

I have tried, and burned my brain to understand the definition of Regular Languages in Discrete Mathematics and its Applications(Rosen) without reaching the goal of understanding why the definition is like that in this book. On page(789), I am…
Khaled Alshaya
6 votes, 3 answers

generalizing the pumping lemma for UNIX-style regular expressions

Most UNIX regular expressions have, besides the usual *,+,? operators a backslash operator where \1,\2,... match whatever's in the last parentheses, so for example L=(a*)b\1 matches the (non regular) language a^n b a^n. On one hand, this seems…
Avi
6 votes, 2 answers

Regular expression problem

What is the regular expression for the language 0^m 1^n where m+n is even?
erasmus