Questions tagged [regular-language]

A regular language is a language that can be represented by a regular expression; equivalently, its strings are exactly those accepted by a corresponding deterministic finite automaton. Note: regular languages should not be confused with regular expressions. For questions regarding pattern matching within strings, use the [regex] tag instead.

Given an alphabet (finite set of symbols) Σ, a language is a set of finite sequences (strings) of symbols from that alphabet. A language is a regular language exactly when it can be expressed by a (formal) regular expression or, equivalently, when the membership of any string can be decided by a finite-state machine.
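As a minimal sketch of "membership decided by a finite-state machine", the following Python snippet hard-codes a small DFA. The language chosen (binary strings that begin with 0 and end with 1), the state names, and the `accepts` function are illustrative assumptions, not part of any library.

```python
# Hypothetical DFA over the alphabet {0, 1} for the language of all strings
# that begin with 0 and end with 1. State names are made up for readability.
START = "start"
ACCEPTING = {"last_was_1"}
TRANSITIONS = {
    ("start", "0"): "last_was_0",
    ("start", "1"): "dead",          # a leading 1 can never be repaired
    ("last_was_0", "0"): "last_was_0",
    ("last_was_0", "1"): "last_was_1",
    ("last_was_1", "0"): "last_was_0",
    ("last_was_1", "1"): "last_was_1",
    ("dead", "0"): "dead",
    ("dead", "1"): "dead",
}


def accepts(string: str) -> bool:
    """Run the DFA over the string; symbols outside {0, 1} raise KeyError."""
    state = START
    for symbol in string:
        state = TRANSITIONS[(state, symbol)]
    return state in ACCEPTING
```

The machine reads each symbol once and keeps only its current state, which is what makes the decision procedure "finite-state".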

Regular languages form the most restrictive class of the Chomsky hierarchy: they are the Type-3 languages, generated by Type-3 (regular) grammars. They are contained in the Type-2 context-free languages, which are recognized by pushdown automata; these are in turn contained in the Type-1 context-sensitive languages, recognized by linear bounded automata, which are contained in the Type-0 recursively enumerable languages, recognized by Turing machines. Consequently, all regular languages are context-free, context-sensitive, and recursively enumerable. Formal regular expressions can be converted to deterministic finite state machines and to nondeterministic finite state machines that still represent the same regular language.
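To illustrate that deterministic and nondeterministic machines describe the same languages, here is a hedged sketch of the subset (powerset) construction in Python. The example NFA (binary strings whose second-to-last symbol is 1), all names, and the bounded comparison in `agree_up_to` are assumptions made for illustration; the NFA has no ε-transitions, which keeps the sketch short.

```python
from itertools import product

ALPHABET = "01"

# Hypothetical NFA for (0|1)*1(0|1): the second-to-last symbol is 1.
NFA_START = "q0"
NFA_ACCEPTING = {"q2"}
NFA_DELTA = {
    ("q0", "0"): {"q0"},
    ("q0", "1"): {"q0", "q1"},   # nondeterministic guess: "this 1 is second to last"
    ("q1", "0"): {"q2"},
    ("q1", "1"): {"q2"},
}


def nfa_accepts(string):
    """Simulate the NFA by tracking the set of states it could be in."""
    current = {NFA_START}
    for symbol in string:
        current = {t for s in current for t in NFA_DELTA.get((s, symbol), set())}
    return bool(current & NFA_ACCEPTING)


def subset_construction():
    """Build a DFA whose states are the reachable sets of NFA states."""
    start = frozenset({NFA_START})
    delta, seen, todo = {}, {start}, [start]
    while todo:
        subset = todo.pop()
        for symbol in ALPHABET:
            target = frozenset(
                t for s in subset for t in NFA_DELTA.get((s, symbol), set())
            )
            delta[(subset, symbol)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    accepting = {s for s in seen if s & NFA_ACCEPTING}
    return start, delta, accepting


DFA_START, DFA_DELTA, DFA_ACCEPTING = subset_construction()


def dfa_accepts(string):
    state = DFA_START
    for symbol in string:
        state = DFA_DELTA[(state, symbol)]
    return state in DFA_ACCEPTING


def agree_up_to(max_length):
    """Compare both machines on every string up to max_length (evidence, not a proof)."""
    return all(
        nfa_accepts("".join(w)) == dfa_accepts("".join(w))
        for n in range(max_length + 1)
        for w in product(ALPHABET, repeat=n)
    )
```

Calling `agree_up_to(8)` compares the two machines on every binary string of length at most 8; the general equivalence is a theorem, and the check only spot-tests this particular pair.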

Please do not confuse this with regex. Many regex engines are more expressive than formal regular expressions and finite state machines, and can represent non-regular languages.

Construction of a Regular Language

The set of all regular languages over a given alphabet Σ can be produced exactly by this process:

  • The empty language {}, rejecting all strings.
  • The language containing only the empty string ε.
  • Every language containing only a single one-symbol string s, for s ∈ Σ.
  • Every language created by the union, concatenation, or Kleene star of regular languages. Suppose v and w are regular expressions for the regular languages A and B respectively:
    • The union (v|w) is also regular. It accepts the strings that are in A or in B.
    • The concatenation vw is also regular. It accepts every string formed by a string of A followed by a string of B.
    • The Kleene star v* is also regular. It accepts any number of strings from A concatenated together, including zero of them (the empty string).
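The construction above can be sketched with plain Python sets. Because a Kleene star is usually infinite, this sketch truncates every language to strings of at most `MAX_LEN` symbols; that bound, and the names `union`, `concat`, and `star`, are assumptions chosen for illustration only.

```python
MAX_LEN = 4  # assumed truncation bound so that every set stays finite

# Base cases of the construction.
EMPTY = set()        # the empty language {}
EPSILON = {""}       # the language containing only the empty string
ZERO = {"0"}         # single-symbol languages over the alphabet {0, 1}
ONE = {"1"}


def union(a, b):
    return a | b


def concat(a, b):
    """All strings vw with v in a and w in b, truncated to MAX_LEN."""
    return {v + w for v in a for w in b if len(v + w) <= MAX_LEN}


def star(a):
    """Zero or more strings of a concatenated, truncated to MAX_LEN."""
    result = {""}
    frontier = {""}
    while frontier:
        # Extend the newest strings by one more factor; stop when nothing new appears.
        frontier = concat(frontier, a) - result
        result |= frontier
    return result


# (0(0|1)*1): strings beginning with 0 and ending with 1, up to MAX_LEN symbols.
begins_0_ends_1 = concat(concat(ZERO, star(union(ZERO, ONE))), ONE)
```

With `MAX_LEN = 4`, `sorted(begins_0_ends_1)` lists the seven strings of length at most 4 that begin with 0 and end with 1.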

Examples and Nonexamples of Regular Languages

  • Given a simple alphabet Σ = {0, 1}, where | represents union and * represents the Kleene star, each of these formal regular expressions represents a regular language:

    • The regular expressions "0", "1", "(0|1)", "01", "11", and "0*" all represent regular languages.
    • The regular expression "(0(0|1)*1)", representing all binary strings beginning with 0 and ending with 1, represents a regular language.
    • Given a regular expression R, the expressions "R+" and "R?" both represent regular languages, where + means one or more and ? means zero or one. Namely, "R+" is equivalent to "RR*", and "R?" is equivalent to "(R|ε)".
    • Given a regular expression R, the expression "R{m,n}" represents a regular language for all natural numbers m ≤ n, where {m,n} means "from m copies to n copies". This is because it only involves union and concatenation: "R{1,3}" expands to "(R|RR|RRR)".
  • Given an alphabet used by regex engines, usually an ASCII or Unicode alphabet containing all ASCII or Unicode characters respectively:

    • The regex /^.+$/ represents a regular language. It includes all non-empty sequences of any character (in most engines, any character other than a line break).
    • The regex /^#[A-Za-z]{1,3}[0-9]{2,4}$/ represents a regular language, consisting of all strings which begin with a hash sign (#), then one to three ASCII letters, followed by two to four decimal digits.
    • The regex /^([\d][\w])*$/ represents a regular language. It consists of all strings made of zero or more repetitions of a digit character followed by a word character. The shorthands \d and \w are examples of union.
  • Many regex engines are much more expressive than regular languages. Backreferences can cause a regex to represent a non-regular language, which consequently cannot be decided by a finite state machine.

    • The regex "(.+)\1" represents a non-regular language. Using a backreference to the first captured group .+, it accepts (when matched against the whole string) every string that consists of some non-empty sequence of characters repeated exactly twice. Such strings are called squares in formal language theory.
      • "ABCABC", "1234.1234." are accepted
      • "ABCAB", "1234567891234567890" are rejected.
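The examples above can be spot-checked with Python's `re` module. This is a sketch under stated assumptions: `R = "(?:0|11)"` is an arbitrary sample expression, the helper names `language` and `is_square` are made up here, and the comparison only covers binary strings up to a small length, so it is evidence for the equivalences rather than a proof. `fullmatch` is used so that the whole string must match, which stands in for the ^ and $ anchors.

```python
import re
from itertools import product


def language(pattern, alphabet="01", max_length=6):
    """Set of all strings over `alphabet` up to `max_length` that fully match."""
    compiled = re.compile(pattern)
    return {
        "".join(w)
        for n in range(max_length + 1)
        for w in product(alphabet, repeat=n)
        if compiled.fullmatch("".join(w))
    }


R = "(?:0|11)"  # arbitrary sample expression, chosen only for illustration

# R+ versus RR*, R? versus (R|ε), and R{1,3} versus (R|RR|RRR).
plus_same = language(f"{R}+") == language(f"{R}{R}*")
optional_same = language(f"{R}?") == language(f"(?:{R}|)")
counted_same = language(f"{R}{{1,3}}") == language(f"(?:{R}|{R}{R}|{R}{R}{R})")

# The backreference regex: the whole string must be some non-empty block twice.
SQUARE = re.compile(r"(.+)\1")


def is_square(string):
    return SQUARE.fullmatch(string) is not None
```

Here `is_square("ABCABC")` and `is_square("1234.1234.")` hold, while `is_square("ABCAB")` and `is_square("1234567891234567890")` do not, matching the accepted and rejected examples listed above.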

914 questions
4
votes
1 answer

Scheme, When to use Symbols instead of Strings?

I apologize in advance for my primitive english; i will try my best to avoid grammatical errors and such. Two weeks ago i decided to freshen my knowledge of Scheme (and its enlightnings) whilst implementing some math material i got between hands,…
Landau
  • 83
  • 1
  • 3
4
votes
4 answers

Regex for at least one alphabet and shouldn't allow dot(.)

I have written the regex below but I'm facing an issue: ^[^\.]*[a-zA-Z]+$ As per the above regex, df45543 is invalid, but I want to allow such a string. Only one alphabet character is mandatory and a dot is not allowed. All other characters are…
4
votes
2 answers

If L* is regular, then is L regular?

I've tried to look for the answer and I'm getting conflicting answers so I'm not sure. I know the reverse is true, that if L is regular then L* is regular under closure. I imagine that if L* is regular then L is regular because the subset of L*…
wzsun
  • 295
  • 2
  • 7
  • 15
4
votes
3 answers

Theory of Computation - Showing that a language is regular

I'm reviewing some notes for my course on Theory of Computation and I'm a little bit stuck on showing the following statement and I was hoping somebody could help me out with an explanation :) Let A be a regular language. The language B = {ab | a…
Tony
  • 107
  • 2
  • 6
4
votes
1 answer

Is the language to describe regular expressions regular itself?

If we describe regular expressions with operators *, | and concatenation . (which we simply omit for clarity), and parenthesis (, ) and some letters from some alphabet Sigma, then is the language that describes regular expressions itself regular? In…
4
votes
3 answers

concatenation & union - regular and context free languages

Given L1 context free non regular language. Given L2 regular language. Is it possible that L1 U L2 = regular language ? Also, is it possible that L1*L2 = regular language ? I think that the 2nd one is impossible. But I'm not sure. Would love to see…
Rouki
  • 2,239
  • 1
  • 24
  • 41
4
votes
1 answer

Union of two (irregular) Context Free Language results a Regular Language?

Given L1 and L2 (irregular) context free languages - Is it possible that L1 U L2 is regular? I know that it is possible but I just cant find an example showing that. Would love to get some assistance.
Rouki
  • 2,239
  • 1
  • 24
  • 41
4
votes
1 answer

how to intuitively think while Designing an NFA

I don't know whether this question is right to be asked, But I definitely felt that it should be asked. I did see a lot of nice and informative questions, articles on inter-net and on StackOverflow itself, of-course. But I found all the questions…
neerajDorle
  • 540
  • 7
  • 21
4
votes
2 answers

Determining if two languages are equal [Regular expression]

preparing for an exam and was going through this problem: Determine whether the set of strings represented by R1 is a subset of R2? R1 = (01 +10)* R2 = ((01)* + (10)*) My attempt: Since there represent the same expression I tried to prove…
Rave
  • 835
  • 4
  • 15
  • 28
4
votes
3 answers

Show that the following set over {a,b} is regular

Given the alphabet {a, b} we define Na(w) as the number of occurrences of a in the word w and similarly for Nb(w). Show that the following set over {a, b} is regular. A = {xy | Na(x) = Nb(y)} I'm having a hard time figuring out where to start…
4
votes
1 answer

Algorithm to convert regular expression to linear grammar

What is the Standard Algorithm to convert any given Regular Expression(RE) to a Left (or Right) Linear Grammar? I know I can do this like this (to write Linear Grammar from RE): RegEx -> NFA -> DFA -> Right Linear grammar. For a direct approach, I…
ChesterX
  • 187
  • 1
  • 3
  • 9
4
votes
2 answers

Ambiguity in transition: How to process string in NFA?

I have made DFA from a given regular expression to match the test string. There are some cases in which .* occurs. ( for example .*ab ) . Let say now the machine is in state 1. In the DFA, .* refers to the transition for all the characters to…
Tejas Joshi
  • 197
  • 3
  • 12
4
votes
1 answer

Which programming languages are regular?

The typical response to any "why isn't this regex working html!?!" question is "because HTML isn't a regular language". So, I was curious if anyone had a list of common programming languages which were regular languages, and thus are appropriate…
4
votes
1 answer

If every subset of a language L is regular then L is regular?

I know that converse of above theorem is not true i.e if L is regular then every subset of L need not be regular
akshay
  • 207
  • 1
  • 2
  • 7
4
votes
1 answer

Deciding if a given Language is Regular/Context-Free/Non Context-Free

I need some help with deciding if a given language is regular, context-free or not context-free. A brief, informal explanation is sufficient in the answer, hence no need to use pumping lemma. Let's say I have the following languages: L1 = { w ∈ {a,…