Questions tagged [regular-language]

A regular language is a language that can be represented by a regular expression; equivalently, membership of any string in the language can be decided by a deterministic finite automaton. Note: regular languages should not be confused with the regular expressions implemented by programming tools. For questions regarding pattern matching within strings, use the [regex] tag instead.

Given an alphabet (a finite set of symbols) Σ, a language is a set of finite sequences (strings) of symbols from that alphabet. A language is a regular language exactly when it can be expressed by a (formal) regular expression, or equivalently when the membership of any string can be decided by a finite-state machine.
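
As a rough illustration of deciding membership with a finite-state machine, here is a minimal sketch of a deterministic finite automaton. The language chosen (binary strings with an even number of 1s) and all names in the code are assumptions made for this example only.

```python
# Hypothetical DFA over the alphabet {0, 1}: accepts strings with an even number of 1s.
START = "even"
ACCEPTING = {"even"}
TRANSITIONS = {
    ("even", "0"): "even",
    ("even", "1"): "odd",
    ("odd", "0"): "odd",
    ("odd", "1"): "even",
}

def accepts(string):
    """Run the DFA over the string; symbols outside the alphabet cause rejection."""
    state = START
    for symbol in string:
        key = (state, symbol)
        if key not in TRANSITIONS:
            return False
        state = TRANSITIONS[key]
    return state in ACCEPTING
```

The machine only remembers which of its two states it is in, never the string itself, which is what "finite-state" means here.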

Regular languages form the most restrictive (innermost) level of the Chomsky hierarchy; they are the languages generated by Type-3 (regular) grammars. They are contained in the Type-2 context-free languages, which are recognized by pushdown automata; these are contained in the Type-1 context-sensitive languages, recognized by linear bounded automata; and these in turn are contained in the Type-0 recursively enumerable languages, recognized by Turing machines. Consequently, all regular languages are context-free, context-sensitive, and recursively enumerable. Formal regular expressions can be converted to deterministic finite-state machines and to nondeterministic finite-state machines that represent the same regular language.
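
The equivalence between nondeterministic and deterministic machines can be sketched with the standard subset construction. The NFA below (binary strings ending in "01") is a hypothetical example with no ε-transitions, and every name in the code is an assumption made for illustration.

```python
ALPHABET = "01"

# Hypothetical NFA: q0 loops on every symbol and "guesses" where the final "01" starts.
NFA_START = "q0"
NFA_ACCEPTING = {"q2"}
NFA_DELTA = {
    ("q0", "0"): {"q0", "q1"},
    ("q0", "1"): {"q0"},
    ("q1", "1"): {"q2"},
}

def nfa_accepts(string):
    """Track the set of all NFA states reachable after each symbol."""
    current = {NFA_START}
    for symbol in string:
        current = {t for s in current for t in NFA_DELTA.get((s, symbol), set())}
    return bool(current & NFA_ACCEPTING)

def subset_construction():
    """Build a DFA whose states are the reachable sets of NFA states."""
    start = frozenset({NFA_START})
    delta = {}
    seen = {start}
    todo = [start]
    while todo:
        state = todo.pop()
        for symbol in ALPHABET:
            target = frozenset(
                t for s in state for t in NFA_DELTA.get((s, symbol), set())
            )
            delta[(state, symbol)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    # A DFA state accepts when it contains at least one accepting NFA state.
    accepting = {s for s in seen if s & NFA_ACCEPTING}
    return start, delta, accepting

DFA_START, DFA_DELTA, DFA_ACCEPTING = subset_construction()

def dfa_accepts(string):
    """Follow exactly one transition per symbol in the constructed DFA."""
    state = DFA_START
    for symbol in string:
        state = DFA_DELTA[(state, symbol)]
    return state in DFA_ACCEPTING
```

Both machines should agree on every binary string; the DFA simply precomputes the state sets that the NFA simulation would otherwise build at run time.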

Please do not confuse this with regex. Most regex engines are far more expressive than formal regular expressions and finite-state machines, and can represent non-regular languages.

Construction of a Regular Language

The set of all regular languages over a given alphabet Σ is exactly the set produced by these rules:

  • The empty language {}, which rejects all strings, is regular.
  • The language {ε}, containing only the empty string, is regular.
  • For each symbol s ∈ Σ, the language {s} containing only the one-symbol string s is regular.
  • Every language created by the union, concatenation, or Kleene star of regular languages is regular. Suppose v and w are regular expressions for the regular languages A and B respectively:
    • The union (v|w) is also regular. It accepts the strings that are in A or in B.
    • The concatenation vw is also regular. It accepts every string formed by a string of A followed by a string of B.
    • The Kleene star v* is also regular. It accepts any number of strings of A concatenated together, including zero (which gives the empty string).
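
The three operations above can be sketched on languages modelled as Python sets of strings. This is only an illustration: the function names are made up for this example, and because a Kleene star is usually infinite, `kleene_star` takes an assumed `max_length` bound and returns only the strings up to that length.

```python
def union(a, b):
    """Strings that are in A or in B."""
    return a | b

def concatenation(a, b):
    """Every string of A followed by every string of B."""
    return {v + w for v in a for w in b}

def kleene_star(a, max_length):
    """Strings of A* no longer than max_length (A* itself is usually infinite)."""
    result = {""}      # zero copies always gives the empty string
    frontier = {""}
    while frontier:
        # Append one more string of A to everything found in the last round.
        frontier = {
            v + w for v in frontier for w in a if len(v + w) <= max_length
        } - result
        result |= frontier
    return result

# Building blocks: the single-symbol languages over the alphabet {0, 1}.
ZERO, ONE = {"0"}, {"1"}
either = union(ZERO, ONE)                  # corresponds to (0|1)
zero_then_one = concatenation(ZERO, ONE)   # corresponds to 01
short_binary = kleene_star(either, 2)      # (0|1)* restricted to length <= 2
```

Here `short_binary` holds the seven binary strings of length at most 2, including the empty string.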

Examples and Nonexamples of Regular Languages

  • Given a simple alphabet Σ = {0, 1}, where | represents union and * represents Kleene star, these formal regular expressions all represent regular languages:

    • The regular expressions "0", "1", "(0|1)", "01", "11", and "0*" all represent regular languages.
    • The regular expression "(0(0|1)*1)", matching all binary strings beginning with 0 and ending with 1, represents a regular language.
    • Given a regular expression R, "R+" and "R?" both represent regular languages, where + means one or more and ? means zero or one. Namely, "R+" is equivalent to "RR*", and "R?" is equivalent to "(R|ε)".
    • Given a regular expression R, "R{m,n}" represents a regular language for all natural numbers m ≤ n, where {m,n} means "from m copies to n copies". This is because it only involves union and concatenation: "R{1,3}" expands to "(R|RR|RRR)".
  • Given an alphabet used by regex engines, usually the set of all ASCII or all Unicode characters:

    • The regex /^.+$/ represents a regular language. It includes all non-empty sequences of characters (in most engines, excluding line terminators).
    • The regex /^#[A-Za-z]{1,3}[0-9]{2,4}$/ represents a regular language, consisting of all strings which begin with a hash sign (#), then one to three ASCII letters, followed by two to four decimal digits.
    • The regex /^([\d][\w])*$/ represents a regular language. It consists of all strings made of zero or more pairs of a digit character followed by a word character. The shorthands \d and \w are examples of union.
  • Many regex engines are much more expressive than formal regular expressions. Backreferences can cause a regex to represent a non-regular language, which consequently cannot be decided by a finite-state machine.

    • The regex "(.+)\1" represents a non-regular language. Using the backreference \1 to the first capturing group (.+), it accepts, when matched against a whole string, every non-empty sequence of characters repeated exactly twice. Such strings are called squares in formal language theory.
      • "ABCABC", "1234.1234." are accepted
      • "ABCAB", "1234567891234567890" are rejected.
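
The examples above can be checked informally with Python's `re` module, using `re.fullmatch` so that a pattern has to cover the whole string. This is a sketch only: the helper names are made up for this example, and `same_language` compares two patterns on short binary strings, which is a bounded sanity check rather than a proof of equivalence.

```python
import re

def is_square(string):
    """True when the whole string is some non-empty block repeated exactly twice."""
    return re.fullmatch(r"(.+)\1", string) is not None

def binary_strings(max_length):
    """Yield every string over {0, 1} up to max_length, shortest first."""
    yield ""
    for length in range(1, max_length + 1):
        for number in range(2 ** length):
            yield format(number, "0{}b".format(length))

def same_language(pattern_a, pattern_b, max_length=8):
    """Check that two patterns agree on all short binary strings."""
    return all(
        bool(re.fullmatch(pattern_a, s)) == bool(re.fullmatch(pattern_b, s))
        for s in binary_strings(max_length)
    )

# Squares need a backreference; the accepted and rejected strings are those listed above.
accepted = [s for s in ("ABCABC", "1234.1234.") if is_square(s)]
rejected = [s for s in ("ABCAB", "1234567891234567890") if not is_square(s)]

# The shorthand operators agree with their union/concatenation/star expansions.
plus_ok = same_language(r"(01)+", r"(01)(01)*")
optional_ok = same_language(r"(01)?", r"(01|)")
counted_ok = same_language(r"(01){1,3}", r"(01|0101|010101)")
```

In the Python syntax used here, the empty alternative in `(01|)` plays the role of ε in "(R|ε)".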


914 questions
3
votes
1 answer

Are fluent interfaces described by context free or regular grammars?

I'm toying around with fluent interfaces in the style of Martin Fowlers text, and I'm wondering if the grammar they are describing is context free or regular? I'm talking about interfaces such as this: var car = new…
Dervall
  • 5,736
  • 3
  • 25
  • 48
3
votes
3 answers

Regular expression to replace shortest match

my string is like this sfdfdsfdsfstart112matlab2336endgfdgdfgkknfkgstart558899enddfdsfd how can we replace part of a string such a way that the result will be sfdfdsfdsfgfdgdfgkknfkgdfdsfd i.e bolded content need to be removed.
Sai Mukesh
  • 401
  • 4
  • 11
3
votes
3 answers

Javascript Capitalize first letter of each word ignore Contractions

I am trying to Capitalize the first letter of each word in a string. I found similar questions online but none seem to answer my question of ignoring Contractions like can't, won't, wasn't. This snippet of code works but it also capitalizes the…
Shayne
  • 33
  • 3
3
votes
1 answer

Closure properties of context-free languages and intersection with regular languages

The intersection of a context-free language and a regular language is always context-free but context-free languages are not closed under set intersection. Could anyone explain why both theorems are true if all regular languages are context-free…
methane
  • 667
  • 2
  • 8
  • 17
3
votes
0 answers

BNF rule to regular expression

I'm looking for a way to find out whether a specific rule in a BNF grammar can be converted to a regular expression. (With "regular expression" (RE), I mean the simple mathematical kind. I'm not interested in BNF rules that can only be done with the…
3
votes
2 answers

Regular expression - Python [list query]

I am trying to write a regular expression for this list: data= ["Fred is Deputy Manager. He is working for MNC.", "Rita is another employee in AC Corp."] And I want to delete all the words that starts with an uppercase letter but it should not check…
Sanya
  • 129
  • 1
  • 6
3
votes
1 answer

Convert context free grammar to regular grammar

I wonder how to design this regular grammar, or how to convert my context free grammar to regular grammar (like A->aA).I tried but no result for this. Question: The set of strings on Σ ={a,b}which contain at least two occurrences aaa, and at least…
NormalSL
  • 61
  • 1
  • 6
3
votes
1 answer

Regex to Detect Zalgo

I'm creating a message filtering system, that detects z͎͗ͣḁ̵̑l̉̃ͦg̐̓̒o͓̔ͥ. My current regex is /([^\u0009-\u02b7\u2000-\u20bf\u2122\u0308]|(?![^aeiouy])\u0308)/gm but this also captures emojis. The regex should filter all w̵̢̃ë̸̩́ị̵̽r̴̺̆d̴̘̕…
ADAMJR
  • 1,880
  • 1
  • 14
  • 34
3
votes
2 answers

having trouble with Cyrillic characters

I'm trying to use the standard library to match some Cyrillic words: // This is a UTF-8 file. std::locale::global(std::locale("en_US.UTF-8")); string s {"Каждый охотник желает знать где сидит фазан."}; regex re {"[А-Яа-яЁё]+"}; …
undercat
  • 529
  • 5
  • 17
3
votes
1 answer

Inserting a regular language into other regular language

Let L1 and L2 be the regular languages over the alphabet {a,b}. We define the language L3 as follows: L3 = {pqr | pr ∈ L1, q ∈ L2} L3 is obtained by inserting a string from L2 inside a string from L1. Is language L3 is still regular or not? I am…
3
votes
2 answers

Does order not matter in regular expressions?

I was looking at the question posed in this stackoverflow link (Regular expression for odd number of a's) for which it is asked to find the regular expression for strings that have odd number of a over Σ = {a,b}. The answer given by the top comment…
agreatkid
  • 93
  • 2
  • 4
3
votes
3 answers

Infer regex pattern from set of Strings, I need an algorithm in java to create below information

I wanted to convert sets of strings to regular expression using java. I searched many things for it but there was no such satisfying answer available on the internet which resolves my issue. so I prefer to ask here. First is it possible to convert…
Sabaoon Bedar
  • 3,113
  • 2
  • 31
  • 37
3
votes
1 answer

Is there an algorithm for determining if the set of all valid XML instances in respect with a specific XSD schema is a regular language or not?

Essentially I want to know if a specific XSD schema can be replaced by a regular expression or not. I know that XML Schema language can produce XSDs whose set of valid XML instances can be of any type of language (even context-sensitive). I want to…
Paralife
  • 6,116
  • 8
  • 38
  • 64
3
votes
1 answer

Proving concatenation of language is associative in Agda

I am new to the language Agda, and I am working on formal languages using Agda. I've got some problems when proving the concatenation of languages is associative. The proof will be yellow highlighted as Agda could not find the words for "++Assoc" in…
3
votes
2 answers

re.search() in python goes into an infinite loop

I'm trying to extract file paths (Windows/Ubuntu, relative/absolute) from a text document. The regular expression code below is used check if a word is a file path or not. It works for most of the cases but fails for one case, where it goes into an…
Hasmukh
  • 33
  • 5