Questions tagged [regular-language]

A regular language is a language that can be represented by a regular expression; equivalently, exactly the strings of the language are accepted by a corresponding deterministic finite automaton. Note: regular languages should not be confused with regular expressions as used in programming. For questions regarding pattern matching within strings, use the [regex] tag instead.

Given an alphabet Σ (a finite set of symbols), a language is a set of finite sequences (strings) of symbols from that alphabet. A language is regular exactly when it can be described by a (formal) regular expression or, equivalently, when the membership of any string can be decided by a finite-state machine.
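As a minimal sketch of how a finite-state machine decides membership, the table below encodes a deterministic finite automaton over Σ = {0, 1}. The language chosen (binary strings beginning with 0 and ending with 1) and the state names are illustrative assumptions, not part of the definition.

```python
# Transition table of a hypothetical DFA over the alphabet {0, 1}.
# It accepts exactly the binary strings that begin with 0 and end with 1.
TRANSITIONS = {
    ("start", "0"): "last0",
    ("start", "1"): "dead",   # wrong first symbol: can never be accepted
    ("last0", "0"): "last0",
    ("last0", "1"): "last1",
    ("last1", "0"): "last0",
    ("last1", "1"): "last1",
    ("dead", "0"): "dead",
    ("dead", "1"): "dead",
}
ACCEPTING = {"last1"}


def accepts(string: str) -> bool:
    """Run the DFA over the string, one symbol at a time."""
    state = "start"
    for symbol in string:
        if (state, symbol) not in TRANSITIONS:
            return False  # symbol is outside the alphabet
        state = TRANSITIONS[(state, symbol)]
    return state in ACCEPTING
```

The machine keeps only its current state while reading the input, which is what makes the language regular: no counter or stack is needed.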

Regular languages form the most restrictive level of the Chomsky hierarchy and are generated by Type-3 (regular) grammars. They are a proper subset of the Type-2 context-free languages, which are recognized by pushdown automata. These are in turn contained in the Type-1 context-sensitive languages, recognized by linear bounded automata, and in the Type-0 recursively enumerable languages, recognized by Turing machines. Hence all regular languages are also context-free, context-sensitive, and recursively enumerable. A formal regular expression can be converted to a deterministic or a nondeterministic finite-state machine that represents the same regular language.
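The equivalence between nondeterministic and deterministic machines can be illustrated with the subset construction. This is a sketch under stated assumptions: the NFA has no ε-moves, and the example NFA (strings over {0, 1} ending in "01"), its state names, and the helper names are hypothetical.

```python
def nfa_to_dfa(nfa, start, accepting, alphabet):
    """Subset construction for an NFA without ε-moves.

    nfa maps (state, symbol) to a set of successor states.
    Each DFA state is a frozenset of NFA states.
    """
    start_set = frozenset([start])
    dfa = {}
    todo = [start_set]
    while todo:
        current = todo.pop()
        if current in dfa:
            continue
        dfa[current] = {}
        for symbol in alphabet:
            target = frozenset(
                nxt for state in current for nxt in nfa.get((state, symbol), ())
            )
            dfa[current][symbol] = target
            todo.append(target)
    # A DFA state accepts if it contains at least one accepting NFA state.
    dfa_accepting = {subset for subset in dfa if subset & accepting}
    return dfa, start_set, dfa_accepting


def dfa_accepts(dfa, start, accepting, string):
    """Follow exactly one transition per input symbol (symbols must be in the alphabet)."""
    state = start
    for symbol in string:
        state = dfa[state][symbol]
    return state in accepting


# Hypothetical NFA: in state "a" it guesses whether the final "01" has begun.
NFA = {("a", "0"): {"a", "b"}, ("a", "1"): {"a"}, ("b", "1"): {"c"}}
DFA, DFA_START, DFA_ACCEPTING = nfa_to_dfa(NFA, "a", {"c"}, "01")
```

For this NFA only three subsets are reachable, so the resulting DFA is as small as the NFA. In general the construction can produce exponentially many states.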

Please do not confuse this with regex. Most regex engines are far more expressive than formal regular expressions and finite-state machines, and can represent non-regular languages.

Construction of a Regular Language

The set of all regular languages over a given alphabet Σ is produced exactly by the following rules:

The empty language {}, which rejects all strings, is regular.
The language {ε}, containing only the empty string, is regular.
Every language {s} containing only a single one-symbol string, with s ∈ Σ, is regular.
Every language formed by the union, concatenation, or Kleene star of regular languages is regular. Suppose A and B are regular languages described by the regular expressions v and w respectively:
- The union (v|w) is also regular. It accepts the strings that are in A or in B.
- The concatenation vw is also regular. It accepts every string of A followed by a string of B.
- The Kleene star v* is also regular. It accepts any number of strings of A concatenated together, including zero copies (which gives ε).

Examples and Nonexamples of Regular Languages

Given the simple alphabet Σ = {0, 1}, where | represents union and * represents the Kleene star, each of these formal regular expressions represents a regular language:
- The regular expressions "0", "1", "(0|1)", "01", "11", and "0*" each represent a regular language.
- The regular expression "(0(0|1)*1)" represents all binary strings beginning with 0 and ending with 1.
- Given a regular expression R, "R+" and "R?" both represent regular languages, where + means one or more and ? means zero or one. Namely, "R+" is equivalent to "RR*", and "R?" is equivalent to "(R|ε)".
- Given a regular expression R, "R{m,n}" represents a regular language for all natural numbers m ≤ n, where {m,n} means "from m to n copies". It reduces to union and concatenation: "R{1,3}" expands to "(R|RR|RRR)".
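The stated equivalences can be spot-checked with Python's `re` module, which supports `+`, `?`, and `{m,n}` with the meanings given above. This is a bounded check over short binary strings, not a proof. The sub-expression `R` and the helper names are assumptions made for illustration.

```python
import re
from itertools import product

R = "(?:0|11)"  # hypothetical sub-expression, chosen only for illustration

# (shorthand, expansion using only union, concatenation, and star)
EQUIVALENT_PAIRS = [
    (f"{R}+", f"{R}{R}*"),
    (f"{R}?", f"(?:{R}|)"),  # the empty alternative plays the role of ε
    (f"{R}{{1,3}}", f"(?:{R}|{R}{R}|{R}{R}{R})"),
]


def binary_strings(max_len):
    """Yield every string over {0, 1} with at most max_len symbols."""
    for n in range(max_len + 1):
        for symbols in product("01", repeat=n):
            yield "".join(symbols)


def same_language(p, q, max_len=8):
    """True if p and q accept the same strings up to the length bound."""
    return all(
        bool(re.fullmatch(p, s)) == bool(re.fullmatch(q, s))
        for s in binary_strings(max_len)
    )
```

`re.fullmatch` is used so that each pattern must account for the entire string, which matches the formal notion of a string belonging to a language.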
Given the alphabet used by a regex engine, usually the set of all ASCII or Unicode characters:
- The regex /^.+$/ represents a regular language. It includes all non-empty sequences of any character (in most engines, . excludes line terminators by default).
- The regex /^#[A-Za-z]{1,3}[0-9]{2,4}$/ represents a regular language, consisting of all strings that begin with a hash sign (#), followed by one to three ASCII letters and then two to four decimal digits.
- The regex /^([\d][\w])*$/ represents a regular language. It consists of all strings that alternate digit characters and word characters, starting with a digit and ending with a word character. The shorthands \d and \w are examples of union.
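A short sketch of testing membership in these languages, assuming Python's `re` module as the engine. The `re.ASCII` flag is an added assumption so that `\d` and `\w` stay within ASCII; the constant and function names are hypothetical.

```python
import re

NON_EMPTY = re.compile(r"^.+$")
TAG = re.compile(r"^#[A-Za-z]{1,3}[0-9]{2,4}$")
ALTERNATING = re.compile(r"^([\d][\w])*$", re.ASCII)


def in_language(pattern, string):
    """True if the whole string belongs to the language of the pattern."""
    return pattern.fullmatch(string) is not None
```

For example, `in_language(TAG, "#ab12")` holds, while `in_language(TAG, "#abcd12")` does not because four letters exceed the {1,3} bound.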
Many regex engines are much more expressive than regular languages. Backreferences can cause a regex to represent a non-regular language, which consequently cannot be decided by a finite-state machine.
- The regex "(.+)\1", matched against the whole string, represents a non-regular language. Using the backreference \1 to the captured group (.+), it accepts every non-empty sequence of characters repeated exactly twice. Such strings are called squares in formal language theory.
  - "ABCABC", "1234.1234." are accepted.
  - "ABCAB", "1234567891234567890" are rejected.
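The square example can be tried in Python, whose `re` module supports backreferences. This sketch assumes whole-string matching via `fullmatch`. The `re.DOTALL` flag and the function names are illustrative additions, and `is_square_direct` is a hypothetical reference check written without any regex.

```python
import re

SQUARE = re.compile(r"(.+)\1", re.DOTALL)


def is_square(string):
    """Whole-string match against the backreference pattern."""
    return SQUARE.fullmatch(string) is not None


def is_square_direct(string):
    """Same property checked directly: non-empty and both halves equal."""
    half, odd = divmod(len(string), 2)
    return half > 0 and not odd and string[:half] == string[half:]
```

The direct check has to compare two halves of unbounded length, which is the kind of memory a finite-state machine does not have.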

Further Reading

regex: Short for regular expression. Many regex engines nowadays are far more expressive than formal regular expressions and finite-state machines.
finite-state-machine: Finite-state machines are equivalent in expressiveness to formal regular expressions: they represent exactly the regular languages. Often abbreviated FSM.
dfa: Deterministic finite automaton. nfa: Nondeterministic finite automaton.
- Both are equivalent in expressiveness.
context-free-grammar, context-sensitive-grammar: Tags referring to the more general Type-2 and Type-1 levels of the Chomsky hierarchy.
Wikipedia, includes an explanation of squares and non-regular regexes
What is a regular language?

914 questions

votes

1 answer

Proving a Certain Language Regular

In my computational theory class we have an assignment of proving a language is regular. The language is defined as: B = {1^k y | y is in {0, 1}* and y contains at least k 1s, for k >= 1} This language looks to me like it would need a pushdown…

regex regular-language

asked Feb 26 '14 at 02:20

redLightening

votes

2 answers

How can I create this DCG in Prolog?

I want to create a DCG that languages like this get accepted: c bbbcbbb bbacbba abacaba aababacaababa As you can see this means that there is a specific order of a and b, then one c and then again the exact same order as before the c. If these…

prolog regular-language dcg

asked Feb 11 '14 at 13:36

Waylander

votes

5 answers

What is regularity?

This is more of a computer science question than a programming one, but I figure that this is the best place out of all the related sites to ask this. When I discovered Regular Expressions and looked up the term I assumed that this property of…

computer-science regular-language

asked Jan 09 '10 at 01:50

EpsilonVector


votes

1 answer

What is the concatenation of this language with itself?

Given the following language: L1 = { (ab)^n | n ≥ 0 } That is, L1 = { ε, ab, abab, ababab, abababab, ... } The question is to find what language L1^2 is. My guess is that it's equal to { (ab)^(2n) | n ≥ 0 }. Is that correct? If so, how do I prove it?…

math regular-language finite-automata automata formal-languages

asked Oct 26 '13 at 06:05

Rouki


votes

4 answers

Is it possible to simplify this regular expression any further?

I'm working on some homework for my compiler class and I have the following problem: Write a regular expression for all strings of a's and b's that contain an odd number of a's or an odd number of b's (or both). After a lot of whiteboard work I came…

regex regular-language

asked Sep 16 '09 at 19:28

anon

votes

1 answer

can I shorten this regular expression using intersection?

I have this language L that contains only one string: written more concisely This string has 2(2^n−1) characters and I want to reduce it. I was thinking of using intersection, if i can find some regular languages in which the intersection of their…

javascript regular-language string-length

asked Dec 13 '12 at 19:07

Jiyda Moussa

votes

1 answer

Is there an efficient algorithm to decide whether the language accepted by one NFA is a superset of the language accepted by another?

Given two nondeterministic finite automata M1 and M2, is there an efficient algorithm to determine whether the language accepted by M1 is a superset of the language accepted by M2?

algorithm computer-science finite-automata regular-language nfa

asked Feb 25 '12 at 22:40

Daniel Trebbien


votes

1 answer

Regular Language (yes or no)

I was given the task to check whether this language is regular: L = {w∈{a,b,c}* | where the number of a is less than the number of b+c.} I can find neither a regular expression for this, nor a deterministic (or not) finite state automaton. On the…

regex regular-language automata

asked Dec 10 '11 at 10:11

Giannis Vl

votes

1 answer

Remove doctype containing entity from xml using java

I'm trying to process an xml, before that i need to remove the doctype and entity declaration from the input xml. I'm using the below code to remove the doctype and entity: fileContent = fileContent.replaceAll("",…

java regex xml string regular-language

asked Nov 16 '18 at 09:08

nithin

votes

1 answer

How to understand ATN graph generated for ANTLR grammar?

I have 2 simple lexer rules in my ANTLR4 grammar: fragment Attrs : '.' ARCH; fragment ARCH : 'IA32' | 'X64' | 'IPF' | 'EBC' | 'common'; The generated ATN with ANTLR4.7 is like this (Visual Studio Code): I searched some references about "ATN",…

antlr antlr4 state-machine regular-language automata

asked Aug 03 '17 at 01:25

smwikipedia


votes

4 answers

Regular expression matching on comma bounded by nonwhite space

I am trying to replace commas bounded by nonwhite space with a white space, while keeping other commas untouched (in R). Imagine I have: j<-"Abc,Abc, and c" and I want: "Abc Abc, and c" This almost works: gsub("[^ ],[^ ]"," " ,j) But it…

r regex regular-language

asked Mar 01 '17 at 12:39

tsutsume

votes

3 answers

Is concatenation of a non regular language with a regular language always not regular?

I'd like to know if the concatenation between two language (one regular and the other not) is always not regular or it may happen that the output is a regular language. Thanks.

computer-science context-free-grammar regular-language chomsky-normal-form

asked Nov 21 '16 at 13:46

RamsesXVII

votes

2 answers

RNN for binary classification of sequence

I am wondering if someone can suggest a good library or reference (tutorial or article) to implement a Recurrent Neural Network (RNN). I tried to use the rnnlib by Alex Graves, but I had some trouble changing the architecture to adapt the network…

deep-learning regular-language text-classification recurrent-neural-network

asked Nov 09 '16 at 18:11

G_Zak

votes

3 answers

Swedish SSN Regular expression reject users under a specific age

I have a problem with my regular expression. I have made it possible to validate correct Swedish social security numbers that match these criteria: YYMMDDNNNN YYMMDD-NNNN YYYYMMDDNNNN YYYYMMDD-NNNN But I would also like to reject a user to…

c# .net regex asp.net-mvc regular-language

asked Sep 17 '15 at 07:32

Andreas Jangefalk

votes

1 answer

Regular expression for [a-zA-Z]

I have a regular expression that matches English letters only, a [a-zA-Z] character class. Is there any built-in regular expression for that? I mean something like \s or \w.

regex regular-language

asked May 07 '15 at 20:12

bn12

