Questions tagged [regular-language]

A regular language is a language that can be represented by a regular expression; equivalently, membership of any string in the language can be decided by a corresponding deterministic finite automaton. Note: regular languages should not be confused with the regular expressions implemented by regex engines. For questions regarding pattern matching within strings, use the [regex] tag instead.

Given an alphabet (a finite set of symbols) Σ, a language is a set of finite sequences (strings) of symbols from that alphabet. A language is a regular language exactly when it can be expressed by a (formal) regular expression, or equivalently when the membership of any string can be decided by a finite-state machine.
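
As a minimal sketch of how a finite-state machine decides membership, the following simulates a small deterministic automaton over Σ = {0, 1}. The chosen language (binary strings beginning with 0 and ending with 1), the state names, and the function name `accepts` are illustrative assumptions, not part of any standard API.

```python
# Transition table of a hypothetical DFA for binary strings that
# begin with 0 and end with 1. State names are illustrative only.
TRANSITIONS = {
    ("start", "0"): "last0",   # first symbol is 0: still viable
    ("start", "1"): "dead",    # first symbol is 1: can never be accepted
    ("last0", "0"): "last0",
    ("last0", "1"): "last1",
    ("last1", "0"): "last0",
    ("last1", "1"): "last1",
    ("dead", "0"): "dead",
    ("dead", "1"): "dead",
}
ACCEPTING = {"last1"}  # began with 0 and the most recent symbol was 1


def accepts(string):
    """Return True if the DFA ends in an accepting state after reading string."""
    state = "start"
    for symbol in string:
        if (state, symbol) not in TRANSITIONS:
            return False  # symbol outside the alphabet {0, 1}
        state = TRANSITIONS[(state, symbol)]
    return state in ACCEPTING
```

The machine reads each symbol once and keeps only its current state, which is what makes the amount of memory independent of the input length.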

Regular languages form the most restricted class of the Chomsky hierarchy and are generated by Type-3 (regular) grammars. They are contained in the Type-2 context-free languages, which are recognized by pushdown automata; these are contained in the Type-1 context-sensitive languages, recognized by linear bounded automata; and these in turn are contained in the Type-0 recursively enumerable languages, which are recognized by Turing machines. Consequently, all regular languages are context-free, context-sensitive, and recursively enumerable. Formal regular expressions can be converted to deterministic and to nondeterministic finite-state machines that still represent the same regular language.
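
The containments between the four levels can be summarized in one line. The class names REG, CFL, CSL, and RE are shorthand chosen here for illustration, and the context-sensitive inclusion is stated up to the usual technicality about the empty string.

```latex
\mathrm{REG} \subsetneq \mathrm{CFL} \subsetneq \mathrm{CSL} \subsetneq \mathrm{RE}
\qquad
\text{(Type-3} \subsetneq \text{Type-2} \subsetneq \text{Type-1} \subsetneq \text{Type-0)}
```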

Please do not confuse this with regex. Most regex engines are far more expressive than formal regular expressions and finite-state machines, and can represent non-regular languages.

Construction of a Regular Language

The set of all regular languages over a given alphabet Σ is exactly the set produced by the following rules:

  • The empty language {}, which rejects all strings, is regular.
  • The language {ε}, containing only the empty string ε, is regular.
  • For each symbol s ∈ Σ, the language {s}, containing only the one-symbol string s, is regular.
  • Every language created by the union, concatenation, or Kleene star of regular languages is regular. Suppose v and w are regular expressions representing the regular languages A and B respectively:
    • The union (v|w) is also regular. It accepts the strings that are in A or in B.
    • The concatenation vw is also regular. It accepts every string formed by a string of A followed by a string of B.
    • The Kleene star v* is also regular. It accepts any number of strings of A concatenated together, including zero of them (the empty string).
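
The three closure operations can be sketched on finite sets of strings. This is only an illustration under stated assumptions: the function names `union`, `concat`, and `star` are hypothetical, languages are modelled as Python sets, and because a Kleene star is usually infinite, `star` enumerates only the strings up to a caller-supplied length bound.

```python
def union(a, b):
    """Strings that are in A or in B."""
    return a | b


def concat(a, b):
    """Every string of A followed by every string of B."""
    return {v + w for v in a for w in b}


def star(a, max_len):
    """Strings of A* with length <= max_len (A* itself is usually infinite)."""
    result = {""}      # zero copies: the empty string is always in A*
    frontier = {""}    # strings discovered in the previous round
    while frontier:
        # Append one more string of A to each newly found string,
        # keeping only results that are new and within the length bound.
        frontier = {
            v + w
            for v in frontier
            for w in a
            if len(v + w) <= max_len
        } - result
        result |= frontier
    return result
```

For instance, `concat({"0"}, {"1"})` yields `{"01"}`, and `star({"0"}, 3)` yields the empty string together with "0", "00", and "000".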

Examples and Nonexamples of Regular Languages

  • Given a simple alphabet Σ = {0, 1}, where | represents union and * represents Kleene star, these formal regular expressions all represent regular languages:

    • The regular expressions "0", "1", "(0|1)", "01", "11", and "0*" all represent regular languages.
    • The regular expression "(0(0|1)*1)", which describes all binary strings beginning with 0 and ending with 1, represents a regular language.
    • Given a regular expression R, "R+" and "R?" both represent regular languages, where + means one or more and ? means zero or one. Namely, "R+" is equivalent to "RR*", and "R?" is equivalent to "(R|ε)".
    • Given a regular expression R, "R{m,n}" represents a regular language for all natural numbers m ≤ n, where {m,n} means "from m copies to n copies". This is because it can be written using only union and concatenation: "R{1,3}" expands to "(R|RR|RRR)".
  • Given an alphabet used by regex engines, usually an ASCII or Unicode alphabet containing all ASCII or Unicode characters respectively:

    • The regex /^.+$/ represents a regular language. It includes all non-empty sequences of any character (in many engines, . excludes newlines by default).
    • The regex /^#[A-Za-z]{1,3}[0-9]{2,4}$/ represents a regular language, consisting of all strings which begin with a hash sign, followed by one to three ASCII letters, followed by two to four decimal digits.
    • The regex /^([\d][\w])*$/ represents a regular language. It consists of all strings which alternate digit characters and word characters. The shorthands \d and \w are examples of union, since each stands for a choice among a set of characters.
  • Many regex engines are much more expressive than formal regular expressions. Backreferences can cause a regex to represent a non-regular language, and consequently such a language cannot be decided by a finite-state machine.

    • The regex "(.+)\1" represents a non-regular language. Using a backreference to the first capture group (.+), it accepts every string formed by a non-empty sequence of characters followed by an exact copy of itself. Such strings are called squares in formal language theory.
      • "ABCABC", "1234.1234." are accepted
      • "ABCAB", "1234567891234567890" are rejected.
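
The examples above can be checked with Python's `re` module, as a rough sketch rather than a proof. The helper names `is_square` and `same_language_up_to` are hypothetical; the second one only compares two patterns on all strings up to a length bound, so agreement is evidence of equivalence, not a guarantee. The empty alternative in "(0|)" stands in for ε.

```python
import re
from itertools import product


def is_square(s):
    """True if s is a non-empty block followed by an exact copy of itself."""
    # re.DOTALL lets . match newlines too; fullmatch requires the
    # whole string to be consumed, as the wiki example assumes.
    return re.fullmatch(r"(.+)\1", s, re.DOTALL) is not None


def same_language_up_to(pattern_a, pattern_b, alphabet="01", max_len=6):
    """Compare two patterns on every string over alphabet up to max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            in_a = re.fullmatch(pattern_a, s) is not None
            in_b = re.fullmatch(pattern_b, s) is not None
            if in_a != in_b:
                return False  # found a string that separates the two
    return True
```

With these helpers, `is_square("ABCABC")` holds while `is_square("ABCAB")` does not, and `same_language_up_to("0+", "00*")` agrees with the stated equivalence of "R+" and "RR*" on all strings tested.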

914 questions
-2
votes
1 answer

In Regular Expression, how to make an alternative to "?" matching by only using union (+) and closure(*) quantifiers?

I would like ask if is it possible to represent "?" quantifier using only union (+) and closure(*) quantifiers. For example, "a+" can also represented as "a(a*)". How can you represent "a?" with only "*"s and "+"s?
-2
votes
1 answer

Simple regular expression(Pattern) for a decimal with a precision of 2

I have number input text box.It has 2 digit precision value. My requirement is that it should accept only 15 digit number along with precision.I have tried with several regular expression but haven't worked. Could you please suggest regular…
-2
votes
2 answers

How to convert regular expression to finite state machine?

Let the regular expression; r = (a*|(ab)*)b* what is the rules for converting this expression to finite state machine?
-2
votes
2 answers

Match Regular Expression with URL

Dears, I've tried "\b \b" and (.*)["] to get the regex which lets me select the url starting with "my.website" and ends with "myfile" for this URL in source code:
M. A.
  • 424
  • 6
  • 21
-2
votes
1 answer

Regular Expression matching empty string before a character in c#

I am trying to match a string containing nothing before a character. Consider the following lines: 'this is a valid line 'this is also a valid line Any string ' this is an invalid line I Need a regular expression that matches the first two line…
Jaskaran
  • 320
  • 5
  • 19
-2
votes
3 answers

Regex not meeting the requirement

I am trying to learn Regex construction, and i am stuck in one problem. Problem Statement: A regex that should match following phrases: rap them tapeth apth wrap/try sap tray 87ap9th apothecary but should not match aleht happy…
Gaurav Gupta
  • 4,586
  • 4
  • 39
  • 72
-2
votes
2 answers

Regular Expression for no. of 0s and 1s odd in the string

As we can develop regular expression like string ending with 01 over {0,1} Like (0+1)*01 So this way over {0,1} I want the answer of my question that what will be the regular expression for number of 0s and 1s both are odd Any combination of 1 and…
user4801793
-2
votes
1 answer

Regular expression: add string at start and end of line

I have text file and i want to add string for each row at the start and end of line. for example: BINARY_XML_RECORD_PREFIX_DICTIONARY_ATTRIBUTE_Z After regex action: case BINARY_XML_RECORD_PREFIX_DICTIONARY_ATTRIBUTE_Z: Thanks.
Itay Avraham
  • 125
  • 6
-2
votes
1 answer

Regular expression for IIS6

Using regex I want to convert a URL of the following form: https://example.com/employment/locationselect.aspx?pp=123default.aspx to a url that redirects to: https://example.com/employment/locationselect.aspx How do I construct the regex that will…
-2
votes
3 answers

Getting digits from string Regular Expression in Java

Hi don't know much of regular expression and I am trying to get just the digits from the string "glm=4563125@", can someone please help me. The number of digits can vary, so it's not specific the amount of digits that will be there. Thanks.
user3776276
  • 13
  • 1
  • 3
-2
votes
1 answer

Regular Language to DFA conversion

I ran into a problem in a textbook that I can't decipher and I was hoping you could help. I'm not asking for a solution, just a translation, or a push in the right direction. This is in the JFLAP textbook. The alphabet consists only of "a". {a^L |L…
ReezaCoriza
  • 133
  • 4
  • 16
-2
votes
2 answers

regular expression for more than three consecutive character

I have a requirement where I have to check String(combination of number and letter only) which Must not contain more than 3 successive characters(only character). for example: abcd - not allowed AbCd - not allowed abc3 - allowed abcr - allowed PQRS…
Mahendra Athneria
  • 1,203
  • 3
  • 16
  • 32
-2
votes
3 answers

Regular expression for English-only special characters

I need a regular expression to match a-zA-Z0-9 as well as whitespace and special characters, but only including English whitespace/special characters, not those of other languages like French or Spanish. Thanks.
MBehtemam
  • 7,865
  • 15
  • 66
  • 108
-3
votes
2 answers

How to extract a numeric value from a line of text with a regular expression?

I'm new to regular expressions, help me extract the necessary information from the text: salespackquantity=1&itemCode=3760041","quantity_box_sales_uom" &salespackquantity=1&itemCode=2313441","quantity_box I need to take the numbers 3760041 and…
user16993750
-3
votes
1 answer

How to write regular expression for 16 digit ID, rules mentioned below?

I have to write regular expression in informatica data quality for 16 digit ID which should follow below set of validation. ID Must have 16 characters and follow below rules- First 6 must be alphabetic characters Position 7 & 8 are numeric…
Yash
  • 39
  • 1
  • 7