Questions tagged [regex]

Regular expressions provide a declarative language to match patterns within strings. They are commonly used for string validation, parsing, and transformation. Specify the language (PHP, Python, etc) or tool (grep, VS Code, Google Analytics, etc) that you are using. Do not post questions asking for an explanation of what a symbol means or what a particular regular expression will match.

IMPORTANT NOTE: Requests to explain a regular expression pattern or construct will be closed as duplicates of the canonical post "What does this regex mean", which explains regular expression constructs in detail. That post also links to many popular online regular expression testers (where the meanings of regex constructs can be found). One such tool is Regex101.


Regular expressions are a powerful formalism for pattern matching in strings. They are available in a variety of dialects (also known as flavors) in a number of programming languages and text-processing tools, as well as many specialized applications. The term "Regular expression" is typically abbreviated as "RegEx" or "regex".

Before asking a question here, please take the time to review the following brief guidelines.

How To Ask

  • Specify what tool or language you are using.

    Regexes are everywhere. Different languages like Python, PHP and Java all use regexes, but with minor differences. Many different tools use regexes as well, from grep to most text editors to Google Analytics, also with their own differences. Specify the tool or language in your question. (Perhaps see also Why are there so many different regular expression dialects?)

  • Be clear about what you need.

    Keep in mind that regex dialects are different; the lowest common denominator will usually be quite different from what is possible and recommended for a tool with a modern, souped-up regex engine. (See previous section.)

    Also, are you looking for a regular expression for input validation (which needs to be rather strict), or do you need one for information extraction (which can be somewhat relaxed)?

    If your question relates to regular expressions in the strict computer science/automata theory sense, please state this explicitly.

    For most other questions, always include sample input, expected output, an outline of what you have tried, and where you are stuck. An example of what you do not want to match is often very helpful too.
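
    The difference between validation and extraction can be sketched in Python's re module. The date pattern, the function names, and the sample strings are illustrative assumptions only; the pattern checks the shape of the text, not whether the date exists.

```python
import re

# Hypothetical pattern for an ISO-style date (shape only, no calendar check).
DATE = re.compile(r"\d{4}-\d{2}-\d{2}")

def is_valid_date_field(text):
    """Validation: the whole input must be a date and nothing else."""
    return DATE.fullmatch(text) is not None

def extract_dates(text):
    """Extraction: pull every date-shaped substring out of free text."""
    return DATE.findall(text)

valid = is_valid_date_field("2021-03-04")        # True
invalid = is_valid_date_field("due 2021-03-04")  # False: extra text around the date
found = extract_dates("due 2021-03-04, sent 2021-02-28")
```

    The same pattern serves both purposes here; what changes is whether it must cover the entire input (fullmatch) or may match anywhere inside it (findall).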

  • Show us what you tried.

    A link to one of the many online regex testing tools (see link section) with your attempt and some representative data can do wonders.

    However, keep in mind, again, that there are many different regular expression dialects. (See earlier bullet points.) A result from an online tool for JavaScript or PHP does not necessarily work in Python or Java or sed or Awk or ... what have you.

    Even if you cannot post your problem online, showing us your best attempt helps us focus on what you need help with.

  • Search for duplicates.

    Before posting, check if your issue has already been solved by somebody else asking something similar. See also the following section.

Avoid Common Problems and Pitfalls

There are some common recurring beginner topics.

  • Do not assume that the tool you are using supports precisely the syntax of another tool.

    While modern Perl/Ruby/Python/PHP/Java regular expression support is widespread, you cannot assume that it is universal. In particular, many older tools (Awk, sed, grep, lex, etc.), as well as some newer ones (JavaScript, many text editors), use different dialects, some of which do not necessarily support, for example, non-capturing parentheses (?:...), non-greedy quantifiers *?, backreferences (\1, \2, etc.), common escapes and character class abbreviations (\t, \d, POSIX character classes [[:class:]]), arbitrary repetition {m,n}, or lookarounds (?=...), (?<=...), (?!...).

    If your question is not specific to any particular implementation, try the tag. This will generally imply a fairly minimal set of operators, corresponding to the ones specified in the common mathematical definition of regular languages.
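
    As a small illustration of how syntax from one tool can silently change meaning in another, the sketch below feeds the GNU-style word boundary operators \< and \> to Python's re module. The sample text is an assumption chosen for the demonstration.

```python
import re

text = "a dog barks"

# GNU grep and sed spell word boundaries as \< and \>. Python's re module does
# not know them: a backslash before a non-alphanumeric character simply means
# "this literal character", so the pattern looks for the text "<dog>".
gnu_style = re.search(r"\<dog\>", text)        # None
literal_hit = re.search(r"\<dog\>", "<dog>")   # matches the literal "<dog>"

# Python's own spelling of a word boundary is \b.
python_style = re.search(r"\bdog\b", text)     # matches "dog"
```

    No error is raised for the first pattern; it just fails to match, which is why checking the documentation for your specific tool matters.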

  • Understand the difference between "glob" expressions and true regular expressions.

    Glob patterns are a less potent pattern matching language, which is commonly used for file name wildcards. In glob, * means "anything", while a lone * in a regular expression is, in fact, a syntax error in some dialects (though many engines will silently ignore it, rather than issue a warning; and others still will see it as a literal *).

    For the record, the regex way to say (as much as possible of) "anything" is .* where the . metacharacter, meaning "any single character (usually except newline)", is repeated zero or more times (*). But see below about how "any character" and greediness are sometimes problematic.

    See also What are the differences between glob-style patterns and regular expressions?
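
    The glob/regex difference can be tried side by side in Python, which ships both a glob matcher (fnmatch) and a regex engine (re). The file names are made up for the example.

```python
import fnmatch
import re

names = ["notes.txt", "notes.txt.bak", "readme.md"]

# Glob: * means "any run of characters" and the pattern covers the whole name.
glob_hits = [n for n in names if fnmatch.fnmatchcase(n, "*.txt")]

# Regex: the same intent needs .* for "anything", an escaped dot, and an end anchor.
regex_hits = [n for n in names if re.search(r".*\.txt$", n)]

# Reusing the glob pattern unchanged as a regex: Python's re rejects a
# leading * because there is nothing for it to repeat.
try:
    re.compile("*.txt")
    lone_star_error = None
except re.error as exc:
    lone_star_error = str(exc)
```

    Both lists contain only notes.txt, while lone_star_error holds the compile error message. Other engines may treat the lone * differently, as described above.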

  • Specifying a single repetition is unnecessary.

    Using {1} as a single-repetition quantifier is harmless but never useful. It is basically an indication of inexperience and/or confusion.

    h{1}t{1}t{1}p{1} matches the same string as the simpler expression http (or ht{2}p for that matter) but as you can see, the redundant {1} repetitions only make it harder to read.
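
    A quick check in Python (sample strings are assumptions) suggests that the three spellings behave identically:

```python
import re

verbose = re.compile(r"h{1}t{1}t{1}p{1}")
plain = re.compile(r"http")
counted = re.compile(r"ht{2}p")

samples = ["http", "htp", "htttp", "https://example.org"]

def first_match(pattern, s):
    """Return the matched text, or None when the pattern does not match."""
    m = pattern.search(s)
    return m.group(0) if m else None

# One tuple per sample, holding the result of each of the three patterns.
results = [
    tuple(first_match(p, s) for p in (verbose, plain, counted))
    for s in samples
]
```

    Every tuple in results contains three identical entries, so the {1} quantifiers add nothing but noise.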

  • Square brackets are commonly misunderstood or misused.

    Beginners often attempt to use square brackets for everything, including grouping. While [Jun][Jul] may look like a regex for matching months, it actually matches JJ, Ju, Jl, uJ, uu, ul, nJ, nu, or nl; not Jun or Jul. [Jun|Jul] is a wasteful way to write the functionally identical [|Junl]—it matches any one character from the set comprising |, J, u, l, and n.

    For the record, [abc] defines a character class which matches a single character which can be a or b or c. The proper way to express alternation is (Jun|Jul|Aug) in many dialects (though BRE and related dialects will need backslashes; \(Jun\|Jul\|Aug\) for traditional grep et al.) or, somewhat more parsimoniously, (Ju[nl]|Aug). The round parentheses (as opposed to the square brackets of character classes) perform grouping, and the | operator indicates matching alternatives.

    See also What is the difference between square brackets and parentheses in a regex?
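
    The bracket patterns discussed above can be compared directly. This is a minimal sketch in Python; the sample list and the helper name are assumptions for the demonstration.

```python
import re

samples = ["Jun", "Jul", "Aug", "Ju", "nl", "|"]

def full_matches(pattern):
    """Return the samples that the pattern matches in their entirety."""
    return [s for s in samples if re.fullmatch(pattern, s)]

two_classes = full_matches(r"[Jun][Jul]")     # two single-character classes
pipe_in_class = full_matches(r"[Jun|Jul]")    # one character out of | J u n l
alternation = full_matches(r"(Jun|Jul|Aug)")  # grouping with alternatives
compact = full_matches(r"(Ju[nl]|Aug)")       # same months, shared prefix factored out
```

    Here two_classes is ['Ju', 'nl'], pipe_in_class is ['|'], and both parenthesized patterns return ['Jun', 'Jul', 'Aug'].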

  • Negation is tricky.

    Related to the previous, beginners will use negated character classes to attempt to restrict what can be matched. For example, to match turn but not turned, the following does not do what you want: turn[^ed]. It matches turn followed by any single character which is not e or d, so it will not match turner, for example, nor a bare turn with nothing after it.

    In fact, the traditional regex does not allow for this to be expressed easily. With ERE, you could say turn($|[^e]|e$|e[^d]) to say that turn can be followed by nothing, or a character which is not e, or by e if it is not in turn followed by d. Modern regular expression dialects have an extension called lookarounds which allow you to say turn(?!ed)—but make sure your tool supports this syntax before plunging ahead.

    Notice also how the character class negation operator is distinct from the beginning of line anchor (^[abc] matches a, b, or c at beginning of the line, whereas [^abc] matches a single character which is not a, b, or c).

    See also the next bullet point.
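
    The three patterns from this bullet can be compared on a few words. This is a sketch using Python's re module, which supports lookarounds; the word list is an assumption.

```python
import re

words = ["turn", "turned", "turner", "turns"]

def hits(pattern):
    """Return the words in which the pattern finds a match."""
    return [w for w in words if re.search(pattern, w)]

negated_class = hits(r"turn[^ed]")               # needs one more character, not e or d
ere_workaround = hits(r"turn($|[^e]|e$|e[^d])")  # spelled out without lookarounds
lookahead = hits(r"turn(?!ed)")                  # negative lookahead, where supported
```

    The negated class finds only turns, while the ERE workaround and the lookahead both accept turn, turner, and turns and reject turned.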

  • If there is a way to match, the engine will find it.

    A common beginner's mistake is to supply useless optional leading or trailing elements. The trailing s? in dogs? does nothing to prevent a match on doggone or endogenous. If you want to prevent those, you will need to elaborate—perhaps something like dogs?\> (provided your dialect supports the final word boundary operator and provided that's what you mean).

    As it is, the regular expression dogs? will match exactly the same strings as just dog (though if your application captures the match, only the former will capture a trailing s if there is one).
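
    A short Python check of the point above. Note the assumption about dialect: Python's re has no \> operator, so the sketch uses \b, its general word boundary, in the same position.

```python
import re

samples = ["hot dogs", "doggone", "endogenous"]

def matched_text(pattern):
    """Return the matched text for each sample, or None where nothing matches."""
    out = []
    for s in samples:
        m = re.search(pattern, s)
        out.append(m.group(0) if m else None)
    return out

with_optional = matched_text(r"dogs?")  # ['dogs', 'dog', 'dog']
bare = matched_text(r"dog")             # ['dog', 'dog', 'dog']
bounded = matched_text(r"dogs?\b")      # ['dogs', None, None]
```

    The optional s changes what is captured but not which strings match; only the boundary excludes doggone and endogenous.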

  • Matches are greedy.

    The regex a.*b will match the entire string "abbbbbb" because * will always match as much as possible. Say a[^ab]*b if that's what you mean, or use non-greedy matching if your dialect supports it.
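
    Greedy, non-greedy, and negated-class matching side by side, as a sketch in Python (which supports the *? quantifier); the second sample string is an assumption.

```python
import re

text = "aXb aYb"

whole = re.search(r"a.*b", "abbbbbb").group(0)   # 'abbbbbb'
greedy = re.search(r"a.*b", text).group(0)       # 'aXb aYb': runs to the last b
lazy = re.search(r"a.*?b", text).group(0)        # 'aXb': stops at the first b
negated = re.search(r"a[^ab]*b", text).group(0)  # 'aXb': cannot cross an a or b
```

    The negated character class works in dialects without non-greedy quantifiers, which makes it the more portable of the two fixes.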

  • Watch what you capture.

    If you use grouping parentheses, the parentheses define what is captured into a backreference. If you edit in parentheses for grouping purposes, make sure you are not renumbering your backreferences.

    Also, in particular, watch out for (abc){2,3} which only captures the last occurrence of abc in the matched string. If you want the repetition to be part of the capture, it needs to be inside the parentheses, like this: ((abc){2,3})
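
    Both capture pitfalls can be observed in Python. This is a minimal sketch; the sample strings and the optional "id " prefix are assumptions.

```python
import re

text = "abcabcabc"

# A repeated group keeps only its last iteration.
m1 = re.search(r"(abc){2,3}", text)
whole_match = m1.group(0)   # 'abcabcabc'
last_repeat = m1.group(1)   # 'abc'

# Wrapping the repetition captures all of it in group 1.
m2 = re.search(r"((abc){2,3})", text)
outer = m2.group(1)         # 'abcabcabc'
inner = m2.group(2)         # 'abc'

# Adding a capturing group in front shifts the numbers of the existing groups;
# a non-capturing group (?:...) leaves them alone, where the dialect has it.
renumbered = re.search(r"(id )?(\d+)-(\d+)", "id 10-20")
stable = re.search(r"(?:id )?(\d+)-(\d+)", "id 10-20")
shifted_first = renumbered.group(1)  # 'id '
stable_first = stable.group(1)       # '10'
```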

  • Don't use regex for everything!

    In particular, using (typically line-oriented) traditional regex tools to handle structured formats like HTML, XML, JSON, configuration files with block structure (Apache, nginx, many name servers, etc.) is likely to fail, or to produce incorrect results in numerous corner cases.

    Asking for HTML regexes tends to be met with negative reactions. The reasoning extends to all structured formats. If there is a parser for it, use that instead.
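
    To illustrate the parser advice, the sketch below extracts link targets from a small, made-up HTML fragment, first with a naive regex and then with html.parser from Python's standard library. The LinkCollector class name and the sample markup are assumptions for the demonstration.

```python
import re
from html.parser import HTMLParser

html_text = """<p><a class="x" href='/one'>1</a> <A HREF="/two">2</A></p>"""

# Naive regex: assumes lowercase tags, href as the first attribute, double quotes.
naive = re.findall(r'<a href="(.*?)"', html_text)

class LinkCollector(HTMLParser):
    """Collect the href value of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser hands over tag and attribute names already lowercased,
        # with the quotes around attribute values removed.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

collector = LinkCollector()
collector.feed(html_text)
```

    The naive regex finds nothing in this fragment, while collector.links holds both targets. The regex could be patched for these two cases, but each patch tends to reveal another corner case.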

Further Reading

Online sandboxes (for testing and publishing regexes online)

  • RegexPlanet (supports a variety of flavors to choose from)
  • Regexpal (ECMAScript flavor, as implemented by JavaScript)
  • Regexhero (.NET flavor)
  • RegexStorm.net (.NET flavor with link sharing capability)
  • RegExr v2.1 (in JavaScript)
  • RegExr v1.0 (ECMAScript flavor, as implemented by Adobe Flash)
  • Rubular (Ruby flavor)
  • myregexp.com (Java-applet with source code)
  • regexe.com (German; probably Java flavor)
  • regex101 (in ECMAScript (JavaScript), Python, PHP (PCRE 16-bit), Golang, Java, generates explanation of pattern)
  • regexper.com (generates graphical representation for ECMAScript flavor)
  • debuggex (generates graphical representation and shows processing of pattern – JavaScript, Python, and PCRE-compatible)
  • pyregex.com (Web validator for Python regular expressions)
  • regviz.org (Visual debugging of regular expressions for JavaScript)
  • Ultrapico Expresso (a standalone tool for testing .NET regular expressions)
  • Pythex (Quick way to test your Python regular expressions)

Regex Uses:

Regular expressions are useful in a wide variety of text processing tasks, and more generally string processing, where the data need not be textual. Common applications include data validation, data scraping (especially web scraping), data wrangling, simple parsing, the production of syntax highlighting systems, and many other tasks.

While regular expressions would be useful on Internet search engines, processing them across the entire database could consume excessive computer resources depending on the complexity and design of the regex. Although in many cases system administrators can run regex-based queries internally, most search engines do not offer regex support to the public. Notable exceptions: searchcode, or previously Google Code Search, which was shut down in 2012.
Google also offers RE2, a C++ library designed as a fast, safe, thread-friendly alternative to backtracking regular expression engines like those used in PCRE, Perl, and Python: it does not backtrack and guarantees that run time grows linearly with input size.

258926 questions
839
votes
15 answers

How can I match "anything up until this sequence of characters" in a regular expression?

Take this regular expression: /^[^abc]/. This will match any single character at the beginning of a string, except a, b, or c. If you add a * after it – /^[^abc]*/ – the regular expression will continue to add each subsequent character to the…
callum
830
votes
41 answers

How to count string occurrence in string?

How can I count the number of times a particular string occurs in another string. For example, this is what I am trying to do in Javascript: var temp = "This is a string."; alert(temp.count("is")); //should output '2'
TruMan1
830
votes
12 answers

How to negate specific word in regex?

I know that I can negate group of chars as in [^bar] but I need a regular expression where negation applies to the specific word - so in my example how do I negate an actual bar, and not "any chars in bar"?
Bostone
799
votes
9 answers

Regular expression to stop at first match

My regex pattern looks something like I am only interested in the part in quotes assigned to location. Shouldn't it be as easy as below without the greedy switch?…
publicRavi
796
votes
21 answers

Regular expression for alphanumeric and underscores

Is there a regular expression which checks if a string contains only upper and lowercase letters, numbers, and underscores?
Jim
789
votes
31 answers

Find and kill a process in one line using bash and regex

I often need to kill a process during programming. The way I do it now is: [~]$ ps aux | grep 'python csp_build.py' user 5124 1.0 0.3 214588 13852 pts/4 Sl+ 11:19 0:00 python csp_build.py user 5373 0.0 0.0 8096 960 pts/6 S+ …
Orjanp
779
votes
13 answers

What do 'lazy' and 'greedy' mean in the context of regular expressions?

What are these two terms in an understandable way?
ajsie
769
votes
27 answers

Regex to replace multiple spaces with a single space

Given a string like: "The dog has a long tail, and it is RED!" What kind of jQuery or JavaScript magic can be used to keep spaces to only one space max? Goal: "The dog has a long tail, and it is RED!"
AnApprentice
765
votes
13 answers

How do I remove all non alphanumeric characters from a string except dash?

How do I remove all non alphanumeric characters from a string except dash and space characters?
Luke101
746
votes
25 answers

How do I split a string with multiple separators in JavaScript?

How do I split a string with multiple separators in JavaScript? I'm trying to split on both commas and spaces, but AFAIK JavaScript's split() function only supports one separator.
sol
741
votes
9 answers

How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops

How can I use regular expressions in Excel and take advantage of Excel's powerful grid-like setup for data manipulation? In-cell function to return a matched pattern or replaced value in a string. Sub to loop through a column of data and extract…
Automate This
721
votes
5 answers

What is a good regular expression to match a URL?

Currently I have an input box which will detect the URL and parse the data. So right now, I am using: var urlR = /^(?:([A-Za-z]+):)?(\/{0,3})([0-9.\-A-Za-z]+) (?::(\d+))?(?:\/([^?#]*))?(?:\?([^#]*))?(?:#(.*))?$/; var url=…
bigbob
688
votes
17 answers

Regex Match all characters between two strings

Example: This is just\na simple sentence. I want to match every character between This is and sentence. Line breaks should be ignored. I can't figure out the correct syntax.
0xbadf00d
675
votes
10 answers

What is the difference between re.search and re.match?

What is the difference between the search() and match() functions in the Python re module? I've read the Python 2 documentation (Python 3 documentation), but I never seem to remember it.
Daryl Spitzer
674
votes
1 answer

Escape string for use in Javascript regex

Possible Duplicate: Is there a RegExp.escape function in Javascript? I am trying to build a javascript regex based on user input: function FindString(input) { var reg = new RegExp('' + input + ''); // [snip] perform search } But the…
too much php