Questions tagged [word-boundary]

A word boundary (\b) is a zero-width regular expression assertion that matches at a position with a word character on one side and a non-word character, or the start or end of the string, on the other (\w\W or \W\w). The non-word boundary (\B) matches at every other position.

A word boundary is the regular expression construct \b, which asserts that the current match position is at a word boundary. It is zero-width: it consumes no characters.

A position is a word boundary when exactly one of the two characters on either side of it is a word character (\w\W or \W\w). In most engines the start and end of the string count as non-word characters for this test.

The non-word boundary \B, on the other hand, matches a position whose neighbors are both word characters or both non-word characters (\w\w or \W\W).
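As a rough illustration of the definitions above, here is a minimal sketch using Python's standard re module. The sample strings are made up for this example, and other regex engines may treat \b differently (for instance with non-ASCII letters, or by using another syntax for word boundaries altogether).

```python
import re

# Positions in "ab -" run from 0 (before "a") to 4 (after "-").
sample = "ab -"

# \b matches where exactly one neighbor is a word character.
boundary_positions = [m.start() for m in re.finditer(r"\b", sample)]

# \B matches where both neighbors are word characters or both are not.
non_boundary_positions = [m.start() for m in re.finditer(r"\B", sample)]

text = "cat concat"

# \bcat\b only matches "cat" as a whole word (starting at index 0).
whole_word_starts = [m.start() for m in re.finditer(r"\bcat\b", text)]

# \Bcat only matches "cat" when a word character precedes it (inside "concat").
inside_word_starts = [m.start() for m in re.finditer(r"\Bcat", text)]

# "-" is not a word character, so there is no boundary between the start of
# the string and "-"; the match can only begin at the digits.
signed_number = re.findall(r"\b-?\d+\b", "-12")

print(boundary_positions, non_boundary_positions)
print(whole_word_starts, inside_word_starts)
print(signed_number)
```

The last pattern shows why a leading minus sign is not picked up by \b-?\d+\b: the only boundaries in "-12" are just before "1" and just after "2".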

175 questions
215
votes
13 answers

What is a word boundary in regex?

I'm trying to use regexes to match space-separated numbers. I can't find a precise definition of \b ("word boundary"). I had assumed that -12 would be an "integer word" (matched by \b\-?\d+\b) but it appears that this does not work. I'd be…
peter.murray.rust
  • 37,407
  • 44
  • 153
  • 217
143
votes
7 answers

Regex match entire words only

I have a regex expression that I'm using to find all the words in a given block of content, case insensitive, that are contained in a glossary stored in a database. Here's my pattern: /($word)/i The problem is, if I use /(Foo)/i then words like…
Aaron
  • 1,617
  • 4
  • 13
  • 7
80
votes
3 answers

PostgreSQL Regex Word Boundaries?

Does PostgreSQL support \b? I'm trying \bAB\b but it doesn't match anything, whereas (\W|^)AB(\W|$) does. These 2 expressions are essentially the same, aren't they?
mpen
  • 272,448
  • 266
  • 850
  • 1,236
75
votes
2 answers

How to use grep()/gsub() to find exact match

string = c("apple", "apples", "applez") grep("apple", string) This would give me the index for all three elements in string. But I want an exact match on the word "apple" (i.e I just want grep() to return index 1).
Adrian
  • 9,229
  • 24
  • 74
  • 132
40
votes
2 answers

What are non-word boundary in regex (\B), compared to word-boundary?

What are non-word boundary in regex (\B), compared to word-boundary?
DarkLightA
  • 14,980
  • 18
  • 49
  • 57
39
votes
3 answers

Oracle REGEXP_LIKE and word boundaries

I am having a problem with matching word boundaries with REGEXP_LIKE. The following query returns a single row, as expected. select 1 from dual where regexp_like('DOES TEST WORK HERE','TEST'); But I want to match on word boundaries as well. So,…
Greg Reynolds
  • 9,736
  • 13
  • 49
  • 60
35
votes
7 answers

How to match the first word after an expression with regex?

For example, in this text: Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc eu tellus vel nunc pretium lacinia. Proin sed lorem. Cras sed ipsum. Nunc a libero quis risus sollicitudin imperdiet. I want to match the word after 'ipsum'.
Matthew Taylor
  • 3,911
  • 4
  • 29
  • 33
23
votes
5 answers

utf-8 word boundary regex in javascript

In JavaScript: "ab abc cab ab ab".replace(/\bab\b/g, "AB"); correctly gives me: "AB abc cab AB AB" When I use utf-8 characters though: "αβ αβγ γαβ αβ αβ".replace(/\bαβ\b/g, "AB"); the word boundary operator doesn't seem to work: "αβ αβγ γαβ αβ…
cherouvim
  • 31,725
  • 15
  • 104
  • 153
22
votes
3 answers

Javascript - regex - word boundary (\b) issue

I have a difficulty using \b and greek characters in a regex. At this example [a-zA-ZΆΈ-ώἀ-ῼ]* succeeds to mark all the words I want (both greek and english). Now consider that I want to find words with 2 letters. For the English language I use…
tgogos
  • 23,218
  • 20
  • 96
  • 128
22
votes
4 answers

MySQL REGEXP word boundaries [[:<:]] [[:>:]] and double quotes

I'm trying to match some whole-word-expressions with the MySQL REGEXP function. There is a problem, when there are double quotes involved. The MySQL documentation says: "To use a literal instance of a special character in a regular expression,…
henk
  • 546
  • 1
  • 6
  • 16
15
votes
3 answers

A Viable Solution for Word Splitting Khmer?

I am working on a solution to split long lines of Khmer (the Cambodian language) into individual words (in UTF-8). Khmer does not use spaces between words. There are a few solutions out there, but they are far from adequate (here and here), and…
14
votes
4 answers

php regex word boundary matching in utf-8

I have the following php code in a utf-8 php file: var_dump(setlocale(LC_CTYPE, 'de_DE.utf8', 'German_Germany.utf-8', 'de_DE',…
tomsv
  • 7,207
  • 6
  • 55
  • 88
11
votes
4 answers

How can I find repeated words in a file using grep/egrep?

I need to find repeated words in a file using egrep (or grep -e) in unix (bash) I tried: egrep "(\<[a-zA-Z]+\>) \1" file.txt and egrep "(\b[a-zA-Z]+\b) \1" file.txt but for some reason these consider things to be repeats that aren't! for example,…
Mouse
  • 111
  • 1
  • 1
  • 5
10
votes
1 answer

Regular expression to match boundary between different Unicode scripts

Regular expression engines have a concept of "zero width" matches, some of which are useful for finding edges of words: \b - present in most engines to match any boundary between word and non-word characters \< and \> - present in Vim to match only…
hippietrail
  • 15,848
  • 18
  • 99
  • 158
9
votes
1 answer

Dollar Sign "\$" in Regular Expressions with word boundaries "\b" (PHP / JavaScript)

I am aware that the issue involving the dollar sign "$" in regex (here: either in PHP and JavaScript) has been discussed numerous times before: Yes, I know that I need to add a backslash "\" in front of it (depending on the string processing even…
GerZah
  • 91
  • 1
  • 3