Select all from the beginning of the line until a left square bracket

Question

I need a regex that will select everything from the beginning of the line up to the first left square bracket. In the example below it would match Lorem, consectetur-adipisicing and labore et:

Lorem [ipsum] dolor sit amet,

consectetur-adipisicing [elit] sed do 

eiusmod tempor incididunt ut

labore et [dolore] magna aliqua.

Thank you for the help.

Bernhard Barker · Answer 1 · 2013-04-24T10:11:36.910

2

Using look-behind and look-ahead:

(?<=^|\n)(.*?)(?=\s?\[)

Explanation:

(?<=...) is positive look-behind, checking that the previous characters match.

^|\n is intended to mean start of line: either start of text (^) or the position right after a new-line (\n).

. is any character (except a new-line, unless DOT ALL is on).

.*? is zero or more of any character. *? instead of * is non-greedy matching, so it will match up to the first rather than the last bracket.

(?=...) is positive look-ahead, checking that the next characters match.

\s is white-space, the ? makes it optional (this is to prevent the space before the [ from also matching).

\[ is an escaped [ (it needs to be escaped since [ has a different meaning)
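For anyone who wants to try this outside a regex tester, here is a minimal Python sketch. One assumption to flag: Python's `re` module rejects `(?<=^|\n)` because its look-behind must be fixed-width, so the sketch swaps in `^` with multi-line mode for the start-of-line check. The non-greedy `.*?` and the `(?=\s?\[)` look-ahead are unchanged, and the variable names are only for illustration.

```python
import re

# Sample text from the question, one entry per line.
text = (
    "Lorem [ipsum] dolor sit amet,\n"
    "consectetur-adipisicing [elit] sed do \n"
    "eiusmod tempor incididunt ut\n"
    "labore et [dolore] magna aliqua."
)

# (?m) makes ^ match at every line start; this stands in for (?<=^|\n),
# which Python's re does not accept (look-behind must be fixed-width).
pattern = re.compile(r"(?m)^(.*?)(?=\s?\[)")

# Group 1 holds the text before the optional space and the first "[".
matches = [m.group(1) for m in pattern.finditer(text)]
print(matches)  # ['Lorem', 'consectetur-adipisicing', 'labore et']
```

The line without a `[` produces no match because the look-ahead can never succeed on it while `.` is not allowed to cross a line break.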

edited Apr 24 '13 at 10:11

answered Apr 24 '13 at 09:49

Bernhard Barker


Great answer @Dukeling, very detailed. I'd add \s before \[ just to skip the white space. Btw, this matches **eiusmod tempor incididunt ut**, which is the text from the line which doesn't have the **[** char in it. – Rolando Isidoro Apr 24 '13 at 10:03

@RolandoIsidoro Added the `\s`. Are you sure it matches the line without the `[`? This should be prevented with the look-ahead (`(?=\s?\[)`). – Bernhard Barker Apr 24 '13 at 10:14

@RolandoIsidoro Untick "DOT ALL", then it works as required. Another option is to change the `.` to exclude new-lines (probably something like `[^\n]`). – Bernhard Barker Apr 24 '13 at 11:02
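The DOT ALL effect discussed in these comments can be reproduced with a small Python sketch. It is written under the assumption that `^` with multi-line mode replaces the look-behind (Python's `re` needs fixed-width look-behind), and the variable names are hypothetical.

```python
import re

text = (
    "consectetur-adipisicing [elit] sed do \n"
    "eiusmod tempor incididunt ut\n"
    "labore et [dolore] magna aliqua."
)

# With DOTALL (the s flag), "." also matches "\n", so the bracket-less
# line is glued to the line that follows it.
with_dotall = [m.group(1) for m in re.finditer(r"(?ms)^(.*?)(?=\s?\[)", text)]

# Replacing "." with [^\n] keeps each match on one line even with DOTALL on.
no_newlines = [m.group(1) for m in re.finditer(r"(?ms)^([^\n]*?)(?=\s?\[)", text)]

print(with_dotall)  # ['consectetur-adipisicing', 'eiusmod tempor incididunt ut\nlabore et']
print(no_newlines)  # ['consectetur-adipisicing', 'labore et']
```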

Jan Goyvaerts · Answer 2 · 2013-07-17T07:29:33.900

2

Why do people use the dot and complicated lookaround constructs when a simple anchor and negated character class will do the trick?

(?m)^[^\[\r\n]+(?=\[)

If your regex flavor supports it, you can further optimize this regex by making the quantifier possessive:

(?m)^[^\[\r\n]++(?=\[)

If your regex flavor doesn't support lookahead, include the [ in the match and use a capturing group to get the part that you want:

(?m)^([^\[\r\n]+)\[

If your regex flavor doesn't support mode modifiers like (?m), simply turn on the option to make ^ match at line breaks ("multi-line mode") outside the regex.
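As a rough illustration, here is how the look-ahead variant, the capturing-group variant and the external multi-line option might look in Python. The choice of Python and the variable names are assumptions made for the example. The possessive `++` form is left out because, as far as I know, Python's `re` only accepts it from version 3.11 onward.

```python
import re

text = (
    "Lorem [ipsum] dolor sit amet,\n"
    "consectetur-adipisicing [elit] sed do \n"
    "eiusmod tempor incididunt ut\n"
    "labore et [dolore] magna aliqua."
)

# Look-ahead variant: the "[" is required but is not part of the match.
lookahead = re.findall(r"(?m)^[^\[\r\n]+(?=\[)", text)

# Capturing-group variant: the "[" is consumed by the match and
# group 1 holds the wanted text (findall returns the group).
captured = re.findall(r"(?m)^([^\[\r\n]+)\[", text)

# Same pattern as the first, with multi-line mode set outside the regex.
flagged = re.findall(r"^[^\[\r\n]+(?=\[)", text, flags=re.MULTILINE)

print(lookahead)  # ['Lorem ', 'consectetur-adipisicing ', 'labore et ']
```

Note that each result keeps the space before the `[`, since the negated character class only excludes `[` and line-break characters.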

edited Jul 17 '13 at 07:29

answered Jul 16 '13 at 13:59

Jan Goyvaerts


You are correct that no lookbehind assertion is needed at the start, but the one at the end is needed to ensure that there is a `[` following the pattern (your regex wrongly matches "eiusmod tempor incididunt ut"). Also, you need the opening square bracket in your expression, not the closing one. I think you meant `(?m)^[^\[\r\n]+(?=\[)`. [Regexr](http://regexr.com?35jtr) – stema Jul 17 '13 at 06:44
If lines without any `[` must not be matched, and the `[` must not be included in the match, then you need a lookahead. – Jan Goyvaerts Jul 17 '13 at 07:30

score 1 · Answer 3 · answered Apr 24 '13 at 09:44

1

Try "[^\[]*". Here [] denotes a character set, ^\[ means any character except [, and * repeats it any number of times. Combined, this should be what you need.
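A quick way to see what this pattern does is to apply it to one line at a time. The sketch below uses Python's `re.match` for that (an assumption, since the answer names no language), with hypothetical variable names.

```python
import re

pattern = re.compile(r"[^\[]*")

# On a line with a bracket, the match stops right before the "[".
with_bracket = pattern.match("Lorem [ipsum] dolor sit amet,").group()

# On a line without a bracket, nothing stops the match, so the
# whole line is returned.
without_bracket = pattern.match("eiusmod tempor incididunt ut").group()

print(repr(with_bracket))     # 'Lorem '
print(repr(without_bracket))  # 'eiusmod tempor incididunt ut'
```

So on its own the pattern does not require a `[` to be present, and the space before the bracket is kept.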

answered Apr 24 '13 at 09:44

abasu


score 0 · Answer 4 · answered Apr 24 '13 at 09:41

0

I would say the most simple version would be:

(.*?)\[.*
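A small Python sketch (language and names are assumptions for illustration) shows the difference between the overall match and the first group with this pattern:

```python
import re

line = "Lorem [ipsum] dolor sit amet,"
m = re.match(r"(.*?)\[.*", line)

whole = m.group(0)  # the overall match: the entire line
first = m.group(1)  # group 1: only the text before the first "["

print(repr(whole))  # 'Lorem [ipsum] dolor sit amet,'
print(repr(first))  # 'Lorem '
```

The wanted text is therefore only available through the capturing group, not as the match itself.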

answered Apr 24 '13 at 09:41

Salgar


Thanks Salgar but it matches the whole line, and not only the words preceding the bracket. When I remove .* and apply (.*?)\[ it does what I asked for, except it includes the bracket in the match, and it shouldn't – TotoKalvera Apr 24 '13 at 09:46
Ah sorry, I assumed you wanted a group match of the initial part. Go with what abasu said in that case. – Salgar Apr 24 '13 at 09:48
Dukeling's expression works like a charm, problem solved, thank you Salgar. – TotoKalvera Apr 24 '13 at 10:04

Suvasish Sarker · Answer 5 · 2013-04-24T10:34:36.083

0

This might be helpful:

^(.*)\[

Simple Example:

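use strict;
use warnings;

my $str = "consectetur-adipisicing [elit] sed do";
my $tmp = '';
# Capture everything from the start of the string up to the last "[" (.* is greedy).
if ($str =~ m/^(.*)\[/) {
    $tmp = $1;
}
print "String upto [: $tmp\n";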

output is:

String upto [: consectetur-adipisicing

edited Apr 24 '13 at 10:34

answered Apr 24 '13 at 09:52

Suvasish Sarker


Thanks for the help! It does the thing, but the flaw is that it captures the bracket in the match. – TotoKalvera Apr 24 '13 at 10:05
@TotoKalvera, it will match up to `[`, but you will get the required value in `$1`. I've added a sample example to my answer. – Suvasish Sarker Apr 24 '13 at 10:31
