I am using a regex to extract data from curved brackets (parentheses), for example a,b from (a,b). Every line in my file looks like this:

this is the range of values (a1,b1) and [b1|a1]
this is the range of values (a2,b2) and [b2|a2]
this is the range of values (a3,b3) and [b3|a3]

I'm using the following pattern to extract a1,b1, a2,b2, etc.:

@numbers = $_ =~ /\((.*),(.*)\)/;

However, how can I extract the data from square brackets [] instead? For example:

this is the range of values (a1,b1) and [b1|a1]
this is the range of values (a1,b1) and [b2|a2]

I need to extract/match only the data in square brackets and not the curved brackets.

Guy Coder
Naidu

5 Answers

27

[Update] In the meantime, I've written a blog post about the specific issue with .* I describe below: Why Using .* in Regular Expressions Is Almost Never What You Actually Want


If your identifiers a1, b1 etc. never contain commas or square brackets themselves, you should use a pattern along the lines of the following to avoid backtracking hell:

/\[([^,\]]+),([^,\]]+)\]/

Here's a working example on Regex101.

The issue with greedy quantifiers like .* is that they will very likely consume too much at first, so the regex engine has to backtrack extensively. Even with non-greedy quantifiers, the engine makes more match attempts than necessary, because it consumes only one character at a time and then tries to advance the position in the pattern.

(You could also use atomic groups to make the matching more performant.)
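
As a minimal sketch of the same idea applied to the lines in the question, the pattern below swaps the comma for the `|` separator that appears inside the square brackets there. The array name `@lines` and the inlined sample data are assumptions made for illustration; they are not part of the original answer.

```perl
use strict;
use warnings;

# Sample lines copied from the question; the '|' separator is taken from them.
my @lines = (
    'this is the range of values (a1,b1) and [b1|a1]',
    'this is the range of values (a2,b2) and [b2|a2]',
    'this is the range of values (a3,b3) and [b3|a3]',
);

my @numbers;
for my $line (@lines) {
    # [^|\]]+ matches one or more characters that are neither '|' nor ']',
    # so each capture stops at its delimiter instead of overshooting it.
    if ($line =~ /\[([^|\]]+)\|([^|\]]+)\]/) {
        push @numbers, $1, $2;
    }
}

print "@numbers\n";
```

Because the pattern starts with an escaped `\[`, the parenthesised `(a1,b1)` part of each line is never a candidate for the match.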

Marius Schulz
  • @marius I tried your pattern, but I have a problem: my line is `this is the range of values (a1,b1) [b1,a1]`, and the pattern again extracts the data between the curved brackets, whereas I need to match the data in the square brackets. – Naidu Jun 02 '14 at 10:29
  • @Naidu This shouldn't happen, actually, because I'm explicitly matching (escaped) square brackets: `\[`. Have you made sure you're embedding the pattern in your code correctly? – Marius Schulz Jun 02 '14 at 10:33
  • @Naidu I just added a link to a working example, see my updated post. – Marius Schulz Jun 02 '14 at 10:36
  • @marius Yes, I added the pattern correctly, but it still gives the same problem. I have updated the question; did you check what the line looks like? It contains both curved brackets and square brackets. – Naidu Jun 02 '14 at 12:01
  • @Naidu I did check that and my pattern works perfectly fine for me. Did you check out the link to the example I posted? – Marius Schulz Jun 02 '14 at 12:02
2
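#!/usr/bin/perl
use strict;
use warnings;

my @numbers;
# Read first, then chomp: chomp's return value is not a reliable loop test.
while (my $line = <DATA>) {
    chomp $line;
    # Capture the two comma-separated values inside square brackets.
    if ($line =~ m|\[(.*),(.*)\]|) {
        push @numbers, ($1, $2);
    }
}
print "@numbers\n";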
__DATA__
this is the range of values [a1,b1]
this is the range of values [a2,b2]
this is the range of values [a3,b3]

Demo

Chankey Pathak
  • This is not a great solution, I'm afraid, as it results in backtracking hell by using `.*`. Please refer to my answer and comments above. – Marius Schulz Jun 02 '14 at 12:01
1

You can match it using the non-greedy quantifier *?:

my @numbers = $_ =~ /\[(.*?),(.*?)\]/g;

or

my @numbers = /\[(.*?),(.*?)\]/g;

for short.

UPDATE

my @numbers = /\[(.*?)\|(.*?)\]/g;
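
A short sketch of how the updated pattern behaves in list context. The variable `$text` and the inlined lines are assumptions for illustration; in a real script the data would come from the file being read.

```perl
use strict;
use warnings;

# Hypothetical input: the question's three lines joined into one string.
my $text = join "\n",
    'this is the range of values (a1,b1) and [b1|a1]',
    'this is the range of values (a2,b2) and [b2|a2]',
    'this is the range of values (a3,b3) and [b3|a3]';

# In list context, /g returns the captures of every match as one flat list.
my @numbers = $text =~ /\[(.*?)\|(.*?)\]/g;

print "@numbers\n";
```

The escaped `\[` and `\|` anchor the match to the square-bracket part, so the values in parentheses are skipped.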
mpapec
  • He's talking about *square* brackets. – Marius Schulz Jun 02 '14 at 09:16
  • Hi mpapec, I didn't mention earlier in my question that the lines also contain curved brackets. When I use your pattern, it doesn't recognise the square brackets; it matches the curved brackets and extracts the data from them instead. Can you please help me with that? – Naidu Jun 02 '14 at 11:10
  • The input will be `this is the range of values (a1,b1) and [b1|a1]`, `this is the range of values (a2,b2) and [b2|a2]`, `this is the range of values (a3,b3) and [b3|a3]`, and I need the output to be `b1 a1 b2 a2 b3 a3`; that is, only the data from the square brackets should be matched and extracted. – Naidu Jun 02 '14 at 12:03
  • @mpapec I tried the update earlier, but it's not helping; I don't know why. – Naidu Jun 03 '14 at 06:32
0

Use the code below:

$_ =~ /\[(.*?)\|(.*?)\]/g;

If the pattern matches successfully, the extracted values are stored in $1 and $2.
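
Since $1 and $2 are only meaningful after a successful match, it is worth testing the match result before reading them. A minimal sketch; the second input line and the array `@pairs` are hypothetical and only there to show the non-matching case.

```perl
use strict;
use warnings;

# First line is from the question; the second is a made-up line without
# square brackets, included to exercise the failure branch.
my @lines = (
    'this is the range of values (a1,b1) and [b1|a1]',
    'this line has only (a2,b2)',
);

my @pairs;
for my $line (@lines) {
    # Only read $1 and $2 inside the branch where the match succeeded.
    if ($line =~ /\[(.*?)\|(.*?)\]/) {
        push @pairs, "$1 $2";
    }
    else {
        push @pairs, 'no match';
    }
}

print "$_\n" for @pairs;
```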

shreyaskar
0

I know I'm a little late here, but none of the answers correctly answers the OP's question, and the one that comes closest matches the entire thing, square brackets [] included. Clearly, the OP wants to match only what is inside the brackets.

  • To match everything inside square brackets, including the brackets themselves. Example:

    \[[^\[\]]*]

  • To match everything inside square brackets, excluding the brackets themselves, use a positive lookahead and lookbehind. Example:

    (?<=\[)[^\[\]]*(?=\])
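
A brief Perl sketch comparing the two patterns on a line from the question. The variable names are assumptions, and the final `split` on `|` is an extra step added here to get the individual values; it is not part of the patterns above.

```perl
use strict;
use warnings;

my $line = 'this is the range of values (a1,b1) and [b1|a1]';

# First pattern: the whole bracketed group, brackets included.
my ($with_brackets) = $line =~ /(\[[^\[\]]*\])/;

# Second pattern: lookbehind and lookahead assert the brackets
# without making them part of the match.
my ($inside) = $line =~ /((?<=\[)[^\[\]]*(?=\]))/;

# Splitting on the '|' separator used in the question yields the two values.
my @values = split /\|/, $inside;

print "$with_brackets\n";
print "$inside\n";
print "@values\n";
```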

mihirjoshi