Get the second match by regex

Question

I want to get the second occurrence of a matching pattern (the text inside square brackets) using a regex. Here is the text:

[2019-07-29 09:48:11,928] @hr.com [2] [AM] WARN

I want to extract 2 from this text. I tried using:

(?<Ten ID>((^)*((?<=\[).+?(?=\]))))

But it matches 2019-07-29 09:48:11,928, 2, and AM. How can I get only 2?

Here is a [bunch of solutions](https://stackoverflow.com/a/57671253/3832970) with an improved @Thefourthbird's suggestion, too. — Wiktor Stribiżew, Aug 27 '19 at 09:19
@Wiktor I am using this in fluentd to separate logs. I can only use just the regex expression. Can you give me a solution? — Charith_32, Aug 27 '19 at 09:45
@Wiktor this is what I needed, Can you please explain this a bit! — Charith_32, Aug 27 '19 at 10:04

Wiktor Stribiżew · Accepted Answer · 2019-08-27T10:05:31.427

To get a substring between [ and ] (square brackets) excluding the brackets you may use /\[([^\]\[]*)\]/ regex:

\[ - a [ char
([^\]\[]*) - Capturing group 1: any 0+ chars other than [ and ]
\] - a ] char.
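
As a minimal sketch of that pattern in Ruby, applied to the sample line from the question (the variable names here are illustrative only, not part of the answer):

```ruby
# Sample log line from the question.
line = '[2019-07-29 09:48:11,928] @hr.com [2] [AM] WARN'

# Group 1 holds the text between one pair of square brackets.
bracket_re = /\[([^\]\[]*)\]/

first_inner = line[bracket_re, 1]            # contents of the first [...] only
all_inner   = line.scan(bracket_re).flatten  # contents of every [...], in order

p first_inner
p all_inner
```

On its own the pattern always finds the first bracket pair, which is why the second match needs one of the approaches that follow.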

To get the second match, you may use

str = '[2019-07-29 09:48:11,928] @hr.com [2] [AM] WARN'
p str[/\[[^\]\[]*\].*?\[([^\]\[]*)\]/m, 1]

Here,

\[[^\]\[]*\] - finds the first [...] substring
.*? - matches any 0+ chars as few as possible
\[([^\]\[]*)\] - finds the second [...] substring and captures the inner contents, returned with the help of the second argument, 1.

To get the Nth match, you may also consider using

str = '[2019-07-29 09:48:11,928] @hr.com [2] [AM] WARN'
result = ''
cnt = 0
str.scan(/\[([^\]\[]*)\]/) { |match| result = match[0]; cnt +=1; break if cnt >= 2}
puts result #=> 2

Note that if there are fewer matches than you expect, this solution will return the last matched substring.
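
If that fallback is not wanted, the same scan loop can be wrapped so that it returns nil instead. This is only a sketch: `nth_bracket` is a hypothetical helper name introduced here for illustration.

```ruby
# Hypothetical helper: returns the contents of the n-th [...] pair,
# or nil when the string has fewer than n such pairs.
def nth_bracket(str, n)
  cnt = 0
  str.scan(/\[([^\]\[]*)\]/) do |(inner)|
    cnt += 1
    return inner if cnt == n   # stop scanning as soon as the n-th match is seen
  end
  nil                          # fewer than n matches: no silent fallback
end

line = '[2019-07-29 09:48:11,928] @hr.com [2] [AM] WARN'
p nth_bracket(line, 2)
p nth_bracket('[only one]', 2)
```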

Another solution that is not generic and only suits this concrete case: extract the first occurrence of an int number inside square brackets:

s = "[2019-07-29 09:48:11,928] @hr.com [2] [AM] WARN"
puts s[/\[(\d+)\]/, 1] # => 2

To use the regex in Fluentd, use

\[(?<val>\d+)\]

and the value you need is in the val named group. \[ matches a [, (?<val>\d+) is a named capturing group matching 1+ digits, and \] matches a ].
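
The named group can be checked in plain Ruby before putting the pattern into a Fluentd config. A small sketch, assuming the sample line from the question (the variable names are illustrative):

```ruby
line = '[2019-07-29 09:48:11,928] @hr.com [2] [AM] WARN'

# The timestamp bracket is skipped because it contains non-digit characters,
# so the first successful match is the bracket holding only digits.
m = line.match(/\[(?<val>\d+)\]/)

val = m && m[:val]   # nil if the line has no digits-only bracket
p val
```

Note that this relies on the wanted value being the only digits-only bracket, which holds for this log format but not in general.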

Fluentular shows:

Copy and paste to fluent.conf or td-agent.conf

     
    <source>
      type tail
      path /var/log/foo/bar.log
      pos_file /var/log/td-agent/foo-bar.log.pos
      tag foo.bar
      format /\[(?<val>\d+)\]/
    </source>

Records

 Key    Value
 val    2

score 0 · Answer 2 · answered Aug 27 '19 at 08:51

From "extract string between square brackets at second occurrence":

/\[[^\]]*\][^[]*\[([^\]]*)\]/

You can use this. The value you need is in capture group 1, the pattern's only capture group, which holds the contents of the second bracket pair.

Mark

Is there a way that I can get **2** as the full match? I want to use this in fluentd, which only takes the full match! – Charith_32 Aug 27 '19 at 09:54
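
One way to make the value the whole match in Ruby is `\K`, which drops everything matched so far from the reported match, combined with a lookahead for the closing bracket. This is a sketch under the assumption that the pattern runs in Ruby's regex engine; whether a given Fluentd setup accepts a pattern without named groups is not confirmed here.

```ruby
line = '[2019-07-29 09:48:11,928] @hr.com [2] [AM] WARN'

# Skip the first [...] pair and the text up to the next "[",
# reset the match start with \K, then match up to (not including) "]".
full_match_re = /\[[^\]\[]*\][^\[]*\[\K[^\]\[]*(?=\])/

whole = line[full_match_re]   # no group index: this is the full match
p whole
```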

score 0 · Answer 3 · answered Aug 27 '19 at 08:52

If you know that it's always the second match, you can use scan and take the second result:

"[2019-07-29 09:48:11,928] @hr.com [2] [AM] WARN".scan(/\[([^\]]*)\]/)[1].first
# => "2"
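
When the line might contain fewer than two bracket pairs, `[1]` is nil and `.first` raises NoMethodError. A hedged variant using `Array#dig` returns nil instead (the variable names are illustrative):

```ruby
line = '[2019-07-29 09:48:11,928] @hr.com [2] [AM] WARN'
re   = /\[([^\]]*)\]/

# scan returns one single-element array per match, e.g. [["..."], ["2"], ["AM"]];
# dig(1, 0) walks to the second match's first group, or yields nil if absent.
second  = line.scan(re).dig(1, 0)
missing = '[only one]'.scan(re).dig(1, 0)

p second
p missing
```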

mrzasa

score 0 · Answer 4 · answered Oct 20 '19 at 07:24

def nth_match(str, n)
  str[/(?:[^\[]*\[){#{n}}([^\]]*)\]/, 1]
end

str = "Little [Miss] Muffet [sat] on a [tuffet] eating [pie]."

nth_match(str, 1)  #=> "Miss" 
nth_match(str, 2)  #=> "sat" 
nth_match(str, 3)  #=> "tuffet" 
nth_match(str, 4)  #=> "pie" 
nth_match(str, 5)  #=> nil

We could write the regular expression in free-spacing mode to document it.

/
(?:       # begin a non-capture group
  [^\[]*  # match zero or more characters other than '['
  \[      # match '['
){#{n}}   # end non-capture group and execute it n times
(         # start capture group 1,
  [^\]]*  # match zero or more characters other than ']' 
)         # end capture group 1
\]        # match ']'
/x        # free-spacing regex definition mode
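
For completeness, the free-spacing form can be dropped into a method as is. In this sketch `nth_match_x` is a hypothetical name chosen to avoid clashing with `nth_match` above; the behavior is intended to be the same.

```ruby
# Hypothetical variant of nth_match using the documented free-spacing regex.
def nth_match_x(str, n)
  re = /
    (?:        # begin a non-capture group
      [^\[]*   # zero or more characters other than an opening bracket
      \[       # an opening bracket
    ){#{n}}    # end the group and repeat it n times
    ([^\]]*)   # capture group 1: characters up to the closing bracket
    \]         # a closing bracket
  /x
  str[re, 1]
end

str = "Little [Miss] Muffet [sat] on a [tuffet] eating [pie]."
p nth_match_x(str, 2)
p nth_match_x(str, 5)
```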
