
I am trying to write a regular expression that extracts every WARN entry from the input text, together with the message that follows it. A WARN message may or may not span multiple lines, as shown below.

[C] L1250 WARN  k2 bw34 Flex - Sockets:<16>, ThreadsPerCore:<1>
[C] L1250 WARN  For abcd (analytical and transactional workloads). For 12s Systems and above, should be
                disabled.
[C] L1250 INFO  For abcd (analytical workloads), Hyperthreading should be enabled , 8s, 12s, 14d, 34t
                d above.
[C] L1250 WARN  Intel's Hyperthreading on 18+ Socket system disabled. Should be disabled urgently
                fix it!
[C] L1300 OK    CPU governors set as recommended
[C] L1250 WARN  Intel's Hyperthreading on 8+ Socket system disabled.

Initially, I started with the regex (WARN).*(\b|\B). It matches only up to the end of the line, so it does not capture the continuation lines of a multiline WARN description.

Then I tried WARN.+([\S\s]*?)+(?=\[C\]), but this does not capture the last WARN line, because no further [C] marker follows it.


2 Answers


You can get your matches without using [\s\S]* or the single line option by matching the WARN line and then every following line that does not start with [C]:

\bWARN\h+.*(?:\R(?!\[C]).*)*

Explanation

  • \bWARN Match WARN preceded by a word boundary, so that it is not the tail of a larger word
  • \h+.* Match 1+ horizontal whitespace chars, then the rest of the line
  • (?: Non capture group
    • \R(?!\[C]).* Match a Unicode newline sequence, assert that the next line does not start with [C], then match that whole line
  • )* Close group and repeat 0+ times


For example:

String regex = "\\bWARN\\h+.*(?:\\R(?!\\[C]).*)*";
String string = "[C] L1250 WARN  k2 bw34 Flex - Sockets:<16>, ThreadsPerCore:<1>\n"
     + "[C] L1250 WARN  For abcd (analytical and transactional workloads). For 12s Systems and above, should be\n"
     + "                disabled.\n"
     + "[C] L1250 INFO  For abcd (analytical workloads), Hyperthreading should be enabled , 8s, 12s, 14d, 34t\n"
     + "                d above.\n"
     + "[C] L1250 WARN  Intel's Hyperthreading on 18+ Socket system disabled. Should be disabled urgently\n"
     + "                fix it!\n"
     + "[C] L1300 OK    CPU governors set as recommended\n"
     + "[C] L1250 WARN  Intel's Hyperthreading on 8+ Socket system disabled.";

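// Requires java.util.regex.Pattern and java.util.regex.Matcher.
// Pattern.MULTILINE is harmless here but not needed: the pattern has no ^ or $ anchors.
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(string);

// Each call to find() advances to the next WARN entry, continuation lines included.
while (matcher.find()) {
    System.out.println(matcher.group(0));
}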

Output

WARN  k2 bw34 Flex - Sockets:<16>, ThreadsPerCore:<1>
WARN  For abcd (analytical and transactional workloads). For 12s Systems and above, should be
                disabled.
WARN  Intel's Hyperthreading on 18+ Socket system disabled. Should be disabled urgently
                fix it!
WARN  Intel's Hyperthreading on 8+ Socket system disabled.
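If the entries are needed as single-line strings rather than printed as-is, the same pattern can be wrapped in a small helper. This is only a sketch: the class name WarnCollector and the method extractWarnings are made up for illustration, and folding each line break plus its indentation into one space is an assumption about the desired output, not something the question asks for.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WarnCollector {
    // Same pattern as above: a WARN line plus every following line not starting with [C].
    private static final Pattern WARN_BLOCK =
            Pattern.compile("\\bWARN\\h+.*(?:\\R(?!\\[C]).*)*");

    // Hypothetical helper: returns each WARN entry folded onto one line.
    static List<String> extractWarnings(String log) {
        List<String> result = new ArrayList<>();
        Matcher m = WARN_BLOCK.matcher(log);
        while (m.find()) {
            // Replace each line break and the indentation after it with a single space.
            result.add(m.group().replaceAll("\\R\\h*", " "));
        }
        return result;
    }

    public static void main(String[] args) {
        String log = "[C] L1250 WARN  Should be disabled urgently\n"
                + "                fix it!\n"
                + "[C] L1300 OK    CPU governors set as recommended\n"
                + "[C] L1250 WARN  Intel's Hyperthreading on 8+ Socket system disabled.";
        for (String warning : extractWarnings(log)) {
            System.out.println(warning);
        }
    }
}
```

Dropping the replaceAll call keeps the original line breaks and indentation in each entry.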

If [C] is not a reliable boundary, another option is to check that the next line does not contain one of WARN, INFO or OK surrounded by horizontal whitespace:

 \bWARN\h+.*(?:\R(?!.*\h(?:WARN|INFO|OK)\h).*)*


In Java

String regex = "\\bWARN\\h+.*(?:\\R(?!.*\\h(?:WARN|INFO|OK)\\h).*)*";
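A minimal sketch of how this second pattern might be used, assuming the log is already available as one string. The class name WarnByLevel and the method extractWarnings are hypothetical. Note one limitation that follows from the pattern itself: a continuation line that happens to contain WARN, INFO or OK as a whitespace-delimited word would end the entry early.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WarnByLevel {
    // A continuation line is any line without a whitespace-delimited WARN, INFO or OK.
    private static final Pattern WARN_BLOCK =
            Pattern.compile("\\bWARN\\h+.*(?:\\R(?!.*\\h(?:WARN|INFO|OK)\\h).*)*");

    // Hypothetical helper: returns every WARN entry with its continuation lines.
    static List<String> extractWarnings(String log) {
        List<String> result = new ArrayList<>();
        Matcher m = WARN_BLOCK.matcher(log);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String log = "[C] L1250 WARN  For 12s Systems and above, should be\n"
                + "                disabled.\n"
                + "[C] L1300 OK    CPU governors set as recommended\n"
                + "[C] L1250 WARN  Intel's Hyperthreading on 8+ Socket system disabled.";
        for (String warning : extractWarnings(log)) {
            System.out.println(warning);
        }
    }
}
```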
The fourth bird

Try this regex with the global and single-line options (in Java, Pattern.DOTALL and a find() loop): WARN.*?(?=\[C\]|$)

This will find everything starting with WARN until the next '[C]' or the end of the input string.

Demo: https://regex101.com/r/KZXWwL/1
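In Java this could look roughly like the sketch below. Pattern.DOTALL plays the role of the single-line option, and calling find() in a loop replaces the global flag. The class name WarnDotAll and the method extractWarnings are made up for illustration. Because the lazy match runs right up to the next [C], each entry except the last ends with a line break, so the sketch trims it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WarnDotAll {
    // DOTALL lets . cross line breaks; the lazy .*? stops at the next "[C]" or at end of input.
    private static final Pattern WARN_BLOCK =
            Pattern.compile("WARN.*?(?=\\[C\\]|$)", Pattern.DOTALL);

    // Hypothetical helper: returns every WARN entry, trimmed of surrounding whitespace.
    static List<String> extractWarnings(String log) {
        List<String> result = new ArrayList<>();
        Matcher m = WARN_BLOCK.matcher(log);
        while (m.find()) {
            // The raw match includes the line break before the next [C]; trim removes it.
            result.add(m.group().trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String log = "[C] L1250 WARN  For 12s Systems and above, should be\n"
                + "                disabled.\n"
                + "[C] L1300 OK    CPU governors set as recommended\n"
                + "[C] L1250 WARN  Intel's Hyperthreading on 8+ Socket system disabled.";
        for (String warning : extractWarnings(log)) {
            System.out.println(warning);
        }
    }
}
```

This approach assumes that the text [C] never appears inside a message; if it did, the entry would be cut off at that point.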

Ogod
  • An extra blank was also appended to each match when I used the suggested regex with Pattern.DOTALL, so I trimmed it afterwards. It helped, thanks :) – Sushant Shukla Apr 25 '20 at 04:13