
I've been trying for hours to write a regex that selects everything between two underscores. Example:

_italic_

The only condition is that it must ignore \_ (a backslash followed by an underscore).

So this would be a match (all the text between the outer underscores):

_italic some text 123 \_*%&$ _

So far I have this regex:

(\_.*?\_)(?!\\\_)

But it does not ignore the \_.

Which regex would work?

June7
  • Would you ever have a string like `\_test_test_` or `_hola_hola__test_`? – JvdV Feb 03 '21 at 19:54
  • Really? It doesn't [look](https://regex101.com/r/tCQwzt/1) like it. Or can it *start* matching at an underscore even though it's preceded by a backslash? – JvdV Feb 03 '21 at 20:43
  • You are right, it has that problem. No, it cannot match there. – Developer01561315 Feb 03 '21 at 20:44
  • And what would you like to match when you have a string like `_hola_hola_test_` and `_hola_hola__test_`? – JvdV Feb 03 '21 at 20:52
  • In the first example (`_hola_hola_test_`): first match `_hola_`, second match `_test_`. – Developer01561315 Feb 03 '21 at 20:55
  • In the second example (`_hola_hola__test_`): first match `_hola_`, second match `__`. – Developer01561315 Feb 03 '21 at 20:56
  • I don't quite get the results in the second example. Why should `__` be the 2nd match? – JvdV Feb 03 '21 at 20:57
  • Because this is already not in the current question scope. The current question is about matching from a `_`, then any chars other than `_` or `\_`, up to the first `_`. I added `(?<!\\)(?:\\{2})*` to the regex in the answer to make sure matching starts with an unescaped `_`. – Wiktor Stribiżew Feb 03 '21 at 21:00
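
To check the two inputs discussed in these comments, here is a minimal sketch that runs the regex from the answer below against them. The class name `CommentCases` and the helper `contents` are made up for illustration. The pattern requires at least one character between the underscores, so it does not produce the `__` match asked about above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentCases {
    // The regex from the answer, written as a Java string literal.
    private static final Pattern P =
        Pattern.compile("(?s)(?<!\\\\)(?:\\\\{2})*_((?:[^\\\\_]|\\\\.)+)_");

    // Hypothetical helper: collects capturing group 1 of every match.
    static List<String> contents(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(contents("_hola_hola_test_"));   // [hola, test]
        System.out.println(contents("_hola_hola__test_"));  // [hola, test]
    }
}
```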

1 Answer


You can use

(?s)(?<!\\)(?:\\{2})*_((?:[^\\_]|\\.)+)_

See the regex demo. Details:

  • (?s) - an inline embedded flag option equal to Pattern.DOTALL
  • (?<!\\)(?:\\{2})* - a position that is not immediately preceded with a backslash and then zero or more sequences of double backslashes
  • _ - an underscore
  • ((?:[^\\_]|\\.)+) - Capturing group 1: one or more occurrences of any char other than a \ and _, or any escaped char (a combination of a \ and any one char)
  • _ - an underscore
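
The effect of the `(?<!\\)(?:\\{2})*` prefix is easiest to see on inputs that differ only in the number of backslashes before the first underscore. The following is a minimal sketch, assuming the regex above is used unchanged; the class name `BackslashParity` and the helper `contents` are hypothetical. An odd number of backslashes leaves the underscore escaped, while an even number leaves it free to open a match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackslashParity {
    // Same regex as above; every regex backslash is doubled for the Java literal.
    private static final Pattern P =
        Pattern.compile("(?s)(?<!\\\\)(?:\\\\{2})*_((?:[^\\\\_]|\\\\.)+)_");

    // Hypothetical helper: returns capturing group 1 of every match.
    static List<String> contents(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(contents("_a_"));            // no backslash:    [a]
        System.out.println(contents("\\_a_b_"));        // one backslash:   [b]
        System.out.println(contents("\\\\_a_"));        // two backslashes: [a]
        System.out.println(contents("\\\\\\_a_b_"));    // three:           [b]
    }
}
```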

See the Java demo:

import java.util.*;
import java.util.regex.*;

// Test inputs; "\\" in a Java string literal is a single backslash.
List<String> strs = Arrays.asList("xxx _italic some text 123 \\_*%&$ _ xxx",
                                  "\\_test_test_");
String regex = "(?s)(?<!\\\\)(?:\\\\{2})*_((?:[^\\\\_]|\\\\.)+)_";
Pattern p = Pattern.compile(regex);
for (String str : strs) {
    Matcher m = p.matcher(str);
    List<String> result = new ArrayList<>();
    // Collect the text between the underscores (capturing group 1) of each match.
    while (m.find()) {
        result.add(m.group(1));
    }
    System.out.println(str + " => " + String.join(", ", result));
}

Output:

xxx _italic some text 123 \_*%&$ _ xxx => italic some text 123 \_*%&$ 
\_test_test_ => test
Wiktor Stribiżew
  • This works perfectly, I really appreciate your time! Thanks, I'll learn more about non-capturing groups. – Developer01561315 Feb 03 '21 at 19:53
  • By the way, I copied and pasted the regex from the demo; the Java regex seems to have problems. I pasted the original and IntelliJ automatically added the backslashes needed. – Developer01561315 Feb 03 '21 at 20:27
  • Oops, one small problem, as JvdV pointed out: the text `\_test_test_` won't work with this regex. – Developer01561315 Feb 03 '21 at 20:46
  • @Developer01561315 Use `(?s)(?<!\\)(?:\\{2})*_((?:[^\\_]|\\.)+)_`. Copy/pasting strings to string literals is always prone to issues, make sure you know how to represent *literal strings* used on sheets of paper or online regex testers inside *string literals* used in code. – Wiktor Stribiżew Feb 03 '21 at 20:54
  • Thanks, it seems to solve the problem. I'll try more cases, thank you! – Developer01561315 Feb 03 '21 at 20:59
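
On the string-literal point raised in the comments: a regex typed into an online tester has to have every backslash doubled when it is written inside a Java string literal. The sketch below illustrates that conversion under that assumption; `LiteralEscapingDemo` and `toJavaStringLiteral` are hypothetical names, not part of any library.

```java
class LiteralEscapingDemo {
    // Hypothetical helper: turns raw regex text (as typed into an online
    // tester) into Java string-literal source by doubling every backslash
    // and escaping double quotes. String.replace does literal replacement.
    static String toJavaStringLiteral(String rawRegex) {
        String escaped = rawRegex.replace("\\", "\\\\").replace("\"", "\\\"");
        return "\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        // This literal already uses doubled backslashes, so the String object
        // holds the regex exactly as a tester would display it.
        String raw = "(?s)(?<!\\\\)(?:\\\\{2})*_((?:[^\\\\_]|\\\\.)+)_";
        System.out.println(raw);                       // regex as the engine sees it
        System.out.println(toJavaStringLiteral(raw));  // text to paste into Java source
    }
}
```

Printing both forms side by side makes it easy to see whether a pasted pattern lost or gained backslashes along the way.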