
I want to match all spaces that are inside every [[]] in a string so I could use a replaceAll method and remove them.

Example input: text text [[ ia asd ]] [[asdasd]] dfgd dfaf sddgsd [[sss aaa]]

Expected output: text text [[iaasd]] [[asdasd]] dfgd dfaf sddgsd [[sssaaa]]

I thought of this: \[\[(\s*?)\]\] which should match all spaces that are between double brackets but it doesn't match anything.

I also tried several other solutions to similar problems, but none seemed to work.

Any clue what else could be used?

shinzou
  • You might want to use a lookahead and lookbehind operator. This is what I have, although I'm not sure where to go from here: (?<=\\[\\[) *(?=\\]\\]) – veryverde Nov 16 '22 at 19:07
  • I've got sort of a crummy answer, if you are just looking for a quick and dirty one-time thing: `\[\[\s*(\w*)\s*(\w*)\s*(\w*)]]` replaced with `[[$1$2$3]]`. It just doesn't scale, but you can copy-paste more `\s*(\w*)` in there with more `$n` references. https://regex101.com/r/vLwYkU/1 If this works, I can write it up nicely as an answer :) – sniperd Nov 16 '22 at 19:21
  • This is best done with regex replace with a callable in the replacement part, e.g. `text.replace(/\[\[.*?]]/g, (x) => x.replace(/\s+/g, ''))` in JavaScript. There are ways to remove whitespace with a plain text replacement if you use .NET or PCRE. Is it Java? – Wiktor Stribiżew Nov 16 '22 at 19:40
  • Without checking opening `[[` a quick one would be to use a lookahead: [`\s+(?=[^\]\[]*]])`](https://regex101.com/r/75sNEp/1) – bobble bubble Nov 16 '22 at 21:12
  • @bobblebubble but that could cause some serious bugs if a `]]` appears in the string, because then all previous spaces will be removed. – shinzou Nov 23 '22 at 11:59
  • Due to the use of the negated class, it cannot "skip backwards" over any `[`. The only "issue" I can think of is if `[[` were missing (but wouldn't that be correct anyway?). If you like, update [this demo](https://regex101.com/r/BErIVw/1) with cases that fail. – bobble bubble Nov 23 '22 at 12:14
  • @shinzou I should mention, though, that it does not work if there is any single `[` or `]` inside `[[`...`]]` (which the selected answer can deal with). It's a good answer anyway. :) However, for strings without brackets inside, it could be a short and efficient alternative. – bobble bubble Nov 23 '22 at 12:47

1 Answer


Considering it is Java, you can use

String result = text.replaceAll("(\\G(?!^)|\\[\\[)((?:(?!]]).)*?)\\s+(?=.*?]])", "$1$2");

Or, another approach is matching all substrings between [[ and ]] and then removing any whitespace inside the matches:

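// Requires java.util.regex.Matcher and java.util.regex.Pattern
String text = "text text [[ ia asd ]] [[asdasd]] dfgd dfaf sddgsd [[sss aaa]]";
// Match the shortest substring from [[ up to the next ]]
Pattern p = Pattern.compile("\\[\\[.*?]]");
Matcher m = p.matcher(text);
StringBuffer buffer = new StringBuffer();
while (m.find()) {
    // Strip whitespace inside the match; quoteReplacement keeps any $ or \ literal
    String cleaned = m.group().replaceAll("\\s+", "");
    m.appendReplacement(buffer, Matcher.quoteReplacement(cleaned));
}
// Copy the text that follows the last match
m.appendTail(buffer);
System.out.println(buffer.toString());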

See the Java demo online.
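If Java 9 or later is available, the same match-then-clean idea can probably be written more compactly with the `Matcher.replaceAll` overload that takes a function. The sketch below assumes Java 9+; the class name `BracketWhitespaceStripper` and the method name `strip` are made up for illustration and are not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketWhitespaceStripper {
    // Shortest substring from "[[" up to the next "]]" (no line breaks inside).
    private static final Pattern BRACKETED = Pattern.compile("\\[\\[.*?]]");

    // Hypothetical helper: removes all whitespace inside every [[...]] block.
    static String strip(String text) {
        return BRACKETED.matcher(text).replaceAll(
            // The function result is treated as a replacement string,
            // so quote it to keep any '$' or '\' literal.
            mr -> Matcher.quoteReplacement(mr.group().replaceAll("\\s+", "")));
    }

    public static void main(String[] args) {
        String text = "text text [[ ia asd ]] [[asdasd]] dfgd dfaf sddgsd [[sss aaa]]";
        // prints: text text [[iaasd]] [[asdasd]] dfgd dfaf sddgsd [[sssaaa]]
        System.out.println(strip(text));
    }
}
```

Text outside the brackets is never touched here, because only the matched `[[...]]` substrings are passed to the inner `replaceAll`.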

The first regex means:

  • (\G(?!^)|\[\[) - Group 1 ($1): either the end of the preceding successful match (\G, with (?!^) excluding the start of the string) or a literal [[
  • ((?:(?!]]).)*?) - Group 2 ($2): any char other than line break chars, zero or more but as few as possible occurrences, that does not start a ]] char sequence
  • \s+ - one or more whitespaces
  • (?=.*?]]) - immediately to the right, there must be any zero or more chars other than line break chars, as few as possible, and then ]].
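To see how the pieces described above behave together, here is a small self-contained sketch that applies the single-regex version to the question's sample input. The class name `SingleRegexDemo` and the method name `strip` are hypothetical; only the pattern and the `$1$2` replacement come from the answer.

```java
class SingleRegexDemo {
    // Pattern from the answer: each match consumes one whitespace run inside [[...]].
    private static final String REGEX =
        "(\\G(?!^)|\\[\\[)((?:(?!]]).)*?)\\s+(?=.*?]])";

    // Hypothetical helper: keeps group 1 and group 2, drops the whitespace run.
    static String strip(String text) {
        return text.replaceAll(REGEX, "$1$2");
    }

    public static void main(String[] args) {
        String text = "text text [[ ia asd ]] [[asdasd]] dfgd dfaf sddgsd [[sss aaa]]";
        // prints: text text [[iaasd]] [[asdasd]] dfgd dfaf sddgsd [[sssaaa]]
        System.out.println(strip(text));

        // An opening [[ with no closing ]] fails the lookahead, so nothing is removed.
        // prints: [[ a b
        System.out.println(strip("[[ a b"));
    }
}
```

Each match removes one whitespace run, and `\G` lets the next match continue from where the previous one ended, which is how several runs inside the same `[[...]]` block get removed without touching spaces outside it.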
Wiktor Stribiżew