
I would like to match rows of a table. Nothing but whitespace signals where one cell starts or ends: strings of characters separated by fewer than 3 whitespace characters should be treated as a single cell.

An example row:

"           here is a $$ cell               here  another         cells I dont care about........."

This is my naive (and invalid) attempt, in which I only want the first 2 cells:

\\s{5,}([^\\s{2,}]+)\\s{5,}([^\\s{2,}]+)\\s{5,}.*
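A quick check shows why this attempt cannot work (a sketch; the class name `CharClassDemo` and method `runs` are made up for illustration). Inside square brackets, `\s`, `{`, `2`, `,` and `}` are simply members of the set, so `[^\s{2,}]` matches one character that is not whitespace, `{`, `2`, `,` or `}`. It never means "not two or more whitespace characters", so a cell containing single spaces is broken into words, and a `2` in the text also stops the match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharClassDemo {
    // Inside [...], \s, '{', '2', ',' and '}' are just set members, so this
    // matches runs of characters that are none of those.
    static final Pattern ATTEMPT = Pattern.compile("[^\\s{2,}]+");

    // Returns every run matched by the class from the original attempt.
    static List<String> runs(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = ATTEMPT.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(runs("here is a $$ cell")); // [here, is, a, $$, cell]
        System.out.println(runs("cell2,x{y}"));        // [cell, x, y]
    }
}
```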
TomTom

2 Answers


You may trim the input first, then split on runs of 3 or more whitespace characters, then check whether you got at least 2 cell values and use the first two:

public class Main {
    public static void main(String[] args) {
        String s = "           here is a $$ cell               here  another         cells I dont care about.........";
        String[] res = s.trim().split("\\s{3,}");
        if (res.length > 1) {
            System.out.println(res[0]); // Item 1
            System.out.println(res[1]); // Item 2, the rest is unimportant
        }
    }
}

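A possible variation on the split approach, sketched with a hypothetical helper `firstCells`: passing a limit to `String.split(regex, limit)` stops splitting once the first n cells are found, so the rest of the row stays in one trailing element that is then dropped instead of being split needlessly.

```java
import java.util.Arrays;

public class FirstCells {
    // Hypothetical helper: returns up to n cells (n >= 1) from the start of a row.
    // The limit n + 1 keeps everything after the n-th cell in one trailing
    // element, which copyOf then discards.
    static String[] firstCells(String row, int n) {
        String[] parts = row.trim().split("\\s{3,}", n + 1);
        return Arrays.copyOf(parts, Math.min(n, parts.length));
    }

    public static void main(String[] args) {
        String s = "           here is a $$ cell               here  another         cells I dont care about.........";
        // Prints "here is a $$ cell" and then "here  another".
        for (String cell : firstCells(s, 2)) {
            System.out.println(cell);
        }
    }
}
```

As in the answer, the caller should still check the array length, since a row with fewer cells yields a shorter array.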

Wiktor Stribiżew

This regex should hopefully do the trick:

 (?<=\s{3,}|^\s?\s?)\w[\w\W]*?(?=\s{3,}|\s?\s?$)

As a Java string literal (with the backslashes escaped) it is:

"(?<=\\s{3,}|^\\s?\\s?)\\w[\\w\\W]*?(?=\\s{3,}|\\s?\\s?$)"

It tries to match the shortest possible group of characters that starts with a word character (`\w`, so the match never starts with a space). The lookbehind then requires either at least 3 whitespace characters or the start of the line followed by at most 2 whitespace characters before the match, and the lookahead requires either at least 3 whitespace characters or at most 2 whitespace characters followed by the end of the line after it.

It matches only one cell, so just repeat the expression to match multiple cells.

You can mess around with this here: http://fiddle.re/0tmcza
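Instead of repeating the expression, one could also loop over `Matcher.find()`. The sketch below (class name `CellMatcher` is hypothetical) swaps in a different, lookbehind-free pattern, `\S+(?:\s{1,2}\S+)*`, which is an assumption of this example rather than the regex above: a cell is a run of non-whitespace characters joined to further runs by gaps of only 1 or 2 whitespace characters, so a gap of 3 or more ends the cell. Unlike `\w` in the answer, `\S` also lets a cell start with a symbol such as `$$`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CellMatcher {
    // A cell: non-whitespace runs separated by 1 or 2 whitespace characters.
    // A gap of 3+ whitespace characters cannot be consumed, so it ends the cell.
    static final Pattern CELL = Pattern.compile("\\S+(?:\\s{1,2}\\S+)*");

    // Collects every cell in the row, in order, by calling find() repeatedly.
    static List<String> cells(String row) {
        List<String> out = new ArrayList<>();
        Matcher m = CELL.matcher(row);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "           here is a $$ cell               here  another         cells I dont care about.........";
        List<String> all = cells(s);
        // Keep only the first two cells: prints [here is a $$ cell, here  another]
        System.out.println(all.subList(0, Math.min(2, all.size())));
    }
}
```

Because `\S` and `\s` never overlap, each cell has only one way to match, so the nested quantifier does not cause heavy backtracking.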

EDD