6

I fetch some HTML, do some string manipulation, and end up with a string like

String sample = "\n    \n   2 \n      \n  \ndl. \n \n    \n flour\n\n     \n 4   \n    \n cups of    \n\nsugar\n";

I would like to find all ingredient lines and remove the extra whitespace and line breaks:

2 dl. flour and 4 cups of sugar

My approach so far is the following.

Pattern p = Pattern.compile("[\\d]+[\\s\\w\\.]+");
Matcher m = p.matcher(Result);

while(m.find()) {
  // This is where I need help removing that pesky whitespace
}
advantej
Flexo

6 Answers

4

sample = sample.replaceAll("[\\n ]+", " ").trim();

Output:

2 dl. flour 4 cups of sugar

With no spaces in the beginning, and no spaces at the end.

It first replaces each run of spaces and newlines with a single space, and then trims off the extra space from the beginning / end.
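
As a side note, the character class `[\\n ]` covers only spaces and linefeeds. The sketch below compares it with `\\s`, which also covers tabs and carriage returns. The class name, method names, and the input containing `\t` and `\r` are made up for illustration; the sample in the question contains only spaces and `\n`, so both patterns give the same result there.

```java
class WhitespaceClasses {
    // Collapses runs of spaces and linefeeds only (the pattern from this answer).
    static String collapseSpacesAndNewlines(String s) {
        return s.replaceAll("[\\n ]+", " ").trim();
    }

    // Collapses runs of any whitespace character, including tabs and carriage returns.
    static String collapseAllWhitespace(String s) {
        return s.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        // Hypothetical input with a tab and a Windows-style line ending.
        String input = " 2\t\r\n dl. ";
        System.out.println("[" + collapseSpacesAndNewlines(input) + "]"); // tab and \r survive
        System.out.println("[" + collapseAllWhitespace(input) + "]");     // prints [2 dl.]
    }
}
```

If the fetched HTML may contain tabs or `\r\n` line endings, `\\s+` is probably the safer choice.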

Alan Moore
Kaj
3

The following code should work for you:

String sample = "\n    \n   2 \n      \n  \ndl. \n \n    \n flour\n\n     \n 4   \n    \n cups of    \n\nsugar\n";
Pattern p = Pattern.compile("(\\s+)");
Matcher m = p.matcher(sample);
StringBuffer sb = new StringBuffer();
while(m.find())
    m.appendReplacement(sb, " ");
m.appendTail(sb);
System.out.println("Final: [" + sb.toString().trim() + ']');

OUTPUT

Final: [2 dl. flour 4 cups of sugar]
anubhava
  • Your solution is just what I'm after; I will try it tomorrow. By the way, \n is included in \s, so you only need [\\s]+ in your pattern – Flexo May 26 '11 at 20:43
  • Why not use `replaceAll()` like everyone else did? – Alan Moore May 26 '11 at 22:16
  • Yes could have used `replaceAll()` as well, but OP was trying to do it using Pattern/Matcher Classes so wrote code using that. – anubhava May 26 '11 at 22:20
  • Actually, the reason I use the Pattern/Matcher is because the string contains other stuff as well, but that's the actual recipe. I just want to format the ingredients so they can be presented in a nice list. – Flexo May 27 '11 at 07:02
1

I think something like this will work for you:

String test = "\n    \n   2 \n      \n  \ndl. \n \n    \n flour\n\n     \n 4   \n    \n cups of    \n\nsugar\n";

/* convert all sequences of whitespace into a single space, and trim the ends */
test = test.replaceAll("\\s+", " ").trim();
mah
1

I assumed that the \n sequences are not actual line feeds, but this also works with line feeds. This should work fine:

test=test.replaceAll ("(?:\\s|\\\n)+"," ");

In case there is no textual \n it can be simpler:

test=test.replaceAll ("\\s+"," ");

And you need to trim the leading/trailing spaces.

I use the RegexBuddy tool to check any single regex, very handy in so many languages.

millebii
  • To match the literal sequence `\n` (backslash + 'n'), you would need *four* backslashes in the regex (`\\\\n`), not three. But it's pretty clear the OP is really trying to match linefeeds. – Alan Moore May 26 '11 at 22:10
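
The backslash counting in the comment above can be checked with a small sketch. The class and method names are made up for illustration. In Java source, `"\\\\n"` is the regex `\\n` (an escaped backslash followed by `n`), while `"\\n"` is the regex `\n` (a linefeed).

```java
class EscapedNewline {
    // Regex \\n: matches the two-character text backslash + 'n'.
    static String replaceLiteralBackslashN(String s) {
        return s.replaceAll("\\\\n", " ");
    }

    // Regex \n: matches an actual linefeed character.
    static String replaceLinefeed(String s) {
        return s.replaceAll("\\n", " ");
    }

    public static void main(String[] args) {
        String textual = "a\\nb";  // four characters: a, backslash, n, b
        String linefeed = "a\nb";  // three characters: a, linefeed, b
        System.out.println(replaceLiteralBackslashN(textual)); // prints a b
        System.out.println(replaceLinefeed(linefeed));         // prints a b
    }
}
```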
0

You should be able to use the standard String.replaceAll(String, String). The first parameter will take your pattern, the second will take an empty string.
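
A minimal sketch of what the second argument changes, using the sample string from the question (the class and method names are hypothetical). With an empty replacement the words run together, so a single space followed by `trim()` is probably closer to what the question asks for.

```java
class ReplacementChoice {
    // Replaces every whitespace run with nothing: words are glued together.
    static String stripAll(String s) {
        return s.replaceAll("\\s+", "");
    }

    // Replaces every whitespace run with one space, then trims the ends.
    static String collapse(String s) {
        return s.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String sample = "\n    \n   2 \n      \n  \ndl. \n \n    \n flour\n\n     \n 4   \n    \n cups of    \n\nsugar\n";
        System.out.println(stripAll(sample)); // prints 2dl.flour4cupsofsugar
        System.out.println(collapse(sample)); // prints 2 dl. flour 4 cups of sugar
    }
}
```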

Haphazard
  • That's where I need the regex variables, which I don't really know how to use. Let me give an example: my pattern matches "\n \n 2 \n \n \ndl. \n \n \n flour\n\n \n" and I would like to replace that with "2 dl. flour". My question here is: how do I extract the information from the matched substring? – Flexo May 26 '11 at 19:24
  • @Flexo, see my reply, it does exactly that. – Kaj May 26 '11 at 19:47
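
One possible way to combine the two ideas in this thread is to read each match with `m.group()` and then normalize its whitespace. This is only a sketch: the pattern `\\d+\\D*` is an assumption (each ingredient starts with a number and contains no other digits), not the pattern from the question, and the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IngredientLines {
    // Assumed pattern: a run of digits followed by everything up to the next digit.
    private static final Pattern INGREDIENT = Pattern.compile("\\d+\\D*");

    static List<String> extract(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = INGREDIENT.matcher(text);
        while (m.find()) {
            // m.group() is the matched substring; collapse its whitespace and trim it.
            result.add(m.group().replaceAll("\\s+", " ").trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = "\n    \n   2 \n      \n  \ndl. \n \n    \n flour\n\n     \n 4   \n    \n cups of    \n\nsugar\n";
        for (String line : extract(sample)) {
            System.out.println(line); // prints "2 dl. flour" and then "4 cups of sugar"
        }
    }
}
```

Text before the first number is skipped by `find()`, so surrounding non-ingredient content without digits would not end up in the list.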
0

s/^\s+//
s/\s+$//
s/\s+/ /g

Run those three substitutions (replace leading whitespace with nothing, replace trailing whitespace with nothing, replace every run of whitespace with a single space).
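
Since the question is about Java, here is a rough Java equivalent of these three steps, assuming `String.replaceFirst` and `String.replaceAll` are acceptable. The class and method names are made up for illustration.

```java
class ThreeSteps {
    static String normalize(String s) {
        String noLeading = s.replaceFirst("^\\s+", "");          // drop leading whitespace
        String noTrailing = noLeading.replaceFirst("\\s+$", ""); // drop trailing whitespace
        return noTrailing.replaceAll("\\s+", " ");               // collapse the remaining runs
    }

    public static void main(String[] args) {
        String sample = "\n    \n   2 \n      \n  \ndl. \n \n    \n flour\n\n     \n 4   \n    \n cups of    \n\nsugar\n";
        System.out.println("[" + normalize(sample) + "]"); // prints [2 dl. flour 4 cups of sugar]
    }
}
```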

Seth Robertson