I need to split a text with a regex in Java so that each substring is at most about 10 characters long (counting spaces and special characters) and no word is split. For example, "James has gone out for a meal." would become "James has", "gone out", "for a meal", ".". Thanks in advance.
-
Can you share what you have tried? – Manoj Vadehra May 29 '19 at 03:51
-
I've tried Splitter in Guava like Splitter.on(regexp).trimResults().split(text). For regexp I've used something like "(\W|^)[\w.]{0,10} (\W|$)". I guess it is logically incorrect. – user11372017 May 29 '19 at 03:59
-
You can derive some inspiration from: https://stackoverflow.com/q/4398270/9192223 – hiren May 29 '19 at 04:00
-
You have to determine an end for your parts of substrings. If we determine . and space as separators, then you can use this pattern. (.{0,10})(?:\s|\.) https://regex101.com/r/I1nrb6/1 – Hamed Ghasempour May 29 '19 at 04:02
-
@hiren thanks, a lot of possible ways to solve the problem. But I'd like to use regex . – user11372017 May 29 '19 at 04:03
2 Answers
This could get a little complicated, so maybe we could start with:
.{1,10}[^\s](?=\s|$)
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitDemo {
    public static void main(String[] args) {
        // Up to 10 characters, then one non-whitespace character that must be
        // followed by whitespace or the end of the input.
        final String regex = ".{1,10}[^\\s](?=\\s|$)";
        final String string = "James has gone out for a meal.";
        final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        final Matcher matcher = pattern.matcher(string);
        while (matcher.find()) {
            // The pattern has no capturing groups, so group(0) is the whole chunk.
            // trim() drops the leading space that every match after the first picks up.
            System.out.println("Full match: " + matcher.group(0).trim());
        }
    }
}
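A note on the pattern above: every match after the first begins with the separating space, and a match can be up to 11 characters long (10 from `.{1,10}` plus the final non-whitespace character). The following is only a sketch of one possible variation, not the pattern from this answer. It assumes each chunk should begin and end on a non-whitespace character and be at most 10 characters long. The class name `RegexChunker` and the method `chunk` are made up for illustration. Words longer than 10 characters are skipped entirely, which may or may not be what you want.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexChunker {
    // (?<!\S)          the chunk must start at the beginning of a word
    // \S(?:.{0,8}\S)?  1 to 10 characters, first and last are non-whitespace
    // (?=\s|$)         the chunk must end at the end of a word
    private static final Pattern CHUNK =
            Pattern.compile("(?<!\\S)\\S(?:.{0,8}\\S)?(?=\\s|$)");

    // Hypothetical helper: collects every match of CHUNK in order.
    static List<String> chunk(String text) {
        List<String> chunks = new ArrayList<>();
        Matcher matcher = CHUNK.matcher(text);
        while (matcher.find()) {
            chunks.add(matcher.group());
        }
        return chunks;
    }

    public static void main(String[] args) {
        for (String part : chunk("James has gone out for a meal.")) {
            System.out.println("[" + part + "]");
        }
    }
}
```

For the sample sentence this yields "James has", "gone out", "for a" and "meal.". The final period stays attached to "meal" because it belongs to the same whitespace-delimited word, and "for a meal." is 11 characters, so it does not fit into one chunk.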

Emma
First, collapse any double spaces into single spaces, then apply this regex:
.{1,11}(?:\s|$)|.{1,11}(?:[^\s]|$)
But I would rather split the text on whitespace and then use a for loop that adds words to the current chunk while keeping track of its length.
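A minimal sketch of that split-then-loop idea, assuming words are separated by whitespace and a chunk may hold at most `maxLen` characters including the single spaces between words. The names `WordChunker`, `chunk` and `maxLen` are made up for illustration. A word that is longer than `maxLen` on its own ends up as a chunk by itself instead of being cut.

```java
import java.util.ArrayList;
import java.util.List;

public class WordChunker {
    // Hypothetical helper: greedily packs whole words into chunks of at most maxLen characters.
    static List<String> chunk(String text, int maxLen) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // only happens when the input is blank
            }
            // The +1 accounts for the space that would join the word to the chunk.
            if (current.length() > 0 && current.length() + 1 + word.length() > maxLen) {
                chunks.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(word);
        }
        if (current.length() > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }

    public static void main(String[] args) {
        for (String part : chunk("James has gone out for a meal.", 10)) {
            System.out.println("[" + part + "]");
        }
    }
}
```

With a limit of 10 the sample sentence becomes "James has", "gone out", "for a" and "meal.". Splitting on `\s+` also takes care of double spaces, so no separate cleanup pass is needed with this approach.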

Kang Andrew