While reading and learning about regex, I have been trying to figure out where my current regex goes wrong.

The string I have is

String sentence = "I would've rather stayed at home, than go to the Murphys' home, on the 'golden' weekend";

The current replaceAll argument I use is:

String[] tokens = sentence.replaceAll("[^\\sA-Za-z']+", "").split("\\s+");

This gives me an array of tokens that looks like

tokens = {"I", "would've", "rather", "stayed", "at", "home", "than", "go", "to", "the", "Murphys'", "home", "on", "the", "'golden'", "weekend"};

But I would like to strip the leading and trailing apostrophes, so that Murphys' becomes Murphys and 'golden' becomes golden, while would've stays as would've.

That would give me an array that looks like:

correctTokens = {"I", "would've", "rather", "stayed", "at", "home", "than", "go", "to", "the", "Murphys", "home", "on", "the", "golden", "weekend"};

Your help would be greatly appreciated.

DarkSuniuM

2 Answers

Use replaceAll("[^\\h\\v\\p{L}']+|(?<=\\P{L}|^)'|'(?=\\P{L}|$)", "")

Explanation:

  • [^\h\v\p{L}']+ One or more characters that are not:
    • Unicode (horizontal or vertical) whitespace
    • Unicode letter
    • Apostrophe '
  • | or
  • (?<=\P{L}|^)' An apostrophe preceded by a non-letter or the beginning of input
  • | or
  • '(?=\P{L}|$) An apostrophe followed by a non-letter or the end of input

See regex101.com for demo.
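
To make the explanation above concrete, here is a minimal sketch that applies this pattern to the sentence from the question. The class name ApostropheTokenizer, the tokenize helper, and the extra trim() call are assumptions added for illustration; they are not part of the original answer.

```java
import java.util.Arrays;

class ApostropheTokenizer {

    // Removes runs of characters that are not whitespace, letters, or apostrophes,
    // plus any apostrophe that is not surrounded by letters on both sides.
    private static final String CLEANUP_REGEX =
            "[^\\h\\v\\p{L}']+|(?<=\\P{L}|^)'|'(?=\\P{L}|$)";

    // Hypothetical helper: cleans the sentence, then splits it on whitespace.
    static String[] tokenize(String sentence) {
        return sentence
                .replaceAll(CLEANUP_REGEX, "")
                .trim()            // assumption: avoid an empty first token on leading whitespace
                .split("\\s+");
    }

    public static void main(String[] args) {
        String sentence = "I would've rather stayed at home, than go to the Murphys' home, on the 'golden' weekend";
        System.out.println(Arrays.toString(tokenize(sentence)));
    }
}
```

For the sentence in the question this should print [I, would've, rather, stayed, at, home, than, go, to, the, Murphys, home, on, the, golden, weekend]. Because the pattern uses \p{L} rather than A-Za-z, accented and other non-ASCII letters are kept as well.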

Andreas

Try the regex \\s'|'\\s and replace each match with a space:

String sentence = "I would've rather stayed at home, than go to the Murphys' home, on the 'golden' weekend";

String[] tokens = sentence.replaceAll("\\s'|'\\s", " ").split("\\s+");

Output (the commas are kept, because this regex only targets apostrophes next to whitespace):

[I, would've, rather, stayed, at, home,, than, go, to, the, Murphys, home,, on, the, golden, weekend]
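
If the commas should go too, one possible approach is to chain this replacement after the punctuation-stripping regex from the question. The sketch below assumes that combination; the class name ChainedTokenizer and the tokenize helper are hypothetical and not from either answer.

```java
import java.util.Arrays;

class ChainedTokenizer {

    // Hypothetical helper combining the question's regex with the one from this answer.
    static String[] tokenize(String sentence) {
        return sentence
                .replaceAll("[^\\sA-Za-z']+", "")  // keep only whitespace, ASCII letters, apostrophes
                .replaceAll("\\s'|'\\s", " ")      // replace an apostrophe and its adjacent whitespace with one space
                .trim()
                .split("\\s+");
    }

    public static void main(String[] args) {
        String sentence = "I would've rather stayed at home, than go to the Murphys' home, on the 'golden' weekend";
        System.out.println(Arrays.toString(tokenize(sentence)));
    }
}
```

One limitation of this sketch: an apostrophe at the very start or end of the input has no whitespace next to it, so it is not matched and stays in the token.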
The Scientific Method