5

I know how to replace ALL non-alphanumeric characters in a string, but how do I do it only at the beginning and end of the string?

I need this string:

"theString,"

to be:

theString

This is how I replace ALL non-alphanumeric characters in a string:

s = s.replaceAll("[^a-zA-Z0-9\\s]", "");
Mike6679

5 Answers

14

Use ^ (matches at the beginning of the string) and $ (matches at the end) anchors:

s = s.replaceAll("^[^a-zA-Z0-9\\s]+|[^a-zA-Z0-9\\s]+$", "");
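As a rough illustration of the anchored pattern above, here is a small sketch that compiles it once and reuses it. The names `EdgeTrimDemo` and `trimEdges` are made up for this example and are not from the question.

```java
import java.util.regex.Pattern;

class EdgeTrimDemo {
    // Same pattern as above, compiled once so repeated calls do not recompile it.
    private static final Pattern EDGES =
            Pattern.compile("^[^a-zA-Z0-9\\s]+|[^a-zA-Z0-9\\s]+$");

    // Hypothetical helper: strips leading and trailing runs of characters
    // that are neither ASCII alphanumerics nor whitespace.
    static String trimEdges(String s) {
        return EDGES.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(trimEdges("theString,"));     // theString
        System.out.println(trimEdges("--the,String!!")); // the,String
    }
}
```

Characters in the middle of the string are left alone, because each alternative is tied to one of the anchors.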
falsetru
  • What's that `\\s` doing in there? I know OP had it, but it was wrong then and it's wrong now. – David Conrad Jul 26 '14 at 03:17
  • @DavidConrad, `\\s` will match any whitespace character. I thought it was OP's intention to exclude alpha-numeric characters and space characters, so I didn't touch it. – falsetru Jul 26 '14 at 03:20
  • 1
    Exactly, that's why it's wrong. OP said "replace ALL non alphanumeric chars in a string". It's a negated set, so it will replace anything EXCEPT a-z, A-Z, 0-9, and any whitespace character. So it will leave in whitespace. – David Conrad Jul 26 '14 at 03:23
  • I think OP was trying to match UP TO a space, and didn't get how sets work. I guess I could be wrong. – David Conrad Jul 26 '14 at 03:24
  • @falsetru does this strip ALL non alphanumeric from beginning and end of the string or just one in beginning and end? – Mike6679 Jul 26 '14 at 03:32
  • @Mike, It removes **all** non alphanumeric + non whitespace from the beginning and the end of the string. (I used `+`). If you want to remove only `one`, remove `+`. – falsetru Jul 26 '14 at 03:33
  • Just a note: Although this did work, I had to replace with my own parser because the regex expression was just too expensive over thousands of iterations. – Mike6679 Jul 26 '14 at 05:46
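Regarding the cost mentioned in the last comment: one way to avoid the regex engine altogether is to scan inward from both ends and take a single substring. This is only a sketch under that assumption, not the parser the OP ended up writing; `ManualEdgeTrim` and its methods are hypothetical names.

```java
class ManualEdgeTrim {
    // ASCII-only check, mirroring the a-zA-Z0-9 ranges used in the regex.
    private static boolean isAsciiAlnum(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    // The six characters that Java's default \s matches.
    private static boolean isRegexWhitespace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == 0x0B || c == '\f' || c == '\r';
    }

    // Characters that stop the trimming, i.e. the ones the negated class excludes.
    private static boolean keep(char c) {
        return isAsciiAlnum(c) || isRegexWhitespace(c);
    }

    static String trimEdges(String s) {
        int start = 0;
        int end = s.length();
        // Advance past leading characters that should be dropped.
        while (start < end && !keep(s.charAt(start))) {
            start++;
        }
        // Walk back over trailing characters that should be dropped.
        while (end > start && !keep(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(start, end);
    }
}
```

Whether this is actually faster for a given workload would need measuring; it simply avoids pattern matching and creates at most one new string per call.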
1

Use:

s = s.replaceAll("^[^\\p{L}\\p{N}\\s%]+|[^\\p{L}\\p{N}\\s%]+$", "");

Instead of:

s = s.replaceAll("^[^a-zA-Z0-9\\s]+|[^a-zA-Z0-9\\s]+$", "");

Where \p{L} is any kind of letter from any language, and \p{N} is any kind of numeric character in any script. The % in the class additionally keeps percent signs.
This matters for Latin-script languages other than English. In Spanish, for instance, the latter pattern turns éstas and apuntó into stas and apunt. The former also works on non-Latin scripts.
To also keep the combining vowel marks used in Arabic and Hebrew, add \p{Mn} (non-spacing marks):

s = s.replaceAll("^[^\\p{L}\\p{N}\\p{Mn}\\s%]+|[^\\p{L}\\p{N}\\p{Mn}\\s%]+$", "");

In Dravidian scripts, vowel signs may sit beside or around the consonant, like ಾ, as opposed to Semitic scripts, where they are attached to the base character. These are spacing combining marks, so use \p{Mc} for them. To cover all languages, use \p{M}, which matches every kind of mark:

s = s.replaceAll("^[^\\p{L}\\p{N}\\p{M}\\s%]+|[^\\p{L}\\p{N}\\p{M}\\s%]+$", "");

See regex tutorial for a list of Unicode categories
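To make the difference concrete, here is a small sketch comparing the ASCII-only class with a Unicode-aware one on the Spanish examples. The class is written with a single leading `^`, since a `^` anywhere else inside `[...]` is a literal caret in Java. `UnicodeEdgeTrim`, `trimAscii` and `trimUnicode` are names invented for this example.

```java
import java.util.regex.Pattern;

class UnicodeEdgeTrim {
    // ASCII-only class: accented letters count as "non-alphanumeric".
    private static final Pattern ASCII_EDGES =
            Pattern.compile("^[^a-zA-Z0-9\\s]+|[^a-zA-Z0-9\\s]+$");

    // Unicode-aware class: letters, numbers, marks, whitespace and % are kept.
    private static final Pattern UNICODE_EDGES =
            Pattern.compile("^[^\\p{L}\\p{N}\\p{M}\\s%]+|[^\\p{L}\\p{N}\\p{M}\\s%]+$");

    static String trimAscii(String s) {
        return ASCII_EDGES.matcher(s).replaceAll("");
    }

    static String trimUnicode(String s) {
        return UNICODE_EDGES.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        // e-acute followed by "stas,", written as an escape to avoid
        // depending on the source file encoding.
        String word = "\u00e9stas,";
        System.out.println(trimAscii(word));   // loses the accented first letter
        System.out.println(trimUnicode(word)); // keeps it, drops only the comma
    }
}
```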

Danielson
-1

This removes all the non-alphanumeric characters anywhere in the string, not only at the beginning and end:

s = s.replaceAll("[^a-zA-Z0-9]", "");
S.S. Anne
  • 1
    This removes all the non-alphanumeric characters – O. Jones Dec 23 '14 at 01:36
  • 2
    Code-only answers are considered low quality: make sure to provide an explanation what your code does and how it solves the problem. It will help the asker and future readers both if you can add more information in your post. See also Explaining entirely code-based answers: https://meta.stackexchange.com/questions/114762/explaining-entirely-code-based-answers – borchvm Oct 28 '19 at 09:21
-1
yourString=yourString.replaceAll("^\\W+|\\W+$","");
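A short note on what this pattern does, with a sketch to go with it: in Java, `\W` is by default the complement of `[a-zA-Z_0-9]`, so underscores at the edges are kept and whitespace at the edges is stripped, which differs from the pattern in the question. `WordEdgeTrim` and `trimEdges` are hypothetical names used only here.

```java
class WordEdgeTrim {
    // Strips leading and trailing runs of non-word characters.
    // Word characters are ASCII letters, digits and the underscore.
    static String trimEdges(String s) {
        return s.replaceAll("^\\W+|\\W+$", "");
    }

    public static void main(String[] args) {
        System.out.println(trimEdges("theString,"));       // theString
        System.out.println(trimEdges("  _the String_!"));  // _the String_
    }
}
```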
Prasad Bhosale
-1

Guava's CharMatcher provides a concise solution:

CharMatcher.javaLetterOrDigit().negate().trimFrom(input);
Bunarro