
My application is reading a file which contains the following data:

MENS HEALTH^\^@ P

while the actual text should be

MENS HEALTH P

I have already replaced the '\u0000' character, but "^\" still remains in the string. I don't know the code for this character, which I need in order to replace it as well.

When I open the file in the IntelliJ editor, it's displayed as the FS symbol.

Please suggest how I can eliminate this.

Thanks,

Manish
  • It's displayed as FS because that's what it is: the ASCII "file separator" control character, code point U+001C. – dangling else Jun 01 '22 at 12:04
  • What is the character-encoding? And what are the actual bytes? It's hard to tell what the actual data is :) – Rob Audenaerde Jun 01 '22 at 12:20
  • FS = `^\` = `\u001C` and NUL = `^@` = `\u0000`. `s = s.replaceAll("[\u0000\u001C]", "");` – Joop Eggen Jun 01 '22 at 12:32
  • This should work for letters in any language (it keeps only letters and spaces, so digits and punctuation are removed too): `str = str.replaceAll("[^ \\p{L}]", "");`. Try it against a String like `String str = "МУЖСКОЕ ЗДОРОВЬЕ^\\^@ П";`, which is `MENS HEALTH^\^@ P` in Russian. – DevilsHnd - 退職した Jun 01 '22 at 14:15
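Following up on the comments (FS is U+001C, NUL is U+0000), here is a minimal sketch for both identifying unknown characters and stripping them. It assumes the file content is already decoded into a Java `String`; the class and method names are hypothetical. Note that `\p{Cc}` removes every control character, including tab, CR and LF, so it may be too aggressive for multi-line input.

```java
class ControlCharCleaner {

    // Prints every code point as U+XXXX, so unknown "junk" characters can be identified
    static void dumpCodePoints(String s) {
        s.codePoints().forEach(cp -> System.out.printf("U+%04X %s%n", cp,
                Character.isISOControl(cp) ? "(control)" : new String(Character.toChars(cp))));
    }

    // Removes every Unicode control character (general category Cc: U+0000..U+001F, U+007F..U+009F).
    // Caution: this also removes tab, carriage return and line feed.
    static String removeControlChars(String s) {
        return s.replaceAll("\\p{Cc}", "");
    }

    public static void main(String[] args) {
        // Hypothetical input mirroring the question: FS (U+001C) and NUL (U+0000) after "HEALTH"
        String raw = "MENS HEALTH" + (char) 0x1C + (char) 0x00 + " P";
        dumpCodePoints(raw);                          // lists U+001C and U+0000 as control characters
        System.out.println(removeControlChars(raw)); // MENS HEALTH P
    }
}
```

Running `dumpCodePoints` on a suspicious line is a quick way to answer "what is the code for this character" before deciding which characters to remove.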

2 Answers


Rather than worry about what characters the junk consists of, remove everything that isn't what you want to keep:

str = str.replaceAll("[^\\w ]+", "");

This deletes any characters that are not word characters (by default the ASCII letters, digits and underscore) or spaces.
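To illustrate the whitelist idea, here is a minimal sketch with hypothetical class and method names. By default Java's `\w` matches only ASCII `[a-zA-Z_0-9]`, so non-Latin text (such as the Russian example in the comments) would be stripped as well; the embedded `(?U)` flag (`Pattern.UNICODE_CHARACTER_CLASS`) makes `\w` Unicode-aware.

```java
class WhitelistCleaner {

    // Keeps ASCII word characters [a-zA-Z_0-9] and spaces; removes everything else
    static String keepAsciiWordsAndSpaces(String s) {
        return s.replaceAll("[^\\w ]+", "");
    }

    // Same whitelist, but (?U) enables UNICODE_CHARACTER_CLASS, so \w also
    // matches letters and digits from other scripts (e.g. Cyrillic)
    static String keepUnicodeWordsAndSpaces(String s) {
        return s.replaceAll("(?U)[^\\w ]+", "");
    }

    public static void main(String[] args) {
        // Hypothetical input mirroring the question: FS (U+001C) and NUL (U+0000) after "HEALTH"
        String raw = "MENS HEALTH" + (char) 0x1C + (char) 0x00 + " P";
        System.out.println(keepAsciiWordsAndSpaces(raw)); // MENS HEALTH P

        // The last character is the Cyrillic capital letter PE
        String cyrillic = "HEALTH" + (char) 0x1C + " \u041F";
        System.out.println(keepAsciiWordsAndSpaces(cyrillic));   // the Cyrillic letter is removed too
        System.out.println(keepUnicodeWordsAndSpaces(cyrillic)); // the Cyrillic letter is kept
    }
}
```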

Bohemian

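You can use a regular expression with String.replaceAll() to remove these characters (by replacing them with an empty string).

Note that the backslash has a special meaning both in Java string literals and in regular expressions, so it needs to be escaped twice: a single literal backslash in the pattern is written as `\\\\` in Java source.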

"my\\^@String".replaceAll("[\\\\^@]", "");
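The pattern above matches the literal characters `\`, `^` and `@`, which is what you need if the file really contains that text. If `^\` and `^@` are only how an editor or terminal displays the FS and NUL control characters (as the comments on the question suggest), the pattern has to target those code points instead. A minimal sketch contrasting the two cases, with hypothetical class and method names:

```java
class CaretNotationDemo {

    // Removes the literal characters '\', '^' and '@' (the regex [\\^@])
    static String removeLiteralCaretChars(String s) {
        return s.replaceAll("[\\\\^@]", "");
    }

    // Removes the control characters that caret notation shows as ^\ (FS, U+001C) and ^@ (NUL, U+0000)
    static String removeFsAndNul(String s) {
        return s.replaceAll("[\\x1C\\x00]", "");
    }

    public static void main(String[] args) {
        String literal = "MENS HEALTH^\\^@ P";                            // text really contains ^ \ ^ @
        String control = "MENS HEALTH" + (char) 0x1C + (char) 0x00 + " P"; // text contains FS and NUL
        System.out.println(removeLiteralCaretChars(literal));                 // MENS HEALTH P
        System.out.println(removeLiteralCaretChars(control).equals(control)); // true: nothing matched
        System.out.println(removeFsAndNul(control));                          // MENS HEALTH P
    }
}
```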


Alexander Ivanchenko