
If a string contains any non-ASCII characters, I need to delete them.

I have tried the replaceAll method, which works fine on JDK 1.8 and above, but I want to deploy the same code on JDK 1.6, where it does not work.

String Remarks2 ="hii:╘’i";

String Remarks = Remarks2.replaceAll("[^\\p{ASCII}]", "");
System.out.println("ans: "+Remarks);

Output on JDK 1.8: hii:i
Output on JDK 1.6: hii:╘’i

Expected result: hii:i

kometen
Shreyas
  • Just a note: "h" is an ASCII character. – T.J. Crowder Mar 28 '19 at 09:18
  • Do you mean remove any non-ascii value? – kometen Mar 28 '19 at 09:50
  • Have you tried using "[^\\x00-\\x7F]" instead of "[^\\p{ASCII}]"? Maybe "\p{ASCII}" is not defined in 1.6. – SirFartALot Mar 28 '19 at 09:59
  • `\p{ASCII}` is defined in Java 6, see https://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html – Mincong Huang Mar 28 '19 at 11:29
  • @T.J.Crowder Thanks. What I understood is that holding down ALT while typing a character code on the numeric keypad generates the corresponding character. For example, to insert the degree (°) symbol, hold down ALT while typing 0176. People also sometimes copy and paste content from an Excel sheet into the HTML text box, and those characters cause problems when we try to store the input. – Shreyas Mar 29 '19 at 06:44
  • @kometen Yes! I need to replace them with an empty string, i.e. "". – Shreyas Mar 29 '19 at 06:45
  • @SirFartALot Yes, I've tried it, but I still don't get the desired output. – Shreyas Mar 29 '19 at 06:49
  • I tried it with jdk1.6.0_24 and the code you provided DID output "hii:i". Have you really tested this? On which version of JDK1.6 did you test it? – SirFartALot Mar 29 '19 at 08:28
  • @SirFartALot I checked once again and it is not working. We have 1.6.0_45 installed. Please suggest. – Shreyas Mar 30 '19 at 09:11
  • Okay, there's one update: ASCII is working absolutely fine, but when a string contains a single quote written as ’ rather than ', it does not work. Compare ’ and ' (copy and paste them into Notepad to see the difference). What kind of characters are they, and how can I identify them? @SirFartALot – Shreyas Mar 30 '19 at 09:44

1 Answer


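As noted in the comments, \p{ASCII} is documented for Java 6 as well, so the character class itself should not be the cause. To rule it out, you can spell out the range explicitly, since ASCII is the first 7-bit range: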

String remarks = remarks2.replaceAll("[^\u0000-\u007F]", "");
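
To check what is actually in the string, here is a minimal sketch that avoids lambdas and other newer syntax, so it should also compile on JDK 1.6. The class and method names (AsciiFilter, stripNonAscii, describeNonAscii) are made up for illustration. The test string is written with \u escapes so the result does not depend on the encoding javac assumes for the source file; that a mismatched source encoding plays a role in your case is only a guess on my part.

```java
public class AsciiFilter {

    // Removes every char outside the 7-bit ASCII range U+0000..U+007F.
    static String stripNonAscii(String input) {
        return input.replaceAll("[^\\x00-\\x7F]", "");
    }

    // Lists each non-ASCII char as U+XXXX so it can be looked up in a Unicode chart.
    static String describeNonAscii(String input) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c > 0x7F) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(String.format("U+%04X", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same text as in the question: "hii:", U+2558, U+2019, "i".
        String remarks2 = "hii:\u2558\u2019i";
        System.out.println(describeNonAscii(remarks2));        // U+2558 U+2019
        System.out.println("ans: " + stripNonAscii(remarks2)); // ans: hii:i
    }
}
```

U+2019 is the RIGHT SINGLE QUOTATION MARK, the "curly" apostrophe that word processors and spreadsheets tend to substitute for the plain ASCII ' (U+0027).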

You might first normalize the text, which decomposes é into e plus a zero-width combining accent, so that the base letter survives when the non-ASCII characters are removed:

remarks2 = Normalizer.normalize(remarks2, Form.NFD);
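
Putting both steps together, a rough sketch could look like the following. java.text.Normalizer is available from Java 6 on; the class name AccentFolding and the method toAscii are hypothetical. Note that characters without a decomposition, such as ’ (U+2019), are simply removed.

```java
import java.text.Normalizer;

public class AccentFolding {

    // Decomposes accented letters (NFD), then drops everything outside ASCII.
    static String toAscii(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("[^\\x00-\\x7F]", "");
    }

    public static void main(String[] args) {
        // "caf" + U+00E9 decomposes to "cafe" + U+0301; the combining mark is removed.
        System.out.println(toAscii("caf\u00E9"));          // cafe
        // U+2558 and U+2019 have no decomposition and are removed outright.
        System.out.println(toAscii("hii:\u2558\u2019i"));  // hii:i
    }
}
```

Without the normalization step, é would be deleted as a whole and "café" would become "caf".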
Joop Eggen