
I have a question about control characters. I have to find them in a string and delete them. I did some research and found some useful tips.

I wrote this:

output.toString().replaceAll("[\\p{Cntrl}\\p{Cc}]","")

But I was asked whether this method can find the control characters if they are written as bytes. To be honest, I have no idea. I tried looking online, but I don't know how to test it.

Thanks

Zoyd
Tony
  • How much space does each character occupy in bytes: two bytes or one? You can try comparing the integer value of each byte with the ASCII values of the control characters. – OnePunchMan May 14 '14 at 11:32
  • What does the Cc do? I see {Cntrl} here: http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html, but nothing about Cc. Thanks. – user420667 Jan 01 '17 at 21:56

2 Answers


Yes, the characters will be removed. See the following code:

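// requires: import java.nio.charset.StandardCharsets;
// 10 = LF, 15 = SI, 21 = NAK, 13 = CR (all ASCII control characters)
byte[] chars = { 'h', 'e', 10, 15, 21, 'l', 'l', 'o', 13 };
// A Charset constant avoids the checked UnsupportedEncodingException of the charset-name overload
String str = new String(chars, StandardCharsets.UTF_8);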
System.out.println("==========");
System.out.println(str);
System.out.println("==========");
System.out.println(str.replaceAll("[\\p{Cntrl}\\p{Cc}]", ""));
System.out.println("==========");

The output for that code would be:

 ==========
 he
 llo
 ==========
 hello
 ==========

Once a control character is part of a String object, it doesn't matter whether the String was created from a byte[] or from any other object: it is always stored in the same format.
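A minimal sketch to check this claim yourself: it builds the same text once from a string literal and once from raw bytes, then compares them. The class name `ControlCharDemo` and the helper `stripControls` are made up for this illustration; the regex is the one from the question.

```java
import java.nio.charset.StandardCharsets;

class ControlCharDemo {
    // Hypothetical helper wrapping the regex from the question.
    static String stripControls(String s) {
        return s.replaceAll("[\\p{Cntrl}\\p{Cc}]", "");
    }

    public static void main(String[] args) {
        // Same content, two different origins.
        String fromLiteral = "he\n\u000F\u0015llo\r";
        byte[] raw = { 'h', 'e', 10, 15, 21, 'l', 'l', 'o', 13 };
        String fromBytes = new String(raw, StandardCharsets.UTF_8);

        // The two Strings are indistinguishable once constructed.
        System.out.println(fromLiteral.equals(fromBytes));   // prints true
        System.out.println(stripControls(fromBytes));        // prints hello
    }
}
```

Since both Strings are equal, the regex cannot behave differently depending on where the characters came from.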

Roberto

If by "written in bytes" you mean that your input is a byte array, you can write

String s = new String(myByteArray);

and use your code on s.
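One caveat: the one-argument constructor decodes with the platform default charset, so it is usually safer to name the charset explicitly. Here is a rough sketch, assuming the bytes are UTF-8; `BytesToClean` and `cleanBytes` are hypothetical names used only for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BytesToClean {
    // Hypothetical helper: decode with an explicit charset, then apply the question's regex.
    static String cleanBytes(byte[] input, Charset charset) {
        String decoded = new String(input, charset);
        return decoded.replaceAll("[\\p{Cntrl}\\p{Cc}]", "");
    }

    public static void main(String[] args) {
        // "caf" + e-acute (two bytes in UTF-8) + BEL (0x07)
        byte[] input = { 'c', 'a', 'f', (byte) 0xC3, (byte) 0xA9, 0x07 };

        // Correct charset: four characters remain after the BEL is removed.
        System.out.println(cleanBytes(input, StandardCharsets.UTF_8).length());      // prints 4

        // Wrong charset: the two UTF-8 bytes decode as two separate characters.
        System.out.println(cleanBytes(input, StandardCharsets.ISO_8859_1).length()); // prints 5
    }
}
```

The control character is removed in both cases, but only the matching charset gives back the intended text.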

Zoyd
  • No, my input is a String, but he told me this: "But will it work with bytes? Because control chars are represented that way." The first input that I receive is a String, though, so as far as I can tell my method has to work. – Tony May 14 '14 at 11:34
  • Then I don't understand the question. What bytes? – Zoyd May 14 '14 at 11:34
  • That's my problem too -_-'... I asked him what he wanted, but got no further explanation. He said you have to convert your string into bytes ( myString.getBytes(Charset.forName("UTF-8")) ), then find the control characters, and then convert back into a string. To me that seems useless, because if we have a String, we can just use my regex. – Tony May 14 '14 at 11:38
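For comparison, the byte-level round trip described in the last comment could look roughly like the sketch below. It is only an illustration under stated assumptions: the input is UTF-8, and `ByteLevelFilter` and `dropAsciiControls` are invented names. It only drops the ASCII control bytes (0x00 to 0x1F and 0x7F), which is safe in UTF-8 because bytes below 0x80 never occur inside a multi-byte sequence. Unlike \p{Cc}, it does not touch the C1 controls (U+0080 to U+009F), which take two bytes in UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class ByteLevelFilter {
    // Hypothetical helper: copies every byte except ASCII control bytes.
    static byte[] dropAsciiControls(byte[] utf8) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(utf8.length);
        for (byte b : utf8) {
            int v = b & 0xFF;                        // treat the byte as unsigned
            boolean isControl = v < 0x20 || v == 0x7F;
            if (!isControl) {
                out.write(v);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String original = "he\n\u000F\u0015llo\r";

        // String -> bytes -> filter -> String, as suggested in the comment.
        byte[] filtered = dropAsciiControls(original.getBytes(StandardCharsets.UTF_8));
        String viaBytes = new String(filtered, StandardCharsets.UTF_8);

        // The regex from the question, applied directly to the String.
        String viaRegex = original.replaceAll("[\\p{Cntrl}\\p{Cc}]", "");

        System.out.println(viaBytes.equals(viaRegex)); // prints true
    }
}
```

For input that contains only ASCII control characters, both routes give the same result, which supports the point that the detour through bytes adds nothing when you already have a String.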