8

I'm searching for a library (Apache / BSD / EPL licensed) that converts native text to ASCII, using \uXXXX escapes for characters not available in ASCII (basically what java.util.Properties does).
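For reference, the java.util.Properties behaviour mentioned above can be observed with the JDK alone. The following is only a sketch: the class and method names are made up for illustration, and it relies on Properties.store(OutputStream, ...) writing characters outside printable ASCII as Unicode escapes. Because store() also escapes characters such as '=', ':', '#' and '!', this shows the behaviour rather than providing a general-purpose converter.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical helper, for illustration only.
final class PropertiesEscapeDemo {

    /** Returns the "key=value" line that Properties.store(OutputStream, ...) writes for one entry. */
    public static String storedLine(String key, String value) {
        Properties props = new Properties();
        props.setProperty(key, value);
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            // The OutputStream variant writes ISO-8859-1 and escapes other characters.
            props.store(buffer, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String text = new String(buffer.toByteArray(), StandardCharsets.ISO_8859_1);
        for (String line : text.split("\\R")) {
            // Skip the timestamp comment that store() always writes.
            if (!line.isEmpty() && !line.startsWith("#")) {
                return line;
            }
        }
        throw new IllegalStateException("no property line written");
    }
}
```

Calling `PropertiesEscapeDemo.storedLine("greeting", "Hallo äöü")` yields the line that would end up in a .properties file, with the umlauts replaced by escapes.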

I had a look and there don't seem to be any readily available libraries. I found:

Is anyone aware of a library under the above stated licenses?

Garrett Hyde
Sascha Vogt

2 Answers

17

You can do this with a CharsetEncoder. First, read the 'native' text with its correct encoding so that it is decoded to Unicode. Then you can use a US-ASCII encoder to detect which characters have to be translated into Unicode escapes.

import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

import org.junit.Test;

public class EncodeToEscapes {

    @Test
    public void testEncoding() {
        // The source text must already have been decoded with its correct native encoding.
        final String src = "Hallo äöü";
        final CharsetEncoder asciiEncoder = StandardCharsets.US_ASCII.newEncoder();
        final StringBuilder result = new StringBuilder();
        for (final char character : src.toCharArray()) {
            if (asciiEncoder.canEncode(character)) {
                // Plain ASCII characters are copied unchanged.
                result.append(character);
            } else {
                // Everything else becomes an escape with four uppercase hex digits.
                result.append(String.format("\\u%04X", (int) character));
            }
        }
        System.out.println(result);
    }
}

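Additionally, Apache Commons Lang (e.g. org.apache.commons:commons-lang3) contains StringEscapeUtils.escapeJava(), which escapes native strings, and StringEscapeUtils.unescapeJava(), which reverses the escaping. Note that escapeJava() also escapes quotes, backslashes and control characters.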
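If a third-party dependency is not an option, the reverse direction can also be sketched with the JDK alone. The class below is a hypothetical helper, not part of any library. It assumes that only escapes consisting of a backslash, a 'u' and exactly four hex digits need decoding. It does not handle doubled backslashes or other Java escapes such as \n, and malformed sequences are copied through unchanged.

```java
// Hypothetical helper, for illustration only.
final class UnicodeUnescaper {

    /** Replaces each backslash-u-plus-four-hex-digits sequence with the character it encodes. */
    public static String unescape(String escaped) {
        StringBuilder out = new StringBuilder(escaped.length());
        int i = 0;
        while (i < escaped.length()) {
            if (isEscapeAt(escaped, i)) {
                // Decode the four hex digits that follow the backslash and the 'u'.
                out.append((char) Integer.parseInt(escaped.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(escaped.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    /** True if a complete, well-formed escape starts at index i. */
    private static boolean isEscapeAt(String s, int i) {
        if (i + 6 > s.length() || s.charAt(i) != '\\' || s.charAt(i + 1) != 'u') {
            return false;
        }
        for (int k = i + 2; k < i + 6; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }
}
```

For example, `UnicodeUnescaper.unescape("Hallo \\u00E4\\u00F6\\u00FC")` returns `Hallo äöü`. Characters outside the Basic Multilingual Plane arrive as two consecutive escapes (a surrogate pair) and are reassembled simply by appending both chars.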

Sascha Vogt
Andreas
  • Thanks, that's another approach. What I still can't believe is that this hasn't already been done in any available library. The reverse direction also needs to be considered. – Sascha Vogt Apr 04 '12 at 11:32
  • 3
    You could use StringEscapeUtils from Apache Commons: System.out.println(StringEscapeUtils.escapeJava("Halloäöü")); There's also a corresponding unescapeJava. It can be found here: http://commons.apache.org/lang/ – Andreas Apr 04 '12 at 11:42
  • 1
    Thank you, Andreas. StringEscapeUtils did exactly what I was looking for. It seems to me that this would also be a good answer to the question posed. – Calon Nov 29 '13 at 10:17
6

Try this piece of code from Apache commons-lang:

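String escaped = StringEscapeUtils.escapeJava("ایران زیبای من");
// The backslashes are doubled so that unescapeJava, not the compiler, resolves the escapes.
String restored = StringEscapeUtils.unescapeJava("\\u0627\\u06CC\\u0631\\u0627\\u0646 \\u0632\\u06CC\\u0628\\u0627\\u06CC \\u0645\\u0646");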
Tooraj Jam