3

I have to port some chat code from iOS to Android. Before sending a chat message to the socket, the iOS code passes the NSNonLossyASCIIStringEncoding constant as the encoding argument of -[NSString dataUsingEncoding:].

How would you do this on Android? The same question applies to decoding in the other direction.

Without doing that, for instance, the line breaks disappear in the message received on the other mobile.

Code on iOS:

NSData *data1 = [myStringTosend dataUsingEncoding:NSNonLossyASCIIStringEncoding];
NSString *goodValue = [[[NSString alloc] initWithData:data1 encoding:NSUTF8StringEncoding] autorelease];

And decoding:

NSData *data = [[NSData alloc] initWithData:[response dataUsingEncoding:NSASCIIStringEncoding]];

What I have so far on the Android side (not correct), for encoding:

OutputStream os = socket.getOutputStream();
os.write(request.getBytes("UTF-8"));
os.flush();

And decoding:

while ((bytesRead = is.read(buffer, 0, BUFFER_SIZE)) >= 0) {
    if (bytesRead > 0) response.append(new String(buffer, 0, bytesRead, "UTF-8"));
    if (bytesRead < BUFFER_SIZE) break;
}
Vincent Mimoun-Prat

3 Answers

9

@portforwardpodcast is absolutely correct that you should, if possible, avoid ASCII-encoding your UTF-8 and instead set up your stack to handle and store UTF-8 directly. That said, if you can't change that behavior, the following code may be helpful.

While there's no published explanation of how NSNonLossyASCIIStringEncoding works, based on its output it looks like:

  • Characters in the extended ASCII range (decimal values 128 - 255) are escaped using a three-digit octal encoding (e.g. ñ with decimal value 241 -> \361)
  • Code points above 255 are escaped one UTF-16 code unit (two bytes) at a time using a four-digit hex encoding (e.g. 😥, which takes up 32 bits with decimal value 128549 -> \ud83d\ude25)
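
The two rules above can be checked by hand with a minimal sketch. The class and method names (EscapeRules, octalEscape, hexEscape) are hypothetical, and the rules themselves are only inferred from observed output, not from any published specification:

```java
public class EscapeRules {
    // Rule 1 (assumed): characters 128..255 become a backslash plus octal digits.
    public static String octalEscape(char c) {
        return "\\" + Integer.toOctalString(c);
    }

    // Rule 2 (assumed): anything above 255 is written per UTF-16 code unit,
    // each as backslash, 'u', and four zero-padded hex digits.
    public static String hexEscape(int codePoint) {
        StringBuilder sb = new StringBuilder();
        // Character.toChars yields one unit for BMP characters, two for supplementary ones.
        for (char unit : Character.toChars(codePoint)) {
            sb.append(String.format("\\u%04x", (int) unit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(octalEscape((char) 241)); // the n-with-tilde example
        System.out.println(hexEscape(128549));       // the emoji example
    }
}
```

Running main should reproduce the two escaped forms quoted in the list: decimal 241 is 361 in octal, and 128549 splits into the surrogate pair d83d, de25.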

So to encode:

import java.nio.charset.Charset;

public static String encodeToNonLossyAscii(String original) {
    // Pure ASCII input needs no escaping.
    Charset asciiCharset = Charset.forName("US-ASCII");
    if (asciiCharset.newEncoder().canEncode(original)) {
        return original;
    }
    StringBuilder builder = new StringBuilder();
    for (int i = 0; i < original.length(); i++) {
        char c = original.charAt(i);
        if (c < 128) {
            builder.append(c);
        } else if (c < 256) {
            // 128..255: backslash plus three octal digits (200..377).
            builder.append("\\").append(Integer.toOctalString(c));
        } else {
            // Above 255: zero-pad to exactly four hex digits so the decoder's
            // four-digit pattern matches (U+0100 would otherwise come out as "100").
            builder.append(String.format("\\u%04x", (int) c));
        }
    }
    return builder.toString();
}

And to decode (this can be made more efficient by parsing the two types of encodings in lock step, rather than as two separate passes):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

private static final Pattern UNICODE_HEX_PATTERN = Pattern.compile("\\\\u([0-9A-Fa-f]{4})");
private static final Pattern UNICODE_OCT_PATTERN = Pattern.compile("\\\\([0-7]{3})");

public static String decodeFromNonLossyAscii(String original) {
    // First pass: four-digit hex escapes.
    Matcher matcher = UNICODE_HEX_PATTERN.matcher(original);
    StringBuffer charBuffer = new StringBuffer(original.length());
    while (matcher.find()) {
        String match = matcher.group(1);
        char unicodeChar = (char) Integer.parseInt(match, 16);
        // quoteReplacement keeps a decoded '$' or backslash from being
        // treated as a group reference by appendReplacement.
        matcher.appendReplacement(charBuffer, Matcher.quoteReplacement(Character.toString(unicodeChar)));
    }
    matcher.appendTail(charBuffer);
    String parsedUnicode = charBuffer.toString();

    // Second pass: three-digit octal escapes.
    matcher = UNICODE_OCT_PATTERN.matcher(parsedUnicode);
    charBuffer = new StringBuffer(parsedUnicode.length());
    while (matcher.find()) {
        String match = matcher.group(1);
        char unicodeChar = (char) Integer.parseInt(match, 8);
        matcher.appendReplacement(charBuffer, Matcher.quoteReplacement(Character.toString(unicodeChar)));
    }
    matcher.appendTail(charBuffer);
    return charBuffer.toString();
}
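
The lock-step variant mentioned above might look like the following sketch. It is one possible way to merge the two passes, not a tested drop-in replacement; the class name NonLossyAsciiDecoder and the combined pattern are assumptions made for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonLossyAsciiDecoder {
    // A single alternation: group 1 captures four hex digits following the 'u',
    // group 2 captures three octal digits directly after the backslash.
    private static final Pattern ESCAPE =
            Pattern.compile("\\\\(?:u([0-9A-Fa-f]{4})|([0-7]{3}))");

    public static String decode(String input) {
        Matcher m = ESCAPE.matcher(input);
        StringBuffer out = new StringBuffer(input.length());
        while (m.find()) {
            char decoded = (m.group(1) != null)
                    ? (char) Integer.parseInt(m.group(1), 16)
                    : (char) Integer.parseInt(m.group(2), 8);
            // Quote the replacement so a decoded '$' or backslash is taken literally.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Besides scanning the input once, a single pass never re-reads its own output, so a character produced by a hex escape cannot be mistaken for the start of an octal escape.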
akdotcom
2

Don't use NSNonLossyASCIIStringEncoding; use UTF-8 encoding instead. I just solved this problem myself on an iOS + Android + Java Spring backend stack, and it took me around 4 full days to figure everything out. Android can't display emojis, but this gives me full character support in almost all languages (possibly all, I'm not sure). Here are the articles that helped me:

Must Read: http://blog.manbolo.com/2012/10/29/supporting-new-emojis-on-ios-6 http://blog.manbolo.com/2011/12/12/supporting-ios-5-new-emoji-encoding

See the hex bytes of a string inside the DB: How can I see raw bytes stored in a MySQL column?

Details about how to setup MySQL: http://technovergence-en.blogspot.com/2012/03/mysql-from-utf8-to-utf8mb4.html

In-depth UTF-8 FAQ: http://www.unicode.org/faq/utf_bom.html#utf8-4

Details about the difference from notation: \ud83d\udc7d and hex value in memory: 0xF09F91BD http://en.wikipedia.org/wiki/UTF-8#Description
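
To make the difference between the two notations concrete, here is a small sketch that prints both views of the same character. The class and method names (Utf8Bytes, utf8Hex, utf16Units) are hypothetical, and StandardCharsets is assumed to be available (on Android that means API level 19 or later):

```java
import java.nio.charset.StandardCharsets;

public class Utf8Bytes {
    // Hex dump of the bytes the string occupies when encoded as UTF-8.
    public static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // The same string as UTF-16 code units in backslash-u escape notation.
    public static String utf16Units(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            sb.append(String.format("\\u%04x", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Code point 0x1F47D is the character used in the example above.
        String alien = new String(Character.toChars(0x1F47D));
        System.out.println(utf16Units(alien)); // two escaped UTF-16 units
        System.out.println(utf8Hex(alien));    // four UTF-8 bytes
    }
}
```

The escape notation describes UTF-16 code units (a surrogate pair here), while the bytes on the wire or in the database are the four-byte UTF-8 sequence F0 9F 91 BD.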

Use this to copy and paste characters in to see real hex byte values (works for emojis): http://perishablepress.com/tools/utf8-hex/index.php

Get Spring to support UTF-8 in URLs (for GET params): http://forum.springsource.org/showthread.php?93728-RequestParam-doesn-t-seem-to-be-decoded

GET parameter encoding: http://forum.springsource.org/showthread.php?112181-Unable-to-Override-the-Spring-MVC-URL-decoding-which-uses-default-quot-ISO-8859-1-quot

benathon
  • Well, in that case I would try to find a correlation between the bytes on disk and the \u notation that iOS uses. Run a for loop or something that will help you see a pattern. Worst case, just build a lookup table for the conversions. I don't know how big the UTF character space is, though. Let me know if you need more help – benathon Jan 19 '13 at 14:20
  • This guy has a for loop already. This code may help you http://stackoverflow.com/a/9392097/836450 – benathon Jan 19 '13 at 14:27
2

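The code below is a close Android counterpart of iOS's NSNonLossyASCIIStringEncoding. The output is not byte-for-byte identical (escapeJava also escapes quotes, backslashes and control characters, and writes \uXXXX rather than octal for 128 - 255), but unescapeJava reads both octal and \uXXXX escapes.

Add the dependency below to your Gradle build file.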

 compile 'org.apache.commons:commons-lang3:3.4'

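Then add these methods to your Utils class:

import org.apache.commons.lang3.StringEscapeUtils;

// Escapes non-ASCII characters (and quotes, backslashes, control characters).
public static String encode(String s) {
    return StringEscapeUtils.escapeJava(s);
}

// Reverses encode(), turning escape sequences back into characters.
public static String decode(String s) {
    return StringEscapeUtils.unescapeJava(s);
}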

Then simply call these methods wherever you want to encode or decode a string:

// to encode
String stencode = Utils.encode("mystring");

// to decode
String stdecode = Utils.decode("mystring");