I've got a Silverlight client application that sends a string "including characters ş ţ ă and â î" over to a Java jax-ws SOAP service.

Now, no matter what I do, I always get "including characters ? ? ? and â î" on the other side ("â î" work, but the others don't).

I've even tried HttpUtility.UrlEncode("ş ţ ă and â î") in Silverlight, but URLDecoder.decode(inputText, "UTF-8") in Java still gives me ?s instead of those 3 characters.

What's going on? Java strings are supposed to be encoded in UTF-8 by default, right? And the encoding in .NET is Unicode (actually UTF-16). But if I decode with Unicode or UTF-16 on the Java side, ALL those special chars get turned into ?s (â î included).

Any help much appreciated!


[edit] I would love to see which encoding I am using on the Silverlight side, or to specify an encoding myself. The problem is, I can't figure out where or how to do this: I created the client via Service References -> Add Reference, where I specified the WSDL, and from there .NET did everything for me: it created a Client class and the required events and functions. Here's the gist of my client:

            FooWildcardSOAPClient client = new FooWildcardSOAPClient();
            client.CallFooServiceCompleted += new EventHandler<CallFooServiceCompletedEventArgs>(client_CallFooServiceCompleted);

            client.CallFooServiceAsync(param1, HttpUtility.UrlEncode(inputString), args); 

I browsed the auto generated code but couldn't figure out where to specify an encoding.

And here is the Java side:

@WebService(targetNamespace = "http://jaxwscalcul.org", 
        name="FooWildcardSOAP", 
        serviceName="FooWildcardService")
@SOAPBinding(   style=SOAPBinding.Style.DOCUMENT, 
        use=SOAPBinding.Use.LITERAL)
public class FooWildcardServiceImpl {

    @WebMethod(operationName="CallFooService", action="urn:FooWildcardService")
    @WebResult(name="result")
    public String getOutput(
            @WebParam(name="FooServiceWSDL") String param1,
            @WebParam(name="inputTextOrXML") String inputText,
            @WebParam(name="otherArgsString") String[] otherArgs)
    {
        try {
            inputText = URLDecoder.decode(inputText, "UTF-16LE");//ISO-8859-1
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
        System.out.println("\r\n\r\n"+inputText);
        return inputText; // the method is declared to return a String
    }
}

[EDIT2] I've used Fiddler, and I can see that the content on the wire is text/xml UTF-8, and the actual data, as in the "ş ţ ă" chars that don't show in java, DO show on the wire, correctly.

Here's a few pastes from Fiddler:

Client:
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
Accept-Language: en-GB,en-US;q=0.8,en;q=0.6,ro;q=0.4,fr-FR;q=0.2,de;q=0.2
Entity:
content-type: text/xml; charset=utf-8
Spectraljump
  • Sounds like you will benefit from utilities that can tell you the exact bytes going back and forth. – Thorbjørn Ravn Andersen Jun 24 '11 at 15:27
  • I think you are confusing internal representation of characters with default encodings. Internally, all Java strings are represented as UTF-16, but this has nothing to do with the "default" encoding. Most methods that take an optional `Charset` or `String` argument (allowing you to specify a character set to use) will use the **platform** default charset when that optional argument is omitted. – Matt Ball Jun 24 '11 at 15:30
  • For example: [`String#getBytes()`](http://download.oracle.com/javase/6/docs/api/java/lang/String.html#getBytes%28%29) and [`String#getBytes(Charset)`](http://download.oracle.com/javase/6/docs/api/java/lang/String.html#getBytes%28java.nio.charset.Charset%29) – Matt Ball Jun 24 '11 at 15:33
  • Several things: 1. The default encoding for Strings in Java is UTF-16. Remember, this is for Strings that are not created in a different encoding. 2. You haven't specified the encoding of the request. Java will not automagically discover this for you; if you have sent an ISO-8859-1 encoded request, specifying UTF-16 to the decoder will return garbage, because the decoder will apply the UTF-16 rules. 3. Use a UTF-8/16 capable viewer (with a suitable font) to view the contents of Unicode text. 4. Post code. – Vineet Reynolds Jun 24 '11 at 15:34
  • Try using Fiddler to look at the actual format on the "wire" (www.fiddler2.com). – miguelv Jun 24 '11 at 15:49
  • I have tried Fiddler and updated my question with what happened. – Spectraljump Jun 25 '11 at 10:16
  • @Twodordan, avoid using `System.out.println()`. It uses the platform encoding, which might/will not be UTF-8. Instead, write the string to a file with a known encoding. An `OutputStreamWriter` would be suitable if you want to explicitly specify encodings. – Vineet Reynolds Jun 28 '11 at 17:56
  • Thank you Vineet Reynolds. That was it! I was worried about nothing. Cheers! – Spectraljump Jun 29 '11 at 10:21
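
The comments above trace the `?`s to the console's encoding rather than to the transfer. A minimal sketch of that effect, assuming a console that behaves like ISO-8859-1 (the class name `ConsoleEncodingDemo` and the helper `roundTrip` are made up for illustration): "â î" exist in ISO-8859-1 and survive, while "ş ţ ă" do not and get replaced with `?`.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original service.
class ConsoleEncodingDemo {

    // Encode to bytes and decode again with the same charset, which mimics
    // pushing text through an output channel that uses that charset.
    // String.getBytes(Charset) replaces unmappable characters with '?'.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Unicode escapes keep the example independent of the source file encoding.
        String input = "\u015F \u0163 \u0103 and \u00E2 \u00EE"; // ş ţ ă and â î

        // What System.out.println() relies on when no encoding is given.
        System.out.println("Platform default charset: " + Charset.defaultCharset());

        // Assumption: the console charset is ISO-8859-1 (or something similar).
        String viaLatin1 = roundTrip(input, StandardCharsets.ISO_8859_1);
        // The in-memory string itself was never damaged.
        String viaUtf8 = roundTrip(input, StandardCharsets.UTF_8);

        // Print through a stream with an explicit encoding, as the comment suggests.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println("ISO-8859-1 round trip: " + viaLatin1); // ? ? ? and â î
        utf8Out.println("UTF-8 round trip:      " + viaUtf8);
    }
}
```

Whether the last two lines display correctly still depends on the terminal being able to render UTF-8; writing to a file with a known encoding avoids that dependency.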

1 Answer

Via Luther Blissett's answer "UTF-16 != UTF-16":

In Java, getBytes("UTF-16") is big-endian (and prepends a byte-order mark).

In C#, Encoding.Unicode.GetBytes is little-endian.

On the Java side, try getBytes("UTF-16LE").

For a detailed explanation, see Big and little endian byte order.
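
To make the byte-order difference concrete, here is a small sketch that dumps the bytes Java produces for "ş" (U+015F) under each UTF-16 variant. The class name `Utf16ByteOrderDemo` and the helper `hex` are illustrative only.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for comparing UTF-16 byte orders.
class Utf16ByteOrderDemo {

    // Format a byte array as space-separated uppercase hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u015F"; // ş

        // "UTF-16" encodes big-endian and prepends the byte-order mark FE FF.
        System.out.println("UTF-16   : " + hex(s.getBytes(StandardCharsets.UTF_16)));   // FE FF 01 5F
        System.out.println("UTF-16BE : " + hex(s.getBytes(StandardCharsets.UTF_16BE))); // 01 5F
        // Little-endian, the byte order the answer attributes to C#'s Encoding.Unicode.
        System.out.println("UTF-16LE : " + hex(s.getBytes(StandardCharsets.UTF_16LE))); // 5F 01
        // For comparison: the encoding Fiddler reported on the wire.
        System.out.println("UTF-8    : " + hex(s.getBytes(StandardCharsets.UTF_8)));    // C5 9F
    }
}
```

Byte order only matters when raw UTF-16 bytes are exchanged; a SOAP body declared as `charset=utf-8` never carries them.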

Timothy Lee Russell
  • I did not know that, but it doesn't fix the problem. I get ?s for all the symbols. – Spectraljump Jun 24 '11 at 16:28
  • The exact same thing (as I said in the question, every special char except "â î" gets turned into ?s). The reason I added urlencode/decode was to try and fix the problem. – Spectraljump Jun 24 '11 at 17:20
  • What does the data on the wire look like? The things I have seen seem to suggest that the data is being transmitted as ASCII. Use Fiddler to check that the Content-Type is "text/xml; charset=utf-8". – Timothy Lee Russell Jun 24 '11 at 21:38
  • Well, this is weird, Fiddler shows that the data on the wire is text/xml utf-8; and all the characters show as they should in Fiddler's TextView. But when the data gets to the service, and gets printed to console, "?"s pop up instead of "ş ţ and ă"... – Spectraljump Jun 25 '11 at 09:58
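
For the URL-encoding attempt discussed in these comments, a rough sketch of the round trip done entirely in Java. It assumes the client percent-encodes the UTF-8 bytes of the string; `URLEncoder` stands in for the Silverlight side here, and `UrlRoundTripDemo` is an illustrative name. Decoding with the charset used for encoding restores the text, while decoding the same input as UTF-16LE cannot.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class; URLEncoder plays the role of the client here.
class UrlRoundTripDemo {

    // Percent-encode the UTF-8 bytes of the text (assumed client behaviour).
    static String encodeUtf8(String text) throws UnsupportedEncodingException {
        return URLEncoder.encode(text, "UTF-8");
    }

    // Decode percent-escapes, interpreting the bytes in the given charset.
    static String decode(String encoded, String charsetName) throws UnsupportedEncodingException {
        return URLDecoder.decode(encoded, charsetName);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String input = "\u015F \u0163 \u0103 and \u00E2 \u00EE"; // ş ţ ă and â î

        String encoded = encodeUtf8(input);
        System.out.println(encoded); // %C5%9F+%C5%A3+%C4%83+and+%C3%A2+%C3%AE

        // Same charset on both sides: the original string comes back.
        System.out.println(decode(encoded, "UTF-8").equals(input)); // true

        // Mismatched charset: the UTF-8 byte pairs are misread as UTF-16 code units.
        System.out.println(decode(encoded, "UTF-16LE").equals(input)); // false
    }
}
```

Compare the strings with `equals` (or write them to a file with a known encoding) rather than judging by console output, which goes through the platform charset.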