I'm trying to implement a POST request with HttpURLConnection. This is my code:

private static void call(String body) throws IOException{
    HttpURLConnection con = null;

    con = (HttpURLConnection)new URL("http://127.0.0.1:8080").openConnection();

    con.setRequestProperty("Accept-Charset", "UTF-8");
    con.setRequestMethod("POST");
    con.setRequestProperty("Content-Type", "application/json; charset=utf-8"); 
    con.setRequestProperty("Accept", "application/json; charset=utf-8");

    con.setDoOutput(true);
    DataOutputStream wr = new DataOutputStream(con.getOutputStream());
    wr.writeBytes(body);
    wr.flush();
    wr.close();
    ...
 }

I post it to localhost just to sniff it with Wireshark. The problem is that when my body is a string containing characters like 'ò', 'à', 'è', 'ç', ... the request I see has the correct string, except that those characters are replaced by dots.

Example: if body is "hèllo!" ---> the request body is "h.llo!"

Just as a test, I'm executing the above method from a Java main method and passing the parameter this way:

String pString = "{\"titlè\":\"Hèllo Wòrld!\"}";
String params = new String(pString.getBytes("UTF-8"),"UTF-8");
....
call(body);

And this is what I get in Wireshark:

POST / HTTP/1.1
Accept-Charset: UTF-8
Content-Type: application/json; charset=utf-8
Accept: application/json; charset=utf-8
User-Agent: Java/1.6.0_43
Host: 127.0.0.1:8080
Connection: keep-alive
Content-Length: 24

{"titl.":"H.llo W.rld!"}

Any help would be appreciated. Thank you.

Ve9

1 Answer

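Java strings are always UTF-16 internally. So in your second example, params = new String(pString.getBytes("UTF-8"),"UTF-8"); converts pString to a UTF-8 byte array and then straight back to UTF-16, so params ends up identical to pString. Encoding has to happen at the point where strings enter or leave the VM. In your case that means choosing the encoding when you write the body to the stream. DataOutputStream.writeBytes does not do that: it writes one byte per char and discards the high eight bits, which is why each accented character arrives as a single non-ASCII byte. Encode the body explicitly instead: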

wr.write(body.getBytes("UTF-8"));
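
To see the difference without a network sniffer, here is a minimal sketch that writes the same string both ways into an in-memory buffer and prints the resulting bytes as hex. The class name EncodingDemo and its helper methods are hypothetical, and it uses StandardCharsets, which assumes Java 7 or later (the question's trace shows Java 1.6, where getBytes("UTF-8") is the equivalent).

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original code.
public class EncodingDemo {

    // Same call as in the question: one byte per char, high eight bits discarded.
    static byte[] viaWriteBytes(String body) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        DataOutputStream wr = new DataOutputStream(sink);
        wr.writeBytes(body);
        wr.flush();
        return sink.toByteArray();
    }

    // Same call as in the answer: the string is encoded to UTF-8 before writing.
    static byte[] viaUtf8(String body) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        DataOutputStream wr = new DataOutputStream(sink);
        wr.write(body.getBytes(StandardCharsets.UTF_8));
        wr.flush();
        return sink.toByteArray();
    }

    // Formats bytes as space-separated lowercase hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b & 0xff));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        // Unicode escape for 'è' so the result does not depend on the source file encoding.
        String body = "h\u00e8llo!";
        System.out.println("writeBytes: " + hex(viaWriteBytes(body)));
        System.out.println("UTF-8:      " + hex(viaUtf8(body)));
    }
}
```

With writeBytes, 'è' should come out as the single byte e8, which is not valid UTF-8 on its own; with the explicit encoding it becomes the two-byte sequence c3 a8.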
ChristophT
  • Thanks for the answer, but if I do wr.write("{\"titlè\":\"Hèllo Wòrld!\"}".getBytes("UTF-8")); I get the same result but with two dots instead of one for each special character ---> {"titl..":"h..llo w..rld!"} – Ve9 Oct 18 '13 at 09:55
  • Two bytes per special character should be correct in UTF-8. I'm not familiar with Wireshark - are you sure it can display UTF-8 characters? See http://stackoverflow.com/questions/9825440/wireshark-can-i-decode-utf-8-data-in-the-packets – ChristophT Oct 20 '13 at 12:28
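
One way to check the "two bytes per special character" point independently of Wireshark is to count the bytes and decode them again. The sketch below does that for the JSON string from the question; the class name Utf8RoundTrip and its helpers are hypothetical, and StandardCharsets assumes Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original code.
public class Utf8RoundTrip {

    // Number of bytes the string occupies once encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Encodes to UTF-8 and decodes again; a correct encoding returns an equal string.
    static String roundTrip(String s) {
        byte[] encoded = s.getBytes(StandardCharsets.UTF_8);
        return new String(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The question's JSON body, with the accented letters written as Unicode escapes.
        String json = "{\"titl\u00e8\":\"H\u00e8llo W\u00f2rld!\"}";
        System.out.println("chars:       " + json.length());
        System.out.println("UTF-8 bytes: " + utf8Length(json));
        System.out.println("round trip equal: " + roundTrip(json).equals(json));
    }
}
```

The string has 24 chars, which matches the Content-Length: 24 in the original capture, and 27 UTF-8 bytes, one extra for each of the three accented letters. If the bytes decode back to an equal string, the payload is correct and the dots are only how the sniffer renders non-ASCII bytes.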