
I just ran into a strange encoding problem in a Java web project.

System.out.println("search url: " + searchURL);    
searchURL = new String(searchURL.getBytes("utf-8"), "utf-8");
System.out.println("test===" + new String(searchURL.getBytes("utf-8")));

I tested the code above in a Java main function, and it works correctly with Chinese characters.

output:
search url: https://api.datamarket.azure.com/Data.ashx/Bing/Search/Image?Query=%27机器 猫%27&$format=json&$skip=0

test===https://api.datamarket.azure.com/Data.ashx/Bing/Search/Image?Query=%27机器 猫%27&$format=json&$skip=0

But when I run the same code in Tomcat, the second line comes out garbled.

output:
search url: https://api.datamarket.azure.com/Data.ashx/Bing/Search/Image?Query=%27机器 猫%27&$format=json&$skip=0

test===https://api.datamarket.azure.com/Data.ashx/Bing/Search/Image?Query=%27鏈哄櫒 鐚?27&$format=json&$skip=0

Then I tested this in Tomcat:

searchURL = new String(searchURL.getBytes("utf-8"), "utf-8");
System.out.println(new String(searchURL.getBytes("gbk")));
System.out.println(new String(searchURL.getBytes("gb2312")));

Both of the lines above print correctly. Why? Any suggestion would be appreciated, thanks!

santi

1 Answer


The default charset is probably different between your standalone JVM and the Tomcat JVM.

To check, try printing it in both environments:

System.out.println(Charset.defaultCharset());

The following line encodes the string to UTF-8 bytes, but then decodes those bytes with the default charset, which may or may not be UTF-8:

System.out.println("test===" + new String(searchURL.getBytes("utf-8")));

So while the byte array is UTF-8, the decoder may expect something else.
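
As a rough illustration of the point above, here is a minimal sketch that encodes a string as UTF-8 and then decodes it with a charset chosen by the caller. The class name `CharsetDemo` and the helper `roundTrip` are made up for this example, and it assumes the JVM ships the GBK charset (hence the `isSupported` guard). Decoding with GBK stands in for what `new String(bytes)` does when the default charset is GBK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original question.
class CharsetDemo {

    // Encodes s as UTF-8, then decodes the bytes with the given charset.
    // Passing a charset other than UTF-8 mimics new String(bytes) on a JVM
    // whose default charset is not UTF-8.
    static String roundTrip(String s, Charset decodeWith) {
        byte[] utf8Bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, decodeWith);
    }

    public static void main(String[] args) {
        // Same characters as in the question, written as escapes so the
        // source file encoding does not matter.
        String query = "\u673a\u5668 \u732b";

        System.out.println("default charset: " + Charset.defaultCharset());

        // Matching charsets: the text survives unchanged.
        String ok = roundTrip(query, StandardCharsets.UTF_8);
        System.out.println("utf-8 round trip intact: " + ok.equals(query));

        // Mismatched charsets: the text is altered (assumes GBK is available).
        if (Charset.isSupported("GBK")) {
            String bad = roundTrip(query, Charset.forName("GBK"));
            System.out.println("gbk decode intact: " + bad.equals(query));
        }
    }
}
```

Passing the charset explicitly to both `getBytes` and the `String` constructor keeps the result independent of whatever default the JVM was started with.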

BevynQ
  • `searchURL = new String(searchURL.getBytes("utf-8"), "utf-8");` Doesn't this convert searchURL to UTF-8? – santi Jan 23 '13 at 05:04
  • @santi: I have added a clarification – BevynQ Jan 23 '13 at 05:08
  • @santi: Java strings are always `utf-16` internally. The byte array produced by `getBytes` will be `utf-8`, but once it is made into a `String` it is decoded back to `utf-16` using the specified charset. – BevynQ Jan 23 '13 at 05:11
  • You are right, thanks a lot. But I have already configured Tomcat's server.xml. – santi Jan 23 '13 at 05:22
  • How can I change the default charset in Tomcat? I am new to Tomcat, my apologies. – santi Jan 23 '13 at 05:40
  • Try starting Tomcat with `-Dfile.encoding=UTF-8`. Not sure whether that will work or not. – BevynQ Jan 23 '13 at 05:57
  • Actually, my problem is that when I send the same URL connection request from a Java main function, the Bing Search API returns the correct result, but in Tomcat the result is garbled. So I think the cause must be that the charset in Tomcat is not UTF-8. `System.out.println(Charset.defaultCharset());` returns UTF-8 in Java main but GBK in Tomcat. – santi Jan 23 '13 at 06:39
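
Following up on the last comment: if the response is being read through a reader that falls back to the default charset, one option is to name the charset explicitly so the result does not depend on how Tomcat was started. The sketch below is only an illustration. The class `ResponseReader` and the method `readUtf8` are hypothetical, the response is assumed to be UTF-8, and a `ByteArrayInputStream` stands in for the real connection's input stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original thread.
class ResponseReader {

    // Reads the whole stream as UTF-8 text, regardless of the JVM default charset.
    static String readUtf8(InputStream in) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for a UTF-8 encoded response body.
        String expected = "\u673a\u5668 \u732b";
        byte[] body = expected.getBytes(StandardCharsets.UTF_8);

        String text = readUtf8(new ByteArrayInputStream(body));
        System.out.println("decoded correctly: " + text.equals(expected));
    }
}
```

With a real connection, the stream returned by `getInputStream()` would be passed to `readUtf8` in place of the byte array stream.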