
I have a default installation of Tomcat 8.5.6. UTF-8 encoded requests do not seem to be interpreted correctly, even though the docs say the default (if not in strict mode) should be UTF-8 everywhere these days. My Java POST requests look like:

HttpPost post = new HttpPost(url);
post.setEntity(new UrlEncodedFormEntity(nameValuePairs, HTTP.UTF_8));
...

When testing, I see that the character ñ (n with a tilde) is not decoded correctly in my servlet handler:

public class MyServlet extends HttpServlet {
    protected void doPost(HttpServletRequest request, ...) {
        String tildeTest = request.getParameter("foo"); // no good.
    }
}

If I explicitly set the encoding on the request before accessing the parameter, it decodes properly:

protected void doPost(HttpServletRequest request, ...) {
    request.setCharacterEncoding("UTF-8");
    String tildeTest = request.getParameter("foo"); // works!
    ...
}

So I'm not sure whether:

  1. Tomcat 8.5.6 is not really using UTF-8 everywhere, and I need to set that manually in the config files somewhere.

  2. My HTTP request is missing some header that tells Tomcat which encoding to use. Perhaps the HTTP POST is defaulting to some other encoding, which Tomcat is just honoring.

Anyone know which one?

Thanks

user3203425
  • Point 3) of answer http://stackoverflow.com/a/11185963/3511123 could be what you need. (I mean the `URIEncoding="UTF-8"` attribute for the `<Connector>` config.) – Jozef Chocholacek Oct 17 '16 at 06:39
  • @JozefChocholacek yeah I saw that, but the docs say that everything should be utf-8 by default now - wanted to get clarification about why this appears to not be utf-8 before I start changing stuff. – user3203425 Oct 17 '16 at 15:52

1 Answer


https://wiki.apache.org/tomcat/FAQ/CharacterEncoding

POST requests should specify the encoding of the parameters and values they send. Since many clients fail to set an explicit encoding, the default is used (ISO-8859-1).
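To see what that ISO-8859-1 default does to a UTF-8 form value, here is a minimal standalone sketch. It is not Tomcat code: it only uses `java.net.URLEncoder` and `java.net.URLDecoder` to mimic the decode step, and the class and helper names (`FormDecodingDemo`, `encodeUtf8`, `decodeAs`) are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class; it only imitates form-parameter decoding, it is not Tomcat's implementation.
class FormDecodingDemo {

    // Percent-encode a value the way a UTF-8 encoded form body would carry it.
    static String encodeUtf8(String value) throws UnsupportedEncodingException {
        return URLEncoder.encode(value, "UTF-8");
    }

    // Decode a percent-encoded value using the given charset name.
    static String decodeAs(String encoded, String charset) throws UnsupportedEncodingException {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String wire = encodeUtf8("\u00f1"); // "ñ" becomes the two UTF-8 bytes C3 B1

        System.out.println(wire);                                  // %C3%B1
        System.out.println(decodeAs(wire, "ISO-8859-1").length()); // 2 (each byte read as its own character)
        System.out.println(decodeAs(wire, "UTF-8").length());      // 1 (the original ñ)
    }
}
```

Decoding the same bytes as ISO-8859-1 yields two characters instead of one, which matches the kind of garbling described in the question when the request encoding is not set.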

What can you recommend to just make everything work? (How to use UTF-8 everywhere).

The FAQ lists six ways to ensure this; for servlet requests, the first two should be the relevant ones:

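  1. Set URIEncoding="UTF-8" on your <Connector> in server.xml. References: HTTP Connector, AJP Connector. This covers parameters carried in the URL (the query string).
  2. Use a character encoding filter with the default encoding set to UTF-8. This covers parameters carried in the request body, as in your POST.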
kuhajeyan