
I have a Tomcat 7 webapp and am having problems with character sets. My goal is to force everything into UTF-8 and just be done with it. I'm actually surprised that in 2014 not everything defaults to UTF-8...

I read the docs and have uncommented the org.apache.catalina.filters.AddDefaultCharsetFilter filter in the system's default web.xml.

/etc/tomcat/web.xml:

<filter>
    <filter-name>setCharacterEncodingFilter</filter-name>
    <filter-class>org.apache.catalina.filters.SetCharacterEncodingFilter</filter-class>
    <init-param>
        <param-name>encoding</param-name>
        <param-value>UTF-8</param-value>
    </init-param>
    <async-supported>true</async-supported>
</filter>

<filter-mapping>
    <filter-name>setCharacterEncodingFilter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>

I have also added URIEncoding="UTF-8" to the Connectors in the server.xml:

<Connector connectionTimeout="20000" port="8080" protocol="HTTP/1.1" redirectPort="8443" URIEncoding="UTF-8"/>

Doing this (and a bunch of other stuff like JDBC params) seems to get the request into UTF-8. But how do I force the response to UTF-8?

i.e.

protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
    // Print the charset the container reports for the request and the response
    System.out.printf("Req: %s%n", req.getCharacterEncoding());
    System.out.printf("Resp: %s%n", resp.getCharacterEncoding());
}

yields:

Req: UTF-8
Resp: ISO-8859-1

Thanks

PrecisionPete
  • Does the client send a `Content-Type` with a `charset` parameter? Tomcat will not override a client's setting. – Christopher Schultz Apr 01 '14 at 01:08
  • Client meaning the browser? Wouldn't that be part of the request? It's the response that's not working. Also, each page in the app starts with `<%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>`. And doGet behaves identically. – PrecisionPete Apr 01 '14 at 23:22
  • Enabling the filter in the web.xml did solve my character distortion problem. I just find it odd that it's not coming out UTF-8... – PrecisionPete Apr 01 '14 at 23:23

3 Answers


I didn't read your question carefully enough before. The default response encoding is ISO-8859-1 due to the HTTP standard. If you want to change it, you'll have to set the encoding on the response yourself.
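To make the ISO-8859-1 default concrete, here is a small stdlib-only sketch that encodes text with one charset and decodes the bytes with another. It is not Tomcat code, and the class and method names are made up for illustration; it only models what happens when the bytes of a response body and the charset used to read them disagree (UTF-8 "café" read as ISO-8859-1 shows up as "cafÃ©", and "€" cannot be written as ISO-8859-1 at all). In a servlet, setting the encoding yourself means calling `resp.setCharacterEncoding("UTF-8")` (or `setContentType` with a `charset`) before `getWriter()`, because the Servlet API ignores it once the writer has been obtained or the response is committed.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetDemo {

    // Encode text with one charset, then decode the bytes with another:
    // a stand-in for "server writes the body" and "browser reads the body".
    static String roundTrip(String text, Charset writeAs, Charset readAs) {
        byte[] bytes = text.getBytes(writeAs);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        String cafe = "caf\u00e9"; // "cafe" with an e-acute
        String euro = "\u20ac";    // euro sign, which has no ISO-8859-1 code point

        // Same charset on both sides: the text survives.
        System.out.println(roundTrip(cafe, StandardCharsets.UTF_8, StandardCharsets.UTF_8));

        // UTF-8 bytes read as ISO-8859-1: the e-acute (0xC3 0xA9) becomes
        // two Latin-1 characters, U+00C3 followed by U+00A9.
        System.out.println(roundTrip(cafe, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));

        // Written as ISO-8859-1: getBytes() replaces the unmappable euro sign with '?'.
        System.out.println(roundTrip(euro, StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1));
    }
}
```

The unicode escapes keep the source file ASCII-only, so it compiles the same regardless of the platform's default source encoding.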

What you really ought to do is read the Tomcat FAQ, which has an entry for the exact question you have asked: "What can you recommend to just make everything work? (How to use UTF-8 everywhere)".

Christopher Schultz

It looks like you are talking about two different filters:

  • AddDefaultCharsetFilter (applies to response)
  • SetCharacterEncodingFilter (applies to request)

In your case, you want to use AddDefaultCharsetFilter as explained here:

http://tomcat.apache.org/tomcat-7.0-doc/config/filter.html#Add_Default_Character_Set_Filter
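As I read the linked docs, AddDefaultCharsetFilter's `encoding` parameter is the character set used "if no other character set was set explicitly", and the filter is aimed at `text/*` responses. The sketch below is a hypothetical, simplified model of that rule in plain Java; `DefaultCharsetModel` and `withDefaultCharset` are invented for illustration and are not Tomcat's API. If that reading is right, the filter only has something to do once a `text/*` content type is set, so `resp.getCharacterEncoding()` at the very top of `doPost` may still report ISO-8859-1 even with the filter enabled.

```java
import java.util.Locale;

final class DefaultCharsetModel {

    // Hypothetical helper, NOT part of Tomcat: a simplified model of the rule the
    // docs describe. Given the Content-Type a servlet sets, append the default
    // charset only for text/* types that do not already carry one.
    static String withDefaultCharset(String contentType, String defaultCharset) {
        if (contentType == null) {
            return null; // no content type set yet, so nothing to amend
        }
        String lower = contentType.toLowerCase(Locale.ROOT);
        if (lower.startsWith("text/") && !lower.contains("charset=")) {
            return contentType + ";charset=" + defaultCharset;
        }
        return contentType; // explicit charsets and non-text types are left alone
    }

    public static void main(String[] args) {
        System.out.println(withDefaultCharset("text/html", "UTF-8"));                     // text/html;charset=UTF-8
        System.out.println(withDefaultCharset("text/html; charset=ISO-8859-1", "UTF-8")); // unchanged
        System.out.println(withDefaultCharset("application/json", "UTF-8"));              // unchanged
    }
}
```

The design point the model captures is that an explicit charset always wins; the filter only fills a gap.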


From the Tomcat 7 documentation for the Set Character Encoding Filter: "Effectively the value set by this filter is used when parsing parameters in a POST request, if parameter parsing occurs later than this filter. Thus the order of filter mappings is important. Note that the encoding for GET requests is not set here, but on a Connector. See CharacterEncoding page in the FAQ for details."

https://tomcat.apache.org/tomcat-7.0-doc/config/filter.html#Set_Character_Encoding_Filter
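Why the order of filter mappings matters can be illustrated without a container: the same percent-encoded bytes decode to different text depending on which charset is in force when they are parsed. This sketch uses `java.net.URLDecoder` as a stand-in for Tomcat's own parameter parsing (which is different code), and the class name is invented. Per the Servlet API, `setCharacterEncoding` on the request has no effect once parameters have been read, so a filter that runs after parsing is too late. The `URIEncoding` attribute on the Connector plays the same role for the query string of GET requests.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ParamDecodeDemo {

    // Decode a percent-encoded parameter value with a given charset, roughly
    // what a container does once it has settled on the request encoding.
    static String decodeParam(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset); // Charset overload needs Java 10+
    }

    public static void main(String[] args) {
        // A UTF-8 form posting "cafe" with an e-acute sends the accented
        // letter as the two bytes %C3%A9.
        String raw = "caf%C3%A9";

        // Encoding known before parsing (filter ran first): correct text.
        System.out.println(decodeParam(raw, StandardCharsets.UTF_8));

        // Parsed with the ISO-8859-1 fallback: two Latin-1 characters instead of one.
        System.out.println(decodeParam(raw, StandardCharsets.ISO_8859_1));
    }
}
```

With UTF-8 the value comes back as "café"; with the ISO-8859-1 fallback it comes back as "cafÃ©".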

podo