
My website uses the default ISO-8859-1 encoding. Each page is a JSP and runs in the servlet container Apache Tomcat 7.0.30.

e.g.

http://www.jthink.net/songkong/jp/support.jsp

But now I have translated some pages into Japanese, so they need to be encoded in a charset that supports Japanese characters. I've gone with UTF-8.
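Why ISO-8859-1 is not enough can be checked with nothing but the JDK. The sketch below is illustrative only (the class and method names are made up for this example): it asks each charset's encoder whether it can represent a short Japanese word, and shows what happens when the text is forced through ISO-8859-1 anyway.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Compares what ISO-8859-1 and UTF-8 can represent.
class CharsetCoverage {

    // Japanese for "support" (sa-po-o-to), written as Unicode escapes so the
    // source file compiles the same way regardless of its own encoding.
    static final String JAPANESE = "\u30B5\u30DD\u30FC\u30C8";

    // True if every character in text has a representation in the charset.
    static boolean canEncode(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    // Forces text through a charset; unmappable characters are replaced
    // with the charset's replacement byte ('?' for ISO-8859-1).
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        System.out.println(canEncode(JAPANESE, StandardCharsets.ISO_8859_1)); // false
        System.out.println(canEncode(JAPANESE, StandardCharsets.UTF_8));      // true
        System.out.println(roundTrip(JAPANESE, StandardCharsets.ISO_8859_1)); // ????
    }
}
```

ISO-8859-1 only covers 256 code points (mostly Western European letters), so every Japanese character is lost, while UTF-8 can encode any Unicode character.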

By simply adding this page directive

<%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>

and changing the charset in the meta tag to UTF-8

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

the page now renders correctly in Firefox:

http://www.jthink.net/songkong/jp/support.jsp
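In a JSP, the charset in `contentType` controls how the response text is encoded into bytes, while `pageEncoding` controls how the JSP source file itself is read. What matters for the browser is that the bytes actually sent match the charset the page declares. The JDK-only model below illustrates this; `send` and `receive` are hypothetical helpers standing in for the container and the browser, not Tomcat APIs.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Models one page: the text it contains and the charset it declares.
class DeclaredCharset {

    // Server side: the response text is encoded with the declared charset
    // (what contentType="text/html; charset=..." asks the container to do).
    static byte[] send(String pageText, Charset declared) {
        return pageText.getBytes(declared);
    }

    // Browser side: the bytes are decoded with the charset it was told about.
    static String receive(byte[] body, Charset declared) {
        return new String(body, declared);
    }

    public static void main(String[] args) {
        String japanese = "\u30B5\u30DD\u30FC\u30C8"; // Japanese "support"
        String western = "caf\u00E9";                  // "cafe" with an e-acute

        // Each page round-trips when decoded with its own declared charset.
        System.out.println(receive(send(japanese, StandardCharsets.UTF_8),
                StandardCharsets.UTF_8).equals(japanese));                   // true
        System.out.println(receive(send(western, StandardCharsets.ISO_8859_1),
                StandardCharsets.ISO_8859_1).equals(western));               // true

        // Mismatch: 12 UTF-8 bytes read as ISO-8859-1 become 12 garbled characters.
        System.out.println(receive(send(japanese, StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1).length());                      // 12
    }
}
```

Because each JSP's output is encoded according to its own `contentType`, a site can, in principle, serve UTF-8 pages alongside ISO-8859-1 pages, provided every page declares the charset it really uses.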

But the answer to the question "How to get UTF-8 working in Java webapps?" says I also need a CharFilter. Do I need this as well? I'm not clear what it does. I'd rather not add it, not least because I expect it could break my current ISO-8859-1 pages. My non-Japanese pages are still encoded as ISO-8859-1, and I'm undecided whether to convert them to UTF-8 or leave them as they are. I'm also concerned it would break the PayPal purchase verification code.

Update

I just realized that my web.xml file already declares a CharacterEncoding filter that sets things to UTF-8. I don't remember why I added it, what it actually does, or whether I should have it, seeing as most of my pages are not UTF-8:

<filter>
  <filter-name>CharacterEncoding</filter-name>
  <filter-class>org.apache.catalina.filters.SetCharacterEncodingFilter</filter-class>
  <init-param>
    <param-name>encoding</param-name>
    <param-value>UTF-8</param-value>
  </init-param>
</filter>
<mime-mapping>
  <extension>html</extension>
  <mime-type>text/html;charset=UTF-8</mime-type>
</mime-mapping>
</web-app>
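Per Tomcat's documentation for its container-provided filters (worth double-checking against 7.0.30), `SetCharacterEncodingFilter` only affects how the *request* is decoded: it calls `ServletRequest#setCharacterEncoding` with the configured `encoding`, by default only when the client did not specify a charset itself; its `ignore` init parameter (default `false`) forces the override. It does not change the charset of responses, so it should not alter how the existing ISO-8859-1 pages are rendered. Also note that a filter only runs for requests matched by a `<filter-mapping>` element, and none appears in the excerpt above. The sketch below models just that decision logic without the Servlet API; `EncodingAwareRequest` and `FakeRequest` are hypothetical stand-ins. The real `setCharacterEncoding` also throws `UnsupportedEncodingException` and has to be called before any parameters are read.

```java
// Models the decision SetCharacterEncodingFilter makes for each request.
class EncodingFilterSketch {

    // Hypothetical stand-in for the two ServletRequest methods the filter uses,
    // so this sketch compiles without the Servlet API on the classpath.
    interface EncodingAwareRequest {
        String getCharacterEncoding();
        void setCharacterEncoding(String encoding);
    }

    // Use the configured encoding when the client sent none,
    // or always when ignore is true.
    static void apply(EncodingAwareRequest request, String encoding, boolean ignore) {
        if (ignore || request.getCharacterEncoding() == null) {
            request.setCharacterEncoding(encoding);
        }
    }

    // Minimal mutable request used only for this demo.
    static final class FakeRequest implements EncodingAwareRequest {
        private String encoding;

        FakeRequest(String encoding) { this.encoding = encoding; }

        @Override public String getCharacterEncoding() { return encoding; }

        @Override public void setCharacterEncoding(String encoding) { this.encoding = encoding; }
    }

    public static void main(String[] args) {
        FakeRequest noCharset = new FakeRequest(null);           // browser sent no charset
        apply(noCharset, "UTF-8", false);
        System.out.println(noCharset.getCharacterEncoding());   // UTF-8

        FakeRequest declared = new FakeRequest("windows-1252");  // client named a charset
        apply(declared, "UTF-8", false);
        System.out.println(declared.getCharacterEncoding());    // windows-1252

        apply(declared, "UTF-8", true);                          // ignore=true overrides it
        System.out.println(declared.getCharacterEncoding());    // UTF-8
    }
}
```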
Paul Taylor
  • This answer is 7 years old (although it has been updated since), and AFAIK all major servlet containers support UTF-8 without the need for an additional filter. Recent Tomcat versions (in all of the 6.x, 7.x and 8.x lines) definitely do. Converting all the pages to UTF-8 would make sense to me. Not sure about the PayPal thing. – Jozef Chocholacek Aug 31 '15 at 08:29
  • @JozefChocholacek Thanks, it did seem weird to need it, but I couldn't find anything saying I didn't need it. – Paul Taylor Aug 31 '15 at 08:33

1 Answer


The CharsetFilter is used to set the character encoding of the browser, if it is on autoselect. You don't need this however, as all supported Tomcat versions (and all servlet containers, really) do this by default. No manual hacking is required.

meskobalazs
  • Thanks. What about URIEncoding="UTF-8" in server.xml? – Paul Taylor Aug 31 '15 at 08:32
  • If you want to use UTF-8 encoded URIs, then you can use it. It is not mandatory. I myself only use ASCII URLs, so I never set it. – meskobalazs Aug 31 '15 at 08:33
  • I just realized I already have some sort of UTF-8 filter defined in my web.xml; I'm not sure why. I've updated the question. Could you please comment on it? – Paul Taylor Aug 31 '15 at 08:43
  • It is an example filter in Tomcat; I don't think that you actually need it. – meskobalazs Aug 31 '15 at 08:46
  • @meskobalazs This is plain wrong! You cannot set any encoding from the server on the client. A CharEncodingFilter is needed for the server to properly decode the URL-encoded bytes of the query portion into strings. And no, the server cannot select anything automatically. – Michael-O Aug 31 '15 at 09:18
  • @Michael-O I don't understand the difference between CharEncodingFilter and setting URIEncoding in server.xml; it sounds like they do the same thing. – Paul Taylor Sep 01 '15 at 09:47
  • @PaulTaylor There is a subtle difference: have a look at the documentation for [`request#setCharacterEncoding`](https://docs.oracle.com/javaee/6/api/javax/servlet/ServletRequest.html#setCharacterEncoding%28java.lang.String%29); it affects the payload and/or the request parameters, while `URIEncoding` affects [URI decoding only](https://github.com/apache/tomcat/search?l=java&q=URIEncoding+&utf8=%E2%9C%93). Is that clear? It's best to set both so you have UTF-8 everywhere. This is what I have been doing for years now. – Michael-O Sep 01 '15 at 11:02
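The difference comes down to which charset turns percent-encoded bytes back into characters. On Tomcat 7, the request URI and GET query string are decoded with the connector's `URIEncoding` (set as `URIEncoding="UTF-8"` on the `<Connector>` element in `server.xml`; ISO-8859-1 if not specified), while POSTed form parameters are decoded with the request character encoding that `setCharacterEncoding` (or the filter) sets, which also falls back to ISO-8859-1. The JDK-only sketch below uses `java.net.URLDecoder` as a stand-in for the container's decoder to show what each choice yields for the same bytes; `ParameterDecoding` and `decode` are illustrative names, not Tomcat APIs.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Decodes a percent-encoded parameter value with a chosen charset, roughly what
// a container does when it turns request bytes into String parameters.
class ParameterDecoding {

    // Japanese "support", percent-encoded from its UTF-8 bytes, as a browser
    // would send it from a UTF-8 page (e.g. in ?q=...).
    static final String ENCODED = "%E3%82%B5%E3%83%9D%E3%83%BC%E3%83%88";

    static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String asUtf8 = decode(ENCODED, "UTF-8");
        // ISO-8859-1 stands in for Tomcat 7's default when nothing is configured.
        String asLatin1 = decode(ENCODED, "ISO-8859-1");

        System.out.println(asUtf8.equals("\u30B5\u30DD\u30FC\u30C8")); // true
        System.out.println(asLatin1.length()); // 12: one char per byte, i.e. mojibake
    }
}
```

Setting both `URIEncoding` and the request encoding to UTF-8, as suggested above, means query-string and form values take the first path rather than the second.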