
Since Servlet 4.0, we can set the default character encoding used for reading request bodies with ServletContext#setRequestCharacterEncoding.

I think that the character encoding for HttpServletRequest#getReader can be set using ServletContext#setRequestCharacterEncoding(*).

But the reader that HttpServletRequest#getReader returns does not seem to decode characters using the encoding set by ServletContext#setRequestCharacterEncoding.

My questions are:

  • Why does ServletContext#setRequestCharacterEncoding have no effect on HttpServletRequest#getReader (although it does have an effect on HttpServletRequest#getParameter)?
  • Is there any specification describing such ServletContext#setRequestCharacterEncoding and HttpServletRequest#getReader behaviors?

(I read Servlet Specification Version 4.0, but I can't find any spec about such behaviors.)

I have created a simple war application and tested ServletContext#setRequestCharacterEncoding.

[Env]

  • Tomcat 9.0.19 (default configuration, unchanged)
  • JDK 11
  • Windows 8.1

[index.html]

<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
</head>
<body>
    <form action="/SimpleWarApp/app/simple" method="post">
        <!-- The value is Japanese character '\u3042' -->
        <input type="text" name="hello" value="あ"/>
        <input type="submit" value="submit!"/>
    </form>
    <button type="button" id="the_button">post</button>
    <script>
        document.getElementById('the_button').addEventListener('click', function() {
            var xhttp = new XMLHttpRequest();
            xhttp.open('POST', '/SimpleWarApp/app/simple');
            xhttp.setRequestHeader('Content-Type', 'text/plain');
            // The body content is the Japanese character '\u3042'
            xhttp.send('あ');
        });
    </script>
</body>
</html>

[InitServletContextListener.java]

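import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import javax.servlet.annotation.WebListener;

@WebListener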
public class InitServletContextListener implements ServletContextListener {
    @Override
    public void contextInitialized(ServletContextEvent sce) {
        sce.getServletContext().setRequestCharacterEncoding("UTF-8");
    }
}

[SimpleServlet.java]

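import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet("/app/simple")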
@SuppressWarnings("serial")
public class SimpleServlet extends HttpServlet {

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
        // req.setCharacterEncoding("UTF-8");
        System.out.println("requestCharacterEncoding : " + req.getServletContext().getRequestCharacterEncoding());
        System.out.println("req.getCharacterEncoding() : " + req.getCharacterEncoding());

        String hello = req.getParameter("hello");
        if (hello != null) {
            System.out.println("hello : " + hello);
        } else {
            System.out.println("body : " + req.getReader().readLine());
        }
    }
}

I don't have any servlet filters. The above three are all the components of this war application. (GitHub)

Case 1: When I submit the form with a parameter 'hello', the value of 'hello' is successfully decoded as follows.

requestCharacterEncoding : UTF-8
req.getCharacterEncoding() : UTF-8
hello : あ

Case 2: When I click 'post' and send text content, the request body is not decoded correctly, as shown below. (I have confirmed that the request body is encoded in UTF-8: E3 81 82.)

requestCharacterEncoding : UTF-8
req.getCharacterEncoding() : UTF-8
body : ???
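
The three question marks are consistent with the three UTF-8 bytes E3 81 82 being decoded one byte per character, which is what a single-byte charset such as ISO-8859-1 would do. The following is a minimal, self-contained sketch of that effect, independent of Tomcat; the class name `DecodeDemo` and its method names are hypothetical and chosen only for illustration.

```java
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // The request body from Case 2: U+3042 encoded as UTF-8 (E3 81 82).
    static final byte[] BODY = {(byte) 0xE3, (byte) 0x81, (byte) 0x82};

    // Decodes the body with the encoding the client actually used.
    static String decodeUtf8() {
        return new String(BODY, StandardCharsets.UTF_8);
    }

    // Decodes the same bytes with a single-byte charset: one char per byte.
    static String decodeLatin1() {
        return new String(BODY, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String ok = decodeUtf8();
        String wrong = decodeLatin1();
        System.out.println("UTF-8 length      : " + ok.length());
        System.out.println("ISO-8859-1 length : " + wrong.length());
        System.out.println("UTF-8 is U+3042   : " + ok.equals("\u3042"));
    }
}
```

Whether the three mis-decoded characters are printed as `?` depends on the console encoding; that part is an assumption about the environment, not something this sketch demonstrates.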

Case 3: When I additionally set the encoding with HttpServletRequest#setCharacterEncoding on the first line of the servlet's 'doPost' method, the request body is successfully decoded.

requestCharacterEncoding : UTF-8
req.getCharacterEncoding() : UTF-8
body : あ

Case 4: When I use xhttp.setRequestHeader('Content-Type', 'text/plain; charset=UTF-8'); in the JavaScript, the request body is successfully decoded.

requestCharacterEncoding : UTF-8
req.getCharacterEncoding() : UTF-8
body : あ

Case 5: When I do not call req.getParameter("hello") before reading the body, the request body is still not decoded correctly.

requestCharacterEncoding : UTF-8
req.getCharacterEncoding() : UTF-8
body : ???

Case 6: When I do not call ServletContext#setRequestCharacterEncoding in InitServletContextListener.java, no character encoding is set.

requestCharacterEncoding : null
req.getCharacterEncoding() : null
body : ???

[NOTE]

  • (*) I think so because:

    • (1) The Javadoc of HttpServletRequest#getReader says

      "The reader translates the character data according to the character encoding used on the body".

    • (2) The Javadoc of HttpServletRequest#getCharacterEncoding says

      "Returns the name of the character encoding used in the body of this request".

    • (3) The Javadoc of HttpServletRequest#getCharacterEncoding also says

      "The following methods for specifying the request character encoding are consulted, in decreasing order of priority: per request, per web app (using ServletContext.setRequestCharacterEncoding, deployment descriptor)".

  • ServletContext#setResponseCharacterEncoding works fine. When I use it, the writer that HttpServletResponse#getWriter returns encodes the response body with the character encoding it sets.
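
The priority order quoted in note (3) can be sketched as a small lookup. This is a simplified model of that order only, not Tomcat's implementation; the class `EncodingResolution`, its method names, and its parameters are hypothetical, and the per-container level is left out.

```java
import java.util.Locale;

public class EncodingResolution {
    // Returns the charset parameter of a Content-Type value, or null if absent.
    static String charsetFromContentType(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    // Per request first (explicit call, then Content-Type), then per web app.
    // Returns null when nothing is specified at any of these levels.
    static String resolve(String explicitPerRequest, String contentType, String webAppDefault) {
        if (explicitPerRequest != null) {
            return explicitPerRequest;
        }
        String fromHeader = charsetFromContentType(contentType);
        if (fromHeader != null) {
            return fromHeader;
        }
        return webAppDefault;
    }

    public static void main(String[] args) {
        // Case 2: no charset in the header, web-app default set.
        System.out.println(resolve(null, "text/plain", "UTF-8"));
        // Case 4: charset in the header, no web-app default.
        System.out.println(resolve(null, "text/plain; charset=UTF-8", null));
        // Case 6: nothing set anywhere.
        System.out.println(resolve(null, "text/plain", null));
    }
}
```

Under this model, Case 2 would resolve to UTF-8 through the web-app default, which is the behavior I expected from getReader.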

Tomoki Sato
  • What happens if you use ` http.setRequestHeader('Content-Type', 'text/plain; charset=UTF-8'); ` javascript? Your finding is interesting. Also what happens if you DO NOT call ` req.getParameter("hello")` before reading a body buffer? – Whome May 11 '19 at 04:30
  • Also do you have any requestfilters on top of a servlet to mess up the request.characterencoding property? If you do not set a context.characterencoding is there a difference. I think you should get NULL from the request.getCharacterEncoding() if none set it a value. – Whome May 11 '19 at 04:36
  • I have tested ` http.setRequestHeader('Content-Type', 'text/plain; charset=UTF-8'); ` javascript (Case 4) and the servlet not calling `req.getParameter("hello")` (Case 5). I have edited my question. – Tomoki Sato May 11 '19 at 10:20
  • I don't have any Servlet Filters. The above three are all the components of my war application. I have tested the application without calling `ServletContext#setRequestCharacterEncoding` (Case 6). I have edited my question. – Tomoki Sato May 11 '19 at 10:25
  • Could be Tomcat bug. Best I think is not to use `context.setRequestCharacterEncoding` method. Check for `request.getCharacterEncoding()==null` then set UTF-8 encoding on every servlet code. – Whome May 11 '19 at 20:56
  • I have posted my question to tomcat-users mailing list. If I come to a conclusion, I will post it as an answer. – Tomoki Sato May 12 '19 at 05:23
  • Ok, final hint always use `` html head value and `content-type: text/html; charset=UTF-8` reply header (html, json, any reply type) to be explicit on UTF-8. This guarantees clients to a proper form GET/POST encoding. – Whome May 12 '19 at 06:24

1 Answer


It is an Apache Tomcat bug (specific to getReader()) that will be fixed in 9.0.21 onwards thanks to your report on the Tomcat users mailing list.

For the curious, here is the fix.

Mark Thomas