I used the code below to capture the response body (the source of the page being served) for a .jsp page.
Can someone please help me extract the response body for an .html page?

public class DetailFilter implements Filter {
    private FilterConfig config;
    public DetailFilter() {
        super();
    }

    public void init(final FilterConfig filterConfig) throws ServletException {
        this.config = filterConfig;
    }


    public void destroy() {
        config = null;
    }

    public void doFilter(final ServletRequest request, final ServletResponse response,
                         final FilterChain chain) throws IOException, ServletException {

        ServletResponse newResponse = response;

        if (request instanceof HttpServletRequest) {
            newResponse = new CharResponseWrapper((HttpServletResponse) response);
        }

        chain.doFilter(request, newResponse);

        if (newResponse instanceof CharResponseWrapper) {
            String text = newResponse.toString();

            if (text != null) {
                response.getWriter().write(text);
                System.out.println("text is: "+text);
            }
        }
    }
}


public class CharResponseWrapper extends HttpServletResponseWrapper{
    protected CharArrayWriter charWriter;

    protected PrintWriter writer;

    protected boolean getOutputStreamCalled;

    protected boolean getWriterCalled;

    public CharResponseWrapper(HttpServletResponse response) {
        super(response);

        charWriter = new CharArrayWriter();
    }

    public ServletOutputStream getOutputStream() throws IOException {
        if (getWriterCalled) {
            throw new IllegalStateException("getWriter already called");
        }

        getOutputStreamCalled = true;
        return super.getOutputStream();
    }

    public PrintWriter getWriter() throws IOException {
        if (writer != null) {
            return writer;
        }
        if (getOutputStreamCalled) {
            throw new IllegalStateException("getOutputStream already called");
        }
        getWriterCalled = true;
        writer = new PrintWriter(charWriter);

        return writer;
    }

    public String toString() {
        String s = null;
        if (writer != null) {
            s = charWriter.toString();
        }
        System.out.println("tosting is:"+s);
        return s;
    }
}

The problem is that for a .jsp page the getWriter() method (in CharResponseWrapper) is called and the body ends up in writer, but for an .html page getOutputStream() is called instead, so nothing is captured and toString() returns null.

I also tried URLConnection and InputStreamReader for the same purpose. The code I used is below:

HttpServletRequest hReq = (HttpServletRequest) request;
StringBuffer ss=hReq.getRequestURL();
String u=ss.toString();
URL url = new URL(u);
URLConnection con = url.openConnection();
System.out.println("Connection successful");
InputStream is =con.getInputStream();
System.out.println("InputStream Successful");
BufferedReader br = new BufferedReader(new InputStreamReader(is));

String line = null;
String[] arr={};
while ((line = br.readLine()) != null) {
    System.out.println(line);
}

The code prints "Connection successful" on the console, but then it loops indefinitely and never reaches "InputStream Successful". As I understand it, once the connection is created, calling getInputStream() sends a request to the same URL, so the whole process repeats again and again. Maybe this approach only works for some other URL, e.g. url="www.abcd.com".

I want to extract the response body of the .html page so I can manipulate the data. Any help would be appreciated.

EDIT

To continue this question: after I get the response body, I insert JS before the tag. When I print that string with System.out.println, I see the response body with the JS inserted. Up to this step all is fine. I then convert it to a byte array and write the byte array to the ServletOutputStream instance.

ServletOutputStream newResponse1= response.getOutputStream();
newResponse1.write(bArray);
newResponse1.close();

Here bArray is the response body with the JS inserted, as a byte array, and response is the ServletResponse. The output I get is strange. The JSP page is rendered twice: if I have a button on the JSP page, the same button shows up two times, and the JS is executed. The HTML page is rendered once, but the final response is not the one I wrote: the bArray data (with the injected JS) is not what I see in the browser.

I feel I need to override the getOutputStream method again, which unfortunately I have not been able to do. Please help, and let me know if the question is not clear.
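For reference, the string manipulation step described above can be sketched in isolation. This is only a minimal sketch under assumptions: the class and method names (InjectSnippetDemo, insertBefore) are hypothetical, and the marker "</body>" used in main is an example, not necessarily the tag used in the real page. It inserts a snippet before the last occurrence of a marker and converts the result to bytes with an explicit charset.

```java
import java.nio.charset.StandardCharsets;

public class InjectSnippetDemo {

    /**
     * Inserts snippet immediately before the last occurrence of marker.
     * Returns html unchanged when the marker is not present.
     * The match is case-sensitive.
     */
    static String insertBefore(String html, String marker, String snippet) {
        int idx = html.lastIndexOf(marker);
        if (idx < 0) {
            return html;
        }
        return html.substring(0, idx) + snippet + html.substring(idx);
    }

    public static void main(String[] args) {
        // Hypothetical page and marker, used only for illustration.
        String body = "<html><body><p>hello</p></body></html>";
        String injected = insertBefore(body, "</body>", "<script>init();</script>");

        // Encode with an explicit charset instead of the platform default.
        byte[] bArray = injected.getBytes(StandardCharsets.UTF_8);

        System.out.println(injected);
        System.out.println("bytes to write: " + bArray.length);
    }
}
```

Note that the byte count grows after the insertion, so if a Content-Length header was already set for the original body, it would presumably need to match the new length.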

I also referred to "How to read and copy the HTTP servlet response output stream content for logging".

Aditya
  • How are you sure that for an HTML page the output is written through a ServletOutputStream? If so, it would be enough to create your own HttpServletResponse wrapper that encapsulates your own ServletOutputStream (in the same fashion as CharResponseWrapper does). – Little Santi Jul 23 '15 at 20:03
  • I debugged my code line by line and found that for a JSP page, after doFilter is called, control goes to the getWriter() method and the response is returned. If the page is HTML, control goes to the if condition in getOutputStream(), then to getOutputStreamCalled = true, and then a null value is returned. Maybe you can try it on your system: just create one HTML page and one JSP page plus the filter given above. Note that no other servlet is used. – Aditya Jul 24 '15 at 04:31

1 Answer

You need to intercept the output delivered to the underlying ServletOutputStream, in the same way the CharArrayWriter does for the writer. So I recommend modifying the getOutputStream method to wrap the returned object in your own ServletOutputStream subclass, and storing that wrapper as an instance variable of CharResponseWrapper (just like the CharArrayWriter). Something like this would be enough:

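import java.io.IOException;
import javax.servlet.ServletOutputStream;

// Passes every byte through to the real stream while keeping a copy of it.
public class MyServletOutputStream extends ServletOutputStream
{
    private final ServletOutputStream src;

    private final StringBuilder stb=new StringBuilder(4096);

    public MyServletOutputStream(ServletOutputStream src)
    {
        super();
        this.src=src;
    }

    @Override
    public void write(int b)
        throws IOException
    {
        this.src.write(b);
        // One byte becomes one char here, so the copy is only faithful
        // for single-byte text such as plain ASCII.
        this.stb.append((char)b);
    }

    public StringBuilder getStb()
    {
        return this.stb;
    }

    // On Servlet 3.1 and later, isReady() and setWriteListener(WriteListener)
    // are abstract in ServletOutputStream and must be overridden as well,
    // for example by delegating to src.
}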

Finally, modify the toString method so that it decides which object to read the data from: the CharArrayWriter or the MyServletOutputStream.
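The capture idea above can also be tried outside a servlet container. The following is only a sketch using plain java.io, with hypothetical names (TeeCaptureDemo, CapturingOutputStream, sendAndCapture); in a real filter the class would extend ServletOutputStream as shown above. As a variation on the StringBuilder approach, it keeps the raw bytes in a ByteArrayOutputStream and decodes them once at the end, which assumes the response charset is known.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class TeeCaptureDemo {

    /** Forwards every byte to the target stream and keeps a copy in memory. */
    static class CapturingOutputStream extends OutputStream {
        private final OutputStream target;
        private final ByteArrayOutputStream copy = new ByteArrayOutputStream(4096);

        CapturingOutputStream(OutputStream target) {
            this.target = target;
        }

        @Override
        public void write(int b) throws IOException {
            target.write(b); // the client still receives the byte
            copy.write(b);   // and we keep it for later inspection
        }

        @Override
        public void flush() throws IOException {
            target.flush();
        }

        /** Decodes everything captured so far with the given charset. */
        String capturedText(Charset charset) {
            return new String(copy.toByteArray(), charset);
        }
    }

    /** Writes text through the capturing stream and returns what was captured. */
    static String sendAndCapture(String text) {
        // Stand-in for the stream that would normally go to the client.
        ByteArrayOutputStream client = new ByteArrayOutputStream();
        CapturingOutputStream out = new CapturingOutputStream(client);
        try {
            out.write(text.getBytes(StandardCharsets.UTF_8));
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.capturedText(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(sendAndCapture("<html><body>caf\u00e9</body></html>"));
    }
}
```

Because this stream forwards each byte as it is written, the original body has already been sent by the time the captured text is available. A filter that wants to send a modified body instead would probably capture without forwarding and write the result once afterwards.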

Little Santi
  • Solved, thanks Santi for your help. Using the servlet output stream I am now able to capture the HTML response. I wonder why the getWriter method does not get executed for an HTML page. – Aditya Jul 24 '15 at 19:58
  • You are welcome! Why ServletOutputStream instead of Writer for HTML pages? Well, I am not sure, but I guess it is a matter of _encoding_: Writers consume character-based data, but OutputStreams consume binary data. A JSP contains an explicit parameter declaring its actual encoding (the `@page pageEncoding` header), which the JSP container may understand. An HTML page, by contrast, does not declare its actual encoding in a way that the JSP container understands (JSP containers do not handle the HTML format). The encoding of an HTML page is aimed at the browser, and so the page is served as binary. – Little Santi Jul 24 '15 at 23:56
  • @LittleSanti great answer. Maybe you could help [on this question](http://stackoverflow.com/q/42440437/3493036), too. – Patrick Feb 27 '17 at 15:28
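The encoding point raised in the comments can be illustrated with a small standalone example. This is a sketch with hypothetical names (ByteVsCharDemo, castEachByte, decode), and it assumes the body is UTF-8. It compares treating each byte as one character with decoding the whole byte array at once.

```java
import java.nio.charset.StandardCharsets;

public class ByteVsCharDemo {

    /** Treats every byte as exactly one character (Latin-1 style). */
    static String castEachByte(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    /** Decodes the whole array as UTF-8, so multi-byte characters survive. */
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "cafe" with an accented e, which takes two bytes in UTF-8.
        byte[] body = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        System.out.println("bytes:       " + body.length);
        System.out.println("cast length: " + castEachByte(body).length());
        System.out.println("decoded:     " + decode(body));
    }
}
```

For pure ASCII content both approaches give the same string; they only diverge once the page contains characters outside the single-byte range.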