I'm kind of a new programmer, and I'm having a couple of problems with the code I'm handling.

Basically, the code receives a form from another JSP, reads the bytes using DataInputStream, parses the data, and submits the results to Salesforce.

 // Getting the parameters from the request
 String contentType = request.getContentType();
 DataInputStream in = new DataInputStream(request.getInputStream());
 int formDataLength = request.getContentLength();

 //System.out.println(formDataLength);
 byte[] dataBytes = new byte[formDataLength];
 int byteRead = 0;
 int totalBytesRead = 0;
 while (totalBytesRead < formDataLength)
 {
  // Ask only for the bytes still missing, so offset + length never exceeds the buffer
  byteRead = in.read(dataBytes, totalBytesRead, formDataLength - totalBytesRead);
  if (byteRead == -1) break; // stream ended before the announced length
  totalBytesRead += byteRead;
 }

It works fine, but only if the input contains plain ASCII characters. Whenever it has to handle special characters (like the French characters àâäæçéèêëîïôùûü), I get the following gibberish as a result:

Ã Ã¢Ã¤Ã¦Ã§Ã©Ã¨ÃªÃ«Ã®Ã¯Ã´Ã¹Ã»Ã¼

I understand it could be an issue of DataInputStream, and how it doesn't return UTF-8 encoded text. Do you guys offer any suggestions on how to tackle this issue?

All the .jsp files include <%@page pageEncoding="UTF-8" contentType="text/html; charset=UTF-8"%> and Tomcat's settings are fine (URI = UTF-8, etc). I tried adding:

request.setCharacterEncoding("UTF-8");

and

response.setCharacterEncoding("UTF-8");

to no avail.

Here's an example of how it parses the data:

    //Getting the notes for the Case 
 String notes = new String(dataBytes);
 System.out.println(notes);
 String savenotes = casetype.substring(notes.indexOf("notes"));
 //savenotes = savenotes.substring(savenotes.indexOf("\n"), savenotes.indexOf("---"));
 savenotes = savenotes.substring(savenotes.indexOf("\n")+1);
 savenotes = savenotes.substring(savenotes.indexOf("\n")+1);
 savenotes = savenotes.substring(0,savenotes.indexOf("name=\"datafile"));
 savenotes = savenotes.substring(0,savenotes.lastIndexOf("\n------"));
 savenotes = savenotes.trim();

Thanks in advance.

BalusC
jorgemoya

1 Answer

The problem is not in the input streams, since they don't handle characters, only bytes. Your problem is at the point where you convert those bytes to characters. In this particular case, you need to specify the proper encoding in the String constructor; without it, the platform default encoding is used.

String notes = new String(dataBytes, "UTF-8");
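To illustrate why the charset argument matters, here is a minimal sketch that decodes the same UTF-8 bytes twice: once as UTF-8 and once as ISO-8859-1. The class and method names are hypothetical, and `StandardCharsets` assumes Java 7 or later; on older versions the `"UTF-8"` string form shown above does the same job.

```java
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // Decodes the bytes as UTF-8, which is what the form data actually contains.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Decodes the same bytes as ISO-8859-1: every byte becomes one character.
    static String decodeLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "\u00e9\u00e8\u00ea"; // the characters é, è, ê
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        // Each of these characters takes two bytes in UTF-8.
        System.out.println(utf8.length);                          // 6
        System.out.println(decodeUtf8(utf8).equals(original));    // true
        // The wrong charset yields two characters per original character.
        System.out.println(decodeLatin1(utf8).length());          // 6
    }
}
```

The second decoding is the kind of doubled-character gibberish described in the question: the bytes are intact, only their interpretation is wrong.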

By the way, the DataInputStream adds no value in this particular code snippet. You can just keep it as a plain InputStream.
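As a rough sketch of reading the body through a plain `InputStream`, the helper below loops until the announced number of bytes has arrived and then decodes them as UTF-8. The names `ReadBodyDemo` and `readFully` are hypothetical, and a `ByteArrayInputStream` stands in for `request.getInputStream()` so the example runs outside a servlet container.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class ReadBodyDemo {
    // Reads exactly 'length' bytes from the stream, or fails if it ends early.
    static byte[] readFully(InputStream in, int length) throws IOException {
        byte[] data = new byte[length];
        int total = 0;
        while (total < length) {
            // Request only the remaining bytes so the read stays inside the buffer.
            int n = in.read(data, total, length - total);
            if (n == -1) {
                throw new EOFException("stream ended after " + total + " of " + length + " bytes");
            }
            total += n;
        }
        return data;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the request body; in a servlet this would come from the request.
        byte[] body = "notes=\u00e0\u00e2\u00e4".getBytes(StandardCharsets.UTF_8);
        InputStream in = new ByteArrayInputStream(body);

        byte[] dataBytes = readFully(in, body.length);
        String notes = new String(dataBytes, StandardCharsets.UTF_8);
        System.out.println(notes.length()); // 9 characters: "notes=" plus three letters
    }
}
```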

BalusC
  • Thanks for the swift response, sir. Actually, I tried that before posting here (based on a post I found from a couple of years ago) and proceeded to implement that solution. This is where it gets weird, though: if I use that method and print out notes, the characters display as the equivalent of àâäæçéèêëîïôùûü (although not correctly, due to the console's character limitations). However, once it parses the info and I print out savenotes, it displays Ã Ã¢Ã¤Ã¦Ã§Ã©Ã¨ÃªÃ«Ã®Ã¯Ã´Ã¹Ã»Ã¼ (again, not those exact characters due to the console limitation, but the equivalent). – jorgemoya Dec 22 '10 at 00:18
  • Use an IDE (first configure its console to use UTF-8), or write to a file (with `OutputStreamWriter` and `"UTF-8"`) and then open it in a text editor, or display it in a Swing JPanel or so. A command console typically can't display UTF-8 characters beyond the ASCII range. – BalusC Dec 22 '10 at 00:22
  • Well yes, I understand that completely. Anyhow, using new String(dataBytes, "UTF-8") doesn't solve the problem. I still end up with Ã Ã¢Ã¤Ã¦Ã§Ã©Ã¨ÃªÃ«Ã®Ã¯Ã´Ã¹Ã»Ã¼ as my note text. – jorgemoya Dec 22 '10 at 00:28
  • Well, that doesn't give me much to help you further. At least, the problem is clearly at exactly the point where those bytes are turned (decoded) into characters. That point is incorrectly using ISO-8859-1 for this instead of UTF-8. You've already set the JSP page encoding to UTF-8 and the input stream clearly contains UTF-8 bytes, so that part is fine. – BalusC Dec 22 '10 at 00:41
  • Your solution worked. I figured out I had not managed another crucial string as UTF-8. Thanks. – jorgemoya Dec 22 '10 at 19:04
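
The file-based check suggested in the comments (write with `OutputStreamWriter` and `"UTF-8"`, then open the file in a text editor) could look roughly like the sketch below. The class name, method name, and temporary file are hypothetical, and the try-with-resources syntax assumes Java 7 or later.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class WriteUtf8Demo {
    // Writes the text to the file as UTF-8, regardless of the platform default encoding.
    static void writeUtf8(Path file, String text) throws IOException {
        try (Writer writer = new OutputStreamWriter(
                new FileOutputStream(file.toFile()), StandardCharsets.UTF_8)) {
            writer.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("savenotes", ".txt");
        writeUtf8(file, "\u00e0\u00e2\u00e4"); // the characters à, â, ä

        // Three characters, two bytes each in UTF-8.
        System.out.println(Files.readAllBytes(file).length); // 6
        System.out.println("Open in a UTF-8 aware editor: " + file);
    }
}
```

Inspecting the file instead of the console separates a real decoding bug from a display limitation: if the editor shows the expected characters, the string in memory was correct.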