The setting of my problem is as follows:
In a client/server architecture that communicates via web services, the server receives a CSV file from the client. The API hands it to me as an org.apache.commons.fileupload.FileItem.
The allowed code pages for these files are code page 850 and code page 1252.
Everything works properly except for the euro sign (€). With code page 1252, my code does not handle the euro sign correctly: when I print the text to the console in Eclipse, I see the character U+00A4 (¤) instead.
Currently I use the following code. It is spread over several classes, so I have extracted only the relevant lines.
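To narrow this down, it may help to check which byte the uploaded file actually contains where the euro sign should be. The following is a minimal sketch, not part of my production code: the class `EuroByteCheck` and its helpers `toHex` and `decode` are hypothetical names made up for illustration. It relies on the fact that in windows-1252 the euro sign is the single byte 0x80, while the byte 0xA4 decodes to U+00A4 (¤).

```java
import java.nio.charset.Charset;

class EuroByteCheck {

    // Hypothetical helper: formats bytes as space-separated hex, e.g. "80 A4".
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Hypothetical helper: decodes the bytes with the named charset.
    static String decode(byte[] data, String charsetName) {
        return new String(data, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // 0x80 is the euro sign in windows-1252; 0xA4 is the currency sign.
        byte[] sample = { (byte) 0x80, (byte) 0xA4 };
        String decoded = decode(sample, "windows-1252");

        System.out.println("bytes: " + toHex(sample));
        for (char c : decoded.toCharArray()) {
            // Print code points instead of glyphs so the console encoding cannot interfere.
            System.out.println(String.format("U+%04X", (int) c));
        }
    }
}
```

Printing code points rather than the characters themselves keeps the console encoding out of the picture, so a wrong glyph in Eclipse can be told apart from a wrong decoding step.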
byte[] inputData = call.getImportDatei().get();
// the following method works correctly
// it returns Charset.forName("CP850") or Charset.forName("CP1252")
final Charset charset = retrieveCharset(inputData);
char[] stringContents;
final StringBuffer sb = new StringBuffer();
// passing the Charset itself avoids the checked UnsupportedEncodingException
final String s = new String(inputData, charset);
// here I see the problem with the euro sign already
// the following code shouldn't be the problem
// here some special characters are converted, but this doesn't affect the problem, so I removed those lines
stringContents = s.toCharArray();
for (final char c : stringContents) {
    sb.append(c);
}
final Reader stringReader = new StringReader(sb.toString());
// org.supercsv.io.CsvListReader
CsvListReader reader = new CsvListReader(stringReader, CsvPreference.EXCEL_NORTH_EUROPE_PREFERENCE);
// now this reader is used to read the CSV content...
I tried several approaches:
FileItem.getInputStream()
I used FileItem.getInputStream() to get the byte[] but the result was the same.
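For reference, decoding directly from a stream with an explicit charset could look roughly like the sketch below. It is only an illustration under assumptions: `StreamDecodeSketch` and `readAll` are hypothetical names, and a `ByteArrayInputStream` stands in for the stream returned by FileItem.getInputStream().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

class StreamDecodeSketch {

    // Hypothetical helper: reads the whole stream as text using the given charset.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buffer = new char[1024];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the uploaded file: "5 €;x" encoded as windows-1252.
        byte[] upload = { '5', ' ', (byte) 0x80, ';', 'x' };
        InputStream in = new ByteArrayInputStream(upload);

        String text = readAll(in, Charset.forName("windows-1252"));
        // Check the decoded code point rather than the printed glyph.
        System.out.println(String.format("U+%04X", (int) text.charAt(2)));
    }
}
```

The same `InputStreamReader` could be passed to the CSV reader directly, which would skip the intermediate String and StringBuffer, assuming no character conversion is needed in between.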
FileItem.getString()
When I use FileItem.getString(), it works perfectly with code page 1252: the euro sign is read correctly, and I see it when I print it to the console in Eclipse. With code page 850, however, many special characters are wrong.
FileItem.getString(String encoding)
So my idea was to use FileItem.getString(String encoding). However, every encoding name I tried for code page 1252 produced no exception but wrong results.
For example, getString(Charset.forName("CP1252").name()) yields a question mark instead of the euro sign.
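One thing that can be checked here is whether the question mark comes from the decoding step or from the output side. The sketch below is an assumption-laden illustration, not a confirmed diagnosis: `QuestionMarkSketch` and `canEncode` are hypothetical names. It shows that Java substitutes '?' (0x3F) when a String containing the euro sign is encoded into a charset that has no euro sign, such as ISO-8859-1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class QuestionMarkSketch {

    // Hypothetical helper: true if the charset can represent every character of the text.
    static boolean canEncode(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String euro = "\u20AC";

        // ISO-8859-1 has no euro sign, so getBytes() falls back to '?' (0x3F).
        byte[] latin1 = euro.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("ISO-8859-1 byte: 0x" + Integer.toHexString(latin1[0] & 0xFF));

        // windows-1252 does contain the euro sign (byte 0x80).
        byte[] cp1252 = euro.getBytes(Charset.forName("windows-1252"));
        System.out.println("windows-1252 byte: 0x" + Integer.toHexString(cp1252[0] & 0xFF));

        System.out.println("ISO-8859-1 can encode euro: " + canEncode(euro, StandardCharsets.ISO_8859_1));
        System.out.println("windows-1252 can encode euro: " + canEncode(euro, Charset.forName("windows-1252")));
    }
}
```

If the String holds U+20AC after decoding but the console or a later conversion uses a charset without the euro sign, a question mark would appear even though the decoding itself was correct.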
How do I specify the encoding when I use org.apache.commons.fileupload.FileItem?
Or is this the wrong way?
Thanks for your help in advance!