We create a CSV file and write to it with the following code (Apache Commons CSV). The file is written with UTF-8 encoding.

import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVPrinter;

File file = new File(fileName);
Writer fileWriter = new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8);
CSVPrinter printer = new CSVPrinter(fileWriter, CSVFormat.DEFAULT.withHeader(columnHeadersTranslatedArray));

We then use the following code to write the file to the servlet response output stream:

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import javax.servlet.http.HttpServletResponse;
import org.springframework.util.FileCopyUtils;

// response is the HttpServletResponse received as a method argument
File file = new File(fileName); // declared first, since file.length() is needed below
response.setContentType("text/csv; charset=utf-8");
response.setHeader("Content-Disposition", "attachment; filename=" + fileName);
response.setCharacterEncoding("utf-8");
response.setContentLength((int) file.length());

InputStream inputStream = new BufferedInputStream(new FileInputStream(file));
FileCopyUtils.copy(inputStream, response.getOutputStream());

Finally, we download the file in the browser by attaching it to an anchor tag as its href, using the following code:

const a = document.createElement('a');
a.href = window.URL.createObjectURL(xmlhttp.response);
a.download = filename;
a.style.display = 'none';
document.body.appendChild(a);
a.click();

After downloading, opening the file in Notepad or Notepad++ shows the Japanese characters correctly. But double-clicking the .csv file opens it in MS Excel, which shows garbled characters in place of the Japanese ones. Most probably Excel is using the default ANSI encoding instead of UTF-8. How can we make Excel show the Japanese characters the way Notepad does? For now, we are using the method described in the following link to view the file with Japanese characters in MS Excel:

https://support.tibco.com/s/article/Excel-failed-to-display-Japanese-characters-when-opening-csv-file-saved-from-Statistica-13-2-with-UTF-8-encoding

  • Excel likes UTF-8 w/ BOM encoding on text files. Without BOM it assumes the default ANSI encoding for your Windows version. – Mark Tolonen Aug 11 '22 at 15:03
  • @MarkTolonen I shared my code changes. Please let me know what changes I need to make on the server or UI side so that MS Excel recognizes the file as UTF-8 with BOM. Also, please let me know what BOM stands for. – Dipak Dutta Aug 11 '22 at 15:07
  • BOM stands for "[byte order mark](https://en.wikipedia.org/wiki/Byte_order_mark)" and is the Unicode character U+FEFF. It is used to indicate little- or big-endian UTF-16 (written bytes as FF FE or FE FF, respectively) or UTF-32 (00 00 FE FF or FF FE 00 00) but is also used by some applications as a UTF-8 signature, which doesn't care about byte order. I don't know the java solution, but write U+FEFF encoded as UTF-8 (EF BB BF) as the first 3 bytes of the CSV file. – Mark Tolonen Aug 11 '22 at 15:32
  • @MarkTolonen Thanks for clarifying. I am checking. Please also share this in your network in case anyone can help. – Dipak Dutta Aug 11 '22 at 16:31
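
To make the byte sequences mentioned in the comments concrete, here is a minimal Java sketch that encodes U+FEFF in a few charsets and prints the resulting bytes. The class name `BomBytes` and its helper methods are hypothetical, introduced only for illustration; the sketch uses only the JDK and is not part of the original code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
public class BomBytes {

    // Formats bytes as space-separated uppercase hex, e.g. "EF BB BF".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Encodes the single character U+FEFF in the given charset.
    static String bomHex(Charset charset) {
        return hex("\uFEFF".getBytes(charset));
    }

    public static void main(String[] args) {
        System.out.println("UTF-8:    " + bomHex(StandardCharsets.UTF_8));    // EF BB BF
        System.out.println("UTF-16LE: " + bomHex(StandardCharsets.UTF_16LE)); // FF FE
        System.out.println("UTF-16BE: " + bomHex(StandardCharsets.UTF_16BE)); // FE FF
    }
}
```

The UTF-8 result, EF BB BF, is the three-byte signature Excel looks for at the start of a CSV file.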

1 Answer

As far as I understand, the problem above was resolved by adding a BOM (byte order mark) when writing the file. I wrote the BOM immediately after creating the writer, as follows:

Writer fileWriter = new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8);
fileWriter.write('\ufeff'); // write the BOM (U+FEFF) before any CSV content
CSVPrinter printer = new CSVPrinter(fileWriter, CSVFormat.DEFAULT.withHeader(columnHeadersTranslatedArray));
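
One way to check that the BOM really ends up at the start of the output is sketched below. It is a minimal, JDK-only example under some assumptions: the class `BomCsvSketch` and its methods are hypothetical names, a `ByteArrayOutputStream` stands in for the `FileOutputStream`, and plain `Writer` calls stand in for `CSVPrinter`, so that the sketch runs without Apache Commons CSV.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the code from the question.
public class BomCsvSketch {

    // Writes the BOM first, then each line followed by CRLF, all as UTF-8.
    static byte[] csvBytesWithBom(String... lines) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write('\uFEFF'); // BOM must come before any CSV content
            for (String line : lines) {
                writer.write(line);
                writer.write("\r\n");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Returns true if the data starts with the UTF-8 signature EF BB BF.
    static boolean startsWithUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    public static void main(String[] args) {
        // Japanese sample text written with Unicode escapes so the source stays ASCII.
        byte[] csv = csvBytesWithBom("\u540D\u524D,\u90FD\u5E02", "\u592A\u90CE,\u6771\u4EAC");
        System.out.println("Starts with UTF-8 BOM: " + startsWithUtf8Bom(csv));
    }
}
```

Since the BOM is part of the file on disk, `file.length()` already includes its three bytes, so the `setContentLength` call in the servlet code should not need adjusting.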