I'm extracting German addresses from a CSV file using the Java CsvReader class. Some of the street names contain special German characters (umlauts, "Umlaute" in German) such as ö, ä and ü (example: Sonnige Höhe). Here is the code I use:
    try {
        String addressDataCsvFilename = "Tannis_Export.csv";
        CsvReader addressDataCsvFile = new CsvReader(addressDataCsvFilename, ',', Charset.forName("UTF-8"));
        /*
        String[] headers = {
            "PLZ",        // C
            "Strasse",    // E
            "Hausnummer", // F
        };
        */
        // get headers
        addressDataCsvFile.readHeaders();
        while (addressDataCsvFile.readRecord()) {
            // workaround for issue with CSVReader not finding header in first column
            // String partNumber = priceListCsvFile.get("PART NUMBER");
            String postleitzahl = addressDataCsvFile.get("PLZ");
            String strassenName = addressDataCsvFile.get("Strasse");
            String hausNummer = addressDataCsvFile.get("Hausnummer");
It turns out that even though I specify UTF-8 as the charset, CsvReader.readRecord() doesn't read the special German characters correctly, so "Sonnige Höhe" becomes "Sonnige H�he". How can I prevent that?
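For reference, here is a minimal sketch that reproduces the same symptom without CsvReader. It rests on an unconfirmed assumption: that the CSV file was actually saved in ISO-8859-1 (or Windows-1252) rather than UTF-8. The class name UmlautDecodingDemo and the helper decode are hypothetical names used only for this illustration. In ISO-8859-1 the letter ö is the single byte 0xF6, which is not valid UTF-8, so a UTF-8 decoder substitutes the replacement character U+FFFD (shown as �).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UmlautDecodingDemo {

    // Hypothetical helper: decode raw bytes using the given charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // \u00F6 is 'ö'; the escape avoids depending on the source file encoding.
        String original = "Sonnige H\u00F6he";

        // Assumption: the file on disk holds ISO-8859-1 bytes, not UTF-8 bytes.
        byte[] fileBytes = original.getBytes(StandardCharsets.ISO_8859_1);

        // Decoding with the wrong charset: the byte 0xF6 is malformed UTF-8
        // and is replaced with U+FFFD.
        String wrong = decode(fileBytes, StandardCharsets.UTF_8);

        // Decoding with the charset the bytes were written in restores the text.
        String right = decode(fileBytes, StandardCharsets.ISO_8859_1);

        System.out.println("Contains U+FFFD when read as UTF-8: " + wrong.contains("\uFFFD"));
        System.out.println("Matches original when read as ISO-8859-1: " + right.equals(original));
    }
}
```

If that assumption holds for Tannis_Export.csv, the thing to try would be passing Charset.forName("ISO-8859-1") (or "windows-1252") as the third constructor argument instead of UTF-8. Checking the file's real encoding in a text or hex editor first would confirm or rule this out.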