Update: The original CSV was created in Excel; when I copied the data into a Google Spreadsheet and downloaded a CSV from Drive, it worked fine. I'm guessing there's an encoding issue w/ the Excel CSV? Is there any way to work around this w/ Excel, or do we need to tell our clients to use Google Docs?

I've got a CSV w/ non-ASCII characters (my example is in French, but we also support languages written in non-Latin scripts, such as Arabic and Thai) that I'm reading via ColdFusion's cffile. The problem is that the output from the read converts all the accented characters into the replacement symbol (�). There was originally no charset specified on the cffile, so I tried adding utf-8 (no change) and utf-16 (everything is converted to what looks like Chinese characters).

Anyone know how I can get this data out of the CSV without losing/messing up the characters?

CSV Example:

Smith,Joan,joan.smith@test.com,Hôpital Jésus

Original cffile:

<cffile action="read" file="#expandedFilePath#" variable="strCSV">

cffile w/ charset added:

<cffile action="read" file="#expandedFilePath#" variable="strCSV" charset="utf-8">

cfdump of strCSV (no charset/utf-8 charset):

Smith,Joan,joan.smith@test.com,H�pital J�sus

cfdump of strCSV (utf-16 charset):

卭楴栬䩯慮ⱪ潡渮獭楴桀瑥獴⹣潭ⱈ楴慬⁊畳ഊ

shimmoril
  • What version of CF? `<cffile>` didn't use to support character encodings so well (even as late as CF9), but I thought it had been sorted out. – Adam Cameron Jul 08 '14 at 21:52
  • You could always try cfhttp instead of cffile. If nothing else, you can read your file into a query object which may be easier to work with. – Dan Bracuk Jul 08 '14 at 23:39
  • @AdamCameron - we're using CF10. Based on my tests w/ Google Drive spreadsheets, I'm inclined to believe it's an issue w/ Excel, rather than ColdFusion. – shimmoril Jul 09 '14 at 12:45
  • What is your JVM `file.encoding` property? Try this: `chset = CreateObject("java","java.lang.System").getProperty("file.encoding")` – Sergey Babrenok Jul 09 '14 at 14:59

1 Answer

Excel, like most Windows programs, saves CSV files in the Windows "ANSI" code page, which on Western-locale systems is CP-1252 (not UTF-8 and, importantly, also not ISO-8859-1, which is what most encoding guessers report). Have you already tried this:

<cffile action="read" file="#expandedFilePath#" 
      variable="strCSV" 
      charset="windows-1252" />

If this works, can you rely on your inputs to always be default Windows files?
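
To see why the charset matters, here is a minimal sketch in plain Java (ColdFusion runs on the JVM, but this is not ColdFusion's actual implementation). It assumes the file bytes are windows-1252, as suggested above; the class name `CsvEncodingDemo` and the helper `decode` are hypothetical names used only for illustration. The same bytes are decoded three ways to mirror the three cfdump results in the question.

```java
import java.nio.charset.Charset;

public class CsvEncodingDemo {

    // Hypothetical helper: interpret raw file bytes using the named charset,
    // roughly what a charset attribute on a file read asks for.
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // "Hôpital Jésus", written with escapes so the source file's own
        // encoding cannot affect the demo.
        String original = "H\u00F4pital J\u00E9sus";

        // Assumption: Excel wrote the CSV as windows-1252, one byte per character.
        byte[] raw = original.getBytes(Charset.forName("windows-1252"));

        // The lone bytes 0xF4 and 0xE9 are not valid UTF-8, so the decoder
        // substitutes the replacement character U+FFFD for them.
        System.out.println("utf-8:        " + decode(raw, "UTF-8"));

        // Reading with the charset the file was written in round-trips cleanly.
        System.out.println("windows-1252: " + decode(raw, "windows-1252"));

        // UTF-16 pairs up single-byte characters: 'S' (0x53) and 'm' (0x6D)
        // become the single code unit U+536D, a CJK ideograph.
        byte[] sm = {0x53, 0x6D};
        System.out.println("utf-16:       " + decode(sm, "UTF-16"));
    }
}
```

If clients on non-Western Windows locales export from Excel, the bytes would likely be in a different code page, so the charset passed to the read would need to match that locale rather than always being windows-1252.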

Leigh
wiesion