
I tried to convert a file from Big5 to UTF-8 using the iconv command, but I get the error: illegal input sequence at position 18876

iconv -f BIG5 -t UTF8 doc_full_list.csv > doc_full_list.csv.out
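To see which bytes are being rejected, one option is to decode the file with Java's Big5 decoder set to report errors and print the byte offset of the first bad sequence. This is a minimal sketch under a few assumptions: the class Big5ErrorOffset and its method firstMalformedOffset are hypothetical names, the whole file is read into memory, and iconv's "position" is assumed to be a byte offset into the input, which is worth checking on your system. Java's Big5 table may also differ slightly from the one iconv uses, so the two could disagree on some bytes.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Big5ErrorOffset {

    // Hypothetical helper: returns the byte offset of the first malformed or
    // unmappable Big5 sequence, or -1 if the whole input decodes cleanly.
    public static int firstMalformedOffset(byte[] data) {
        CharsetDecoder decoder = Charset.forName("Big5").newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        // Big5 yields at most one char per input byte, so this cannot overflow.
        CharBuffer out = CharBuffer.allocate(data.length + 1);
        CoderResult result = decoder.decode(in, out, true);
        // On an error, the input position is left at the start of the bad sequence.
        return result.isError() ? in.position() : -1;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        int offset = firstMalformedOffset(data);
        if (offset < 0) {
            System.out.println("no invalid Big5 sequences found");
            return;
        }
        // Show a few bytes from the failing offset as hex for inspection.
        int end = Math.min(data.length, offset + 4);
        StringBuilder hex = new StringBuilder();
        for (int i = offset; i < end; i++) {
            hex.append(String.format("%02X ", data[i] & 0xFF));
        }
        System.out.println("first invalid sequence at byte " + offset + ": " + hex.toString().trim());
    }
}
```

With JDK 11 or later it can be run directly from source, for example: java Big5ErrorOffset.java doc_full_list.csv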

When I used the Apache NiFi 'ConvertCharacterSet' processor, it converted the same file successfully.

It gets past the errors by configuring its decoder to replace bad input:

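import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
// inputCharset is the source Charset (Big5 here); bad input becomes "?"
final CharsetDecoder decoder = inputCharset.newDecoder();
decoder.onMalformedInput(CodingErrorAction.REPLACE);
decoder.onUnmappableCharacter(CodingErrorAction.REPLACE);
decoder.replaceWith("?");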
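The NiFi decoder settings can be reproduced in a small stand-alone Java program, which may help if NiFi is not otherwise part of the pipeline. This is a sketch, not NiFi's actual implementation: the class LenientBig5ToUtf8 and its method decodeLenient are hypothetical names, and the program reads the whole file into memory, which a streaming version would avoid for very large inputs.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class LenientBig5ToUtf8 {

    // Hypothetical helper using the same decoder settings as the NiFi snippet:
    // malformed or unmappable input is replaced with "?" instead of failing.
    public static String decodeLenient(byte[] big5Bytes) {
        CharsetDecoder decoder = Charset.forName("Big5").newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE)
                .replaceWith("?");
        try {
            return decoder.decode(ByteBuffer.wrap(big5Bytes)).toString();
        } catch (CharacterCodingException e) {
            // Not expected: with REPLACE, coding errors are substituted, not thrown.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java LenientBig5ToUtf8 <big5-input> <utf8-output>");
            System.exit(2);
        }
        byte[] input = Files.readAllBytes(Paths.get(args[0]));
        String text = decodeLenient(input);
        // Re-encode the decoded text as UTF-8 and write it out.
        Files.write(Paths.get(args[1]), text.getBytes(StandardCharsets.UTF_8));
    }
}
```

For example: java LenientBig5ToUtf8.java doc_full_list.csv doc_full_list.csv.out (single-file launch needs JDK 11 or later). Unlike iconv -c, which silently discards input it cannot convert, this writes a "?" in its place, so damaged rows stay visibly marked.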

Is it possible to do this conversion from the Unix command line without any additional tool?

ForeverLearner
  • You are aware that it will replace untranslatable characters with a '?'. You can do the same with the iconv command by using -f with a translit file – Chaffelson Feb 22 '18 at 11:01
  • Thanks @Chaffelson. I am now using the '-c' option, which discards the characters iconv cannot convert instead of aborting. This lets me avoid routing the data through NiFi. I will keep the group posted. – ForeverLearner Mar 20 '18 at 08:01
  • iconv -c -f BIG5 -t UTF-8 doc_full_list.csv -o doc_full_list.csv.test – ForeverLearner Mar 20 '18 at 08:02
