
Loading a JSON file from S3 to Snowflake failed. Here S3_STG_AREA_JSON is the staging area and STG_TABLE_NAME_JSON is the staging table.

Statement executed:

```sql
COPY INTO STG_TABLE_NAME_JSON FROM @S3_STG_AREA_JSON
  FILE_FORMAT=(TYPE='json' STRIP_OUTER_ARRAY=true)
```

Error:

**Code: 100183 State: P0000 Message: Error parsing JSON: missing first byte in UTF-8 sequence**

I have tried

```sql
COPY INTO STG_TABLE_NAME_JSON FROM @S3_STG_AREA_JSON
  FILE_FORMAT=(TYPE='json' STRIP_OUTER_ARRAY=true SKIP_BYTE_ORDER_MARK=TRUE)
```

I got the same error as above.

Then tried

```sql
COPY INTO STG_TABLE_NAME_JSON FROM @S3_STG_AREA_JSON
  FILE_FORMAT=(TYPE='json' STRIP_OUTER_ARRAY=true IGNORE_UTF8_ERRORS=TRUE)
```

The load completed, but now the Snowflake table contains � (U+FFFD) replacement characters.

How can I fix this issue?
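One possible cleanup for rows already loaded with IGNORE_UTF8_ERRORS is to strip the replacement characters with REPLACE. This is only a sketch: it assumes the staging table has a single VARIANT column, here called V (not named in the question; adjust to your schema).

```sql
-- Sketch: remove U+FFFD from already-loaded rows.
-- Assumes STG_TABLE_NAME_JSON has one VARIANT column named V (hypothetical name).
-- CHR(65533) is the Unicode replacement character U+FFFD.
UPDATE STG_TABLE_NAME_JSON
SET V = PARSE_JSON(REPLACE(V::STRING, CHR(65533), ''));
```

Note this discards whatever bytes the original characters were; it only cleans up the display, it cannot recover the lost data.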

Roberto Hernandez
  • U+FFFD is a Unicode replacement character. If you ignore an error it will use this in its place. It appears that the original JSON has an invalid Unicode character in it. If you want to remove the replacement character, you can use the replace function to replace it with a blank string. – Greg Pavlik Jul 29 '20 at 14:46
  • Looks like that is the option I'm left with, and it's going to work. Will change the process, test and update. – Anil Kumar Jul 29 '20 at 18:49
  • We have integration software in the middle that didn't use UTF-8. We don't have this issue now, as they started using UTF-8 encoding. – Anil Kumar Aug 04 '20 at 03:12
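If the upstream system cannot be fixed to emit UTF-8, the cleanup can also be done during the load itself, since Snowflake's COPY supports a SELECT transformation over the stage. A sketch under the same assumption of a single-VARIANT-column staging table ($1 is the parsed record from the stage):

```sql
-- Sketch: strip U+FFFD while copying, instead of updating afterwards.
-- Assumes STG_TABLE_NAME_JSON has a single VARIANT column (hypothetical layout).
COPY INTO STG_TABLE_NAME_JSON
FROM (
  SELECT PARSE_JSON(REPLACE($1::STRING, CHR(65533), ''))
  FROM @S3_STG_AREA_JSON
)
FILE_FORMAT=(TYPE='json' STRIP_OUTER_ARRAY=true IGNORE_UTF8_ERRORS=TRUE);
```

As with the post-load UPDATE, the invalid bytes are still lost; the real fix, as the comments conclude, is to produce UTF-8 at the source.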
