We receive a survey Web-hook data in Bigquery. The comment in local language is captured as unicode and we do have special character in that comment.
Example
- Comment in survey- "别老是晚点,现场补行李费太贵"
- Comment in Bigquery data- "\u522b\u8001\u662f\u665a\u70b9\uff0c\u73b0\u573a\u8865\u884c\u674e\u8d39\u592a\u8d35"
We found a solution for decode individual comment :-
CREATE TEMPORARY FUNCTION utf8convert(s STRING)
RETURNS STRING
LANGUAGE js AS """
return unescape( ( s ) );
""";
with sample AS (SELECT '\u522b\u8001\u662f\u665a' AS S)
SELECT utf8convert(s) from sample
When implement this code in comment field with thousand of comment and different languages its not working.
CREATE TEMPORARY FUNCTION utf8convert(s STRING)
RETURNS STRING
LANGUAGE js AS """
return unescape( ( s ) );
""";
SELECT Comment, utf8convert(Comment) as Convert
FROM `airasia-nps.nps_production.NPSDashboard_Webhook_Data1`
where Comment is not null
no error while running but the result is in Unicode does not change to local language. Result: local language in Unicode
I have try this code
CREATE TEMP FUNCTION DecodeUnicode(s STRING) AS ( IF(s NOT LIKE '%\\u%', s, (SELECT CODE_POINTS_TO_STRING(ARRAY_AGG(CAST(CONCAT('0x', x) AS INT64))) FROM UNNEST(SPLIT(s, '\\u')) AS x WHERE x != '')) ); SELECT original, DecodeUnicode(original) AS decoded FROM ( SELECT trim(r'$-\u6599\u91d1\u304c\u9ad8\u3059\u304e\uff01\uff01\uff01') AS original UNION ALL SELECT trim(r'abcd') );
shows error i think its because the comment start with special character?.