I'm setting up a Snowpipe to load data from an S3 bucket into a Snowflake schema. The S3 bucket contains files in NDJSON format. One file can contain multiple records, and I want to process all of them, even if one record is broken.
To do so, I need to add the on_error = 'continue' option to the pipe creation and use a CSV file format, as stated in the official Snowflake docs here.
That way I receive raw JSON strings that I have to parse to access the data, and since Snowpipe does not support nested selects, the only way to do that is to call parse_json separately for each column.
Resulting in this copy statement:
copy into MY_TABLE
from (select parse_json($1):id, parse_json($1):name, parse_json($1):status
from @MY_STAGE_CONNECTED_TO_S3)
on_error = 'continue'
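For context, the pipe itself is created roughly like this. This is only a minimal sketch with illustrative names (MY_RAW_LINE_FORMAT, MY_PIPE); the single-column CSV file format follows the workaround described above:

-- Sketch only: a one-column "CSV" file format so each NDJSON line lands in $1 as a raw string
create or replace file format MY_RAW_LINE_FORMAT
  type = 'csv'
  field_delimiter = none
  record_delimiter = '\n';

-- Sketch only: the pipe wraps the copy statement shown above
create or replace pipe MY_PIPE auto_ingest = true as
  copy into MY_TABLE
  from (select parse_json($1):id, parse_json($1):name, parse_json($1):status
        from @MY_STAGE_CONNECTED_TO_S3)
  file_format = (format_name = 'MY_RAW_LINE_FORMAT')
  on_error = 'continue';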
This code has to parse the JSON three times for each row. I have a table with ~40 columns, and for it this query runs ~5 times slower than a more straightforward version that uses the file format option to parse the JSON, but which unfortunately does not support the on_error = 'continue' option.
copy into HQO_DEVELOPMENT.PUBLIC.DIM_TENANT
from (select $1:id, $1:name, $1:status
from @HQO_DEVELOPMENT.PUBLIC.DIM_TENANT_STAGE_NN)
file_format = (type = 'json')
What was tried
- Using a nested select like this is not supported:
copy into HQO_DEVELOPMENT.PUBLIC.DIM_TENANT from
(select $1:id, $1:name, $1:status from (
select parse_json($1) from
@HQO_DEVELOPMENT.PUBLIC.DIM_TENANT_STAGE))
on_error = 'continue'
- Setting the JSON file format on the stage and omitting it on the pipe (roughly as sketched below): this doesn't help the issue either
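For reference, this is roughly what the stage-level attempt looks like. A minimal sketch only; the stage name, pipe name, bucket URL, and storage integration are illustrative placeholders:

-- Sketch only: JSON file format attached to the stage instead of the pipe
create or replace stage DIM_TENANT_STAGE_JSON
  url = 's3://my-bucket/path/'
  storage_integration = MY_S3_INTEGRATION
  file_format = (type = 'json');

-- Sketch only: the pipe's copy statement omits the file format and inherits it from the stage
create or replace pipe DIM_TENANT_PIPE auto_ingest = true as
  copy into HQO_DEVELOPMENT.PUBLIC.DIM_TENANT
  from (select $1:id, $1:name, $1:status
        from @DIM_TENANT_STAGE_JSON)
  on_error = 'continue';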
Is there a way to keep the benefit of on_error = 'continue' without parsing the JSON for every column?