We are using AWS Athena tables to store order and product information. We have a json string in a column called line-items in orders table. It is not a valid json and we are converting into valid json, then parse it to show different products in line-items as a separate row in the final resultset.
In certain cases, the json parsing fails as we found out that the json document stored in the line-items column. Due to few records, entire resultset throws an error. Please suggest an alternate solution so that json parsing of invalid json is ignored and it returns null for them, whereas also return the successful rows.
Please find the query below:
It fails when we use function json_parse()
with orders_info AS (
select id as order_id, created_at, substring(created_at, 1, 10) as order_date,
customer_id, customer_email, customer_phone, billing_address_country, line_items, total_price_set_shop_money_amount,
row_number() over(partition by id order by etl_run_date asc) AS rn
from orders where cast(substring(created_at, 1, 10) as date) = CURRENT_DATE - INTERVAL '2' DAY
and id in ('134', '4545')
),
orders_dataset as (
select *, replace(replace(replace(replace(line_items, 'None', '''none'''), 'True', 'true'), 'False', 'false'), '''', '"') as line_items_json
from orders_info where rn = 1
),
line_items_dataset as (
select od.*, json_extract_scalar(m, '$.id') product_id,
json_extract_scalar(m, '$.variant_id') variant_id,
json_extract_scalar(m, '$.price_set.shop_money.amount') price_set_shop_money_amount,
json_extract_scalar(m, '$.price_set.shop_money.currency_code') price_set_shop_money_currency_code
from orders_dataset od,
unnest(cast(json_parse(line_items_json) as array(json))) as t(m)
)
select * from line_items_dataset