i have a a lot of json files in s3 which are updated frequently. Basically i am doing CRUD operations in a datalake. Because apache iceberg can handle item-level manipulations, i would like to migrate my data to use apache iceberg as table format.
my data is in json files, but i have crawled the data and created a glue-table. The glue table was basically created automatically after crawling and the schema with all the data types was automatically detected.
i want to migrate this table to a table with iceberg format. Therefore i created a iceberg table with the same schema i read from the existing crawled glue table:
CREATE TABLE
icebergtest
(
`name` string,
`address` string,
`customer` boolean,
`features` array<struct<featurename:string,featureId:string,featureAge:double,featureVersion:string>>,
)
LOCATION 's3://<s3-bucket-name>/<blablapath>/'
TBLPROPERTIES ( 'table_type' ='ICEBERG' );
as you can see, i have some attributes in my json files an features is an array with json objects. I just copy pasted the data types from my exising glue table. creating the table was successful, but filling the iceberg table with the data from the glue table fails:
INSERT INTO "icebergtest"
SELECT * FROM "customer_json_table";
ERRORs : SYNTAX_ERROR: Insert query has mismatched column types: Table: [varchar, varchar, boolean, array(row(featurename varchar, featureId varchar, featureAge double, featureVersion varchar)), ...
for me it seems like it i am trying to insert varchar to a string datafield. But my glue tables has also a string as data type configure.. i dont understand where suddenly varchar is coming from and how i can fix that problem.