
I have a lot of JSON files in S3 that are updated frequently; basically, I am doing CRUD operations on a data lake. Because Apache Iceberg can handle item-level manipulations, I would like to migrate my data to use Apache Iceberg as the table format.

My data is in JSON files, but I have crawled the data and created a Glue table. The Glue table was created automatically by the crawler, and the schema with all the data types was detected automatically.
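For reference, the crawler-generated table DDL looks roughly like this (a sketch reconstructed from the schema below; the SerDe class and the S3 path are what a Glue JSON crawler typically produces, not copied from my console):

  CREATE EXTERNAL TABLE customer_json_table (
    `name` string,
    `address` string,
    `customer` boolean,
    `features` array<struct<featurename:string,featureId:string,featureAge:double,featureVersion:string>>
  )
  -- the OpenX JSON SerDe is the usual crawler choice for JSON data; assumed here
  ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
  LOCATION 's3://<s3-bucket-name>/<json-data-path>/';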

I want to migrate this table to a table in Iceberg format, so I created an Iceberg table with the same schema I read from the existing crawled Glue table:

  CREATE TABLE
      icebergtest 
      (
        `name` string,
        `address` string,
        `customer` boolean,
        `features` array<struct<featurename:string,featureId:string,featureAge:double,featureVersion:string>>
      )
  LOCATION 's3://<s3-bucket-name>/<blablapath>/'
  TBLPROPERTIES ( 'table_type' ='ICEBERG'  );

As you can see, I have some attributes in my JSON files, and features is an array of JSON objects. I just copy-pasted the data types from my existing Glue table. Creating the table was successful, but filling the Iceberg table with the data from the Glue table fails:

INSERT INTO "icebergtest"
SELECT * FROM "customer_json_table";

ERROR: SYNTAX_ERROR: Insert query has mismatched column types: Table: [varchar, varchar, boolean, array(row(featurename varchar, featureId varchar, featureAge double, featureVersion varchar)), ...

To me it looks like I am trying to insert varchar into a string field, but my Glue table also has string configured as the data type. I don't understand where varchar is suddenly coming from and how I can fix the problem.
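For what it's worth, one way to narrow this down might be to compare the engine-level types and then run the insert with an explicit column list and per-column casts, so a mismatch gets pinned to a single column (a sketch; typeof is a built-in Athena/Trino function, and the cast targets just mirror the Iceberg DDL above):

  -- inspect the types Athena's engine actually assigns to the source columns
  -- (Glue "string" surfaces as varchar at query time, which by itself is fine)
  SELECT typeof(name), typeof(address), typeof(customer), typeof(features)
  FROM "customer_json_table"
  LIMIT 1;

  -- insert with an explicit column list and casts, so any mismatch is
  -- reported for a specific column instead of the whole row
  INSERT INTO "icebergtest" (name, address, customer, features)
  SELECT
    CAST(name     AS varchar),
    CAST(address  AS varchar),
    CAST(customer AS boolean),
    CAST(features AS array(row(featurename varchar, featureId varchar,
                               featureAge double, featureVersion varchar)))
  FROM "customer_json_table";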

  • You should try specifying the columns by name, like so: `INSERT INTO table2 (column1, column2, column3, ...) SELECT column1, column2, column3, ... FROM table1 WHERE condition;` – roizaig Jan 25 '23 at 09:15

0 Answers