I want to create a table from a CSV file with standard column data types such as datetime, varchar, and int, where columns can accommodate up to 30000 characters and CLOB-style columns are supported.

I have CSV files that I am converting to Parquet format so I can create Delta tables in PySpark. As shown in the screenshot below, the table (name: ICE) is created in the Synapse lake database under the "default" schema:

[screenshot: table ICE with every column created as Varchar(8000)]

All of the columns were created as Varchar(8000). I want to assign appropriate column data types instead of having everything default to Varchar(8000). There are also a few columns longer than 300000 characters, and I believe those are being truncated by the Varchar(8000) limit.
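To make this concrete, below is a simplified sketch of what I am trying to do, with an explicit schema instead of everything as Varchar(8000). The schema, file path, and column names are placeholders for my real data:

    from pyspark.sql import SparkSession
    from pyspark.sql.types import (
        StructType, StructField, IntegerType, StringType, TimestampType
    )

    spark = SparkSession.builder.getOrCreate()  # in a Synapse notebook this session already exists

    # Placeholder schema: the real file has many more columns, some of them
    # containing text far longer than 8000 characters.
    schema = StructType([
        StructField("id", IntegerType(), True),
        StructField("created_at", TimestampType(), True),
        StructField("description", StringType(), True),  # long text / CLOB-like column
    ])

    df = (
        spark.read
        .option("header", "true")
        .schema(schema)
        .csv("abfss://container@account.dfs.core.windows.net/raw/ice.csv")  # placeholder path
    )

    # Write the data out as a Delta table in the lake database.
    df.write.format("delta").mode("overwrite").saveAsTable("default.ICE")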

I am not sure whether this is the correct method for creating tables from CSV files. Any recommendation is appreciated.

1 Answer

The Delta Transaction Log Protocol has a section on column mapping here. The relevant part of the protocol is:

Delta can use column mapping to avoid any column naming restrictions, and to support the renaming and dropping of columns without having to rewrite all the data.
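If column mapping would help in your case, it can be enabled on an existing Delta table through table properties. A minimal sketch, assuming the default.ICE table from your screenshot and a Delta Lake version that supports column mapping (protocol reader version 2 / writer version 5 or later):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()  # or the existing Synapse notebook session

    # Enable column mapping so logical column names are no longer constrained
    # by the underlying Parquet field naming rules.
    spark.sql("""
        ALTER TABLE default.ICE SET TBLPROPERTIES (
            'delta.columnMapping.mode' = 'name',
            'delta.minReaderVersion'   = '2',
            'delta.minWriterVersion'   = '5'
        )
    """)

With column mapping in 'name' mode, the logical column names in the Delta metadata are decoupled from the physical Parquet field names.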

There isn't a limit on column name length in the spec itself, but there is a practical limit imposed by the metastore being used. For example, Databricks restricts column names to 4000 characters, as mentioned in this knowledge base page, and it appears your Synapse instance restricts the length to 8000. For longer column names, e.g. in highly nested columns, you'll have to either shorten the column names or configure an external metastore for your Delta tables that can handle longer names.
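Shortening the names can be done in Spark before the table is written. A rough sketch, where MAX_NAME_LEN and the path are placeholders for your metastore's actual limit and your data location:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Shorten any column names that exceed the metastore's limit before writing.
    MAX_NAME_LEN = 4000  # placeholder; use the limit your metastore actually enforces

    df = spark.read.format("delta").load("/delta/ice")  # placeholder path
    shortened = df.toDF(*[c[:MAX_NAME_LEN] for c in df.columns])
    shortened.write.format("delta").mode("overwrite").saveAsTable("default.ICE")

Truncating names this way is only a stopgap; if two names collide after truncation you'd need a more deliberate renaming scheme.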

Jim Hibbard