In Databricks, I create the table from a JSON schema definition. The schema JSON used to create the table:
{
"fields": [
{
"metadata": {},
"name": "username",
"nullable": true,
"type": "string"
},
{
"metadata": {},
"name": "department",
"nullable": true,
"type": "string"
},
{
"metadata": {},
"name": "income",
"nullable": true,
"type": "decimal(38,18)"
}
],
"type" :"struct"
}
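As a sanity check that the JSON itself is well-formed, parsing it with plain Python shows the income field does declare decimal(38,18) (a minimal standalone check, no Spark needed):

```python
import json

# The same schema JSON as above, inlined for a quick standalone check
schema_json = """
{
  "fields": [
    {"metadata": {}, "name": "username", "nullable": true, "type": "string"},
    {"metadata": {}, "name": "department", "nullable": true, "type": "string"},
    {"metadata": {}, "name": "income", "nullable": true, "type": "decimal(38,18)"}
  ],
  "type": "struct"
}
"""

schema = json.loads(schema_json)
income = next(f for f in schema["fields"] if f["name"] == "income")
print(income["type"])  # decimal(38,18)
```

So the declared type is correct in the source JSON; the (10,0) must appear somewhere between parsing and table creation.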
The code below creates the table:
...
import json
from pyspark.sql import types as T

# Load the schema JSON from its location and build a StructType from it
with open('/dbfs/FileStore/my-schema/{0}.json'.format(tbl_name), 'r') as f:
    tbl_schema = T.StructType.fromJson(json.loads(f.read()))

# Write an empty DataFrame with that schema to the Delta location
tbl_df = spark.createDataFrame([], tbl_schema)
tbl_df.write.format("delta").save(tbl_path)

# Register the table over the Delta files
spark.sql("CREATE TABLE {0} USING DELTA LOCATION '{1}'".format(tbl_name, tbl_path))
...
When I describe the table, I see DecimalType(10,0) for the income field.
I use readStream to read the data from ORC files, where decimal(38,18) is used; printSchema() on the DataFrame confirms this.
I write with a Spark Structured Streaming writeStream that performs an UPSERT (MERGE INTO) via foreachBatch(), similar to the Python example at https://docs.azuredatabricks.net/_static/notebooks/merge-in-streaming.html
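For reference, the shape of my foreachBatch upsert is roughly the following (a sketch; the table name and join key here are placeholders, and the actual MERGE condition follows the linked notebook):

```python
def upsert_to_delta(micro_batch_df, batch_id):
    """Merge one micro-batch into the Delta table (sketch; table/key names are placeholders)."""
    micro_batch_df.createOrReplaceTempView("updates")
    # The SparkSession is taken from the micro-batch DataFrame, as in the linked notebook
    micro_batch_df.sparkSession.sql("""
        MERGE INTO mytable t
        USING updates s
        ON t.username = s.username
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
    """)

# Wiring (runs on the cluster):
# (orc_stream_df.writeStream
#     .foreachBatch(upsert_to_delta)
#     .outputMode("update")
#     .start())
```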
Issue:
Whenever I run the stream, no data is inserted into the table, and there is no debug log message either.
I suspect the cause is the income field being DecimalType(10,0) in the table while the DataFrame has it as DecimalType(38,18).
So I am trying to ALTER the column, but I am unable to. I am using the command below:
%sql ALTER TABLE mytable ALTER income TYPE decimal(38,18);
com.databricks.backend.common.rpc.DatabricksExceptions$SQLExecutionException: org.apache.spark.sql.AnalysisException: ALTER TABLE CHANGE COLUMN is not supported for changing column 'income' with type 'DecimalType(10,0) (nullable = true)' to 'income' with type 'DecimalType(38,18) (nullable = true)'
at com.databricks.sql.transaction.tahoe.DeltaErrors$.alterTableChangeColumnException(DeltaErrors.scala:478)
at com.databricks.sql.transaction.tahoe.commands.AlterTableChangeColumnDeltaCommand.verifyColumnChange(alterDeltaTableCommands.scala:412)
at com.databricks.sql.transaction.tahoe.commands.AlterTableChangeColumnDeltaCommand.$anonfun$run$15(alterDeltaTableCommands.scala:295)
at com.databricks.sql.transaction.tahoe.schema.SchemaUtils$.transform$1(SchemaUtils.scala:762)
at com.databricks.sql.transaction.tahoe.schema.SchemaUtils$.transformColumnsStructs(SchemaUtils.scala:781)
at com.databricks.sql.transaction.tahoe.commands.AlterTableChangeColumnDeltaCommand.$anonfun$run$14(alterDeltaTableCommands.scala:292)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.$anonfun$recordDeltaOperation$5(DeltaLogging.scala:122
If I remove the TYPE keyword, I get the exception below:
com.databricks.backend.common.rpc.DatabricksExceptions$SQLExecutionException: org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'decimal' expecting {<EOF>, ';'}(line 1, pos 46)
== SQL ==
ALTER TABLE ahm_db.message_raw ALTER income decimal(38,18)
----------------------------------------------^^^
at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:265)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:134)
at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:64)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:85)
at com.databricks.sql.parser.DatabricksSqlParser.$anonfun$parsePlan$1(DatabricksSqlParser.scala:67)
at com.databricks.sql.parser.DatabricksSqlParser.parse(DatabricksSqlParser.scala:87)
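Since ALTER TABLE refuses to widen the decimal, the fallback I am considering is rewriting the table with the column cast explicitly (a sketch, assuming overwriteSchema is permitted on this table; it only runs on a cluster, so pyspark is imported inside the function):

```python
def widen_income(spark, tbl_name):
    """Rewrite the table with income cast to decimal(38,18) (sketch; cluster-only)."""
    from pyspark.sql import functions as F  # deferred: pyspark is only available on the cluster

    df = spark.table(tbl_name)
    df = df.withColumn("income", F.col("income").cast("decimal(38,18)"))
    (df.write.format("delta")
       .mode("overwrite")
       .option("overwriteSchema", "true")  # allow Delta to replace the table schema
       .saveAsTable(tbl_name))
```

Is rewriting the whole table really necessary here, or is there a way to fix the type in place?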