I am getting the following error when running a SQL query and do not understand what it means or what can be done to resolve it. Any recommendations?

View DDL:

    CREATE VIEW myschema.table (
      accountId,
      agreementType,
      capture_file_name,
      capture_file_path,
      createdDate,
      currency,
      de_agentid,
      de_applicationshardid,
      de_datacontenttype,
      de_eventapplicationtime,
      de_eventmode,
      de_eventpublishtime,
      de_eventsequenceid,
      de_id,
      de_partitionkey,
      de_source,
      documents,
      effectiveDate,
      eh_EnqueuedTimeUtc,
      eh_Offset,
      eh_SequenceNumber,
      eh_SystemProperties_x_opt_enqueued_time,
      eh_SystemProperties_x_opt_kafka_key,
      endDate,
      expirationDate,
      externalId,
      externalSource,
      id,
      isInWorkflow,
      isSigned,
      name,
      notice,
      noticeDate,
      noticePeriod,
      parties,
      reconciled_file_name_w_path,
      requestor,
      resourceVersion,
      status,
      terminateForConvenience,
      updatedDate,
      value,
      de_action,
      de_eventapplication_year,
      de_eventapplication_month,
      de_eventapplication_day,
      de_eventapplication_hour,
      de_eventapplication_minute)
    TBLPROPERTIES (
      'transient_lastDdlTime' = '1664473495')
    AS select * from parquet.`/mnt/store/de_entitytype=Agreement`

SQL query:

    select de_id from myschema.table;

Error:

    Error in SQL statement: AnalysisException: Cannot up cast documents from array<struct<accountId:string,agreementId:string,createdBy:string,createdDate:string,id:string,obligations:array<string>,resourceVersion:bigint,updatedBy:string,updatedDate:string>> to array<string>.
    The type path of the target object is: 
    You can either add an explicit cast to the input data or choose a higher precision type of the field in the target object
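
From the message, the view appears to carry a stored schema in which documents is array<string>, while the parquet files now contain array<struct<...>>. If a stale view schema is the cause, recreating the view without the frozen column list might let Spark re-infer the current parquet schema. A minimal sketch, assuming that is indeed the cause (untested):

    -- Sketch, assuming the view's stored schema went stale: recreating the
    -- view without an explicit column list lets Spark re-infer the parquet
    -- schema, including the struct type of the documents column. Untested.
    CREATE OR REPLACE VIEW myschema.table
    AS select * from parquet.`/mnt/store/de_entitytype=Agreement`;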

Sample data row (in JSON format). The documents attribute is a complex data type (an array of structs) with varying levels of hierarchy, and converting it in code is not an option. A possible workaround is sketched after the sample.

    {
      "eh_SequenceNumber": "153582",
      "eh_Offset": "25770201768",
      "eh_EnqueuedTimeUtc": "10/7/2022 1:05:07 PM",
      "eh_SystemProperties_x_opt_kafka_key": "08daa864-8ed7-4401-84e9-22ba9157c0ad",
      "eh_SystemProperties_x_opt_enqueued_time": "1665147907333",
      "de_id": "7432aa9e-4a65-446e-9bd0-b253dfa0c150",
      "de_eventpublishtime": "10/07/2022 01:05:07.333095Z",
      "de_datacontenttype": "JSON",
      "de_partitionkey": "08daa864-8ed7-4401-84e9-22ba9157c0ad",
      "de_eventsequenceid": "19856",
      "de_eventapplicationtime": "10/07/2022 01:05:07.290264Z",
      "de_eventmode": "Original",
      "de_applicationshardid": "00000000-0000-0000-0000-000000000000",
      "de_agentid": "00000000-0000-0000-0000-000000000000",
      "de_entitytype": "Agreement",
      "de_action": "Post",
      "de_source": "adm/Post/Agreement",
      "reconciled_file_name_w_path": "dbfs:/mnt/test.avro",
      "capture_file_name": "test.avro",
      "capture_file_path": "dbfs:/mnt/test/",
      "id": "08daa864-8ed7-4401-84e9-22ba9157c0ad",
      "accountId": "b111d426-1206-4ac7-a381-4651d277b78c",
      "createdDate": "2022-10-07T13:05:07.290264Z",
      "updatedDate": "2022-10-07T13:05:07.290264Z",
      "documents": [
        {
          "id": "7c3aea45-3d46-ed11-9c69-3863bb335c17",
          "accountId": "b111d426-1206-4ac7-a381-4651d277b78c",
          "obligations": [],
          "createdDate": "2022-10-07T13:05:00.946504Z",
          "updatedDate": "2022-10-07T13:05:07.290264Z",
          "name": "test.doc",
          "externalId": "7c3aea45-3d46-ed11-9c69-3863bb335c17",
          "externalSource": "test",
          "agreementId": "08daa864-8ed7-4401-84e9-22ba9157c0ad",
          "resourceVersion": 1
        }
      ],
      "name": "test.doc",
      "externalId": "7c3aea45-3d46-ed11-9c69-3863bb335c17",
      "externalSource": "test",
      "parties": [],
      "status": 0,
      "resourceVersion": 1
    }
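
One possible workaround, since only a scalar column is needed here, might be to bypass the view and query the parquet path directly, so the documents column never has to pass through the view's frozen schema. Again just a sketch, not verified:

    -- Sketch of a possible workaround: read the parquet data set directly,
    -- bypassing the view and its frozen schema for the documents column.
    select de_id from parquet.`/mnt/store/de_entitytype=Agreement`;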

Any recommendations on how to resolve this error would be much appreciated :)

bda
  • Could you please provide your sample data? – Rakesh Govindula Oct 13 '22 at 03:54
  • @RakeshGovindula: added sample data to my original post. – bda Oct 14 '22 at 13:58
  • The error indicates that the documents array cannot be up-cast because it is an array of structs. Why not try reading it as a PySpark DataFrame and writing it to a Databricks table, rather than mapping the nested file directly to a table? – Rakesh Govindula Oct 14 '22 at 16:49
  • I have a partitioned parquet data set generated from a DataFrame, and new parquet files are added to it every 5 minutes as new data arrives. Some of its columns contain arrays of structs. I need to be able to query that partitioned data set while new files are being added. I am not sure how your suggestion helps; I need to stick with parquet files and a Spark view on top of them, not a Delta table, if that is what you meant, @RakeshGovindula – bda Oct 14 '22 at 20:54
  • Yeah, got your point. In the above, the number of columns in the view and the number of columns in the sample are not the same. Could you please check on that? – Rakesh Govindula Oct 15 '22 at 02:48
  • @RakeshGovindula "Could you please check on that?" Every time a new file is added, the view is re-created to handle any new columns, and that works. So there is no need to check on that: some files may not have all the columns, which is expected. The problem occurs with the complex data type columns. Thanks for your time. – bda Oct 16 '22 at 21:00

0 Answers