I am getting the error "Column does not exist" when I select an array-of-structs column from a DataFrame by name. The column is present in the DataFrame and contains data, and I can select it by its index. How can I select it by its name?
Data frame schema:
root
|-- accountId: string (nullable = true)
|-- documents: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- accountId: string (nullable = true)
| | |-- agreementId: string (nullable = true)
| | |-- createdBy: string (nullable = true)
| | |-- createdDate: string (nullable = true)
| | |-- documentType: string (nullable = true)
| | |-- externalId: string (nullable = true)
| | |-- externalSource: string (nullable = true)
| | |-- id: string (nullable = true)
| | |-- name: string (nullable = true)
| | |-- obligations: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- accountId: string (nullable = true)
| | | | |-- agreementId: string (nullable = true)
| | | | |-- createdBy: string (nullable = true)
| | | | |-- createdDate: string (nullable = true)
| | | | |-- description: string (nullable = true)
| | | | |-- documentId: string (nullable = true)
| | | | |-- dueDate: string (nullable = true)
| | | | |-- externalId: string (nullable = true)
| | | | |-- id: string (nullable = true)
| | | | |-- name: string (nullable = true)
| | | | |-- partyId: string (nullable = true)
| | | | |-- reminderPeriodUnit: long (nullable = true)
| | | | |-- resourceVersion: long (nullable = true)
| | | | |-- status: long (nullable = true)
| | | | |-- updatedBy: string (nullable = true)
| | | | |-- updatedDate: string (nullable = true)
| | |-- resourceVersion: long (nullable = true)
| | |-- updatedBy: string (nullable = true)
| | |-- updatedDate: string (nullable = true)
|-- effectiveDate: string (nullable = true)
|-- parties: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- agreementContacts: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- contactId: string (nullable = true)
| | | | |-- isPrimary: boolean (nullable = true)
| | | | |-- role: string (nullable = true)
| | |-- partyId: string (nullable = true)
| | |-- role: string (nullable = true)
|-- updatedBy: string (nullable = true)
|-- updatedDate: string (nullable = true)
I can select the documents column by its index:
df.select(ef_json.columns[2]).show(truncate=False)
Result:
+--------------------+
| documents|
+--------------------+
|[{d73db7ba-5329-4...|
|[{d73db7ba-5329-4...|
|[{d73db7ba-5329-4...|
|[{d73db7ba-5329-4...|
|[{d73db7ba-5329-4...|
+--------------------+
When I select the documents column by its name:
df.select("documents").show(truncate=False)
Result:
AnalysisException: Column 'documents' does not exist.
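One thing that may be worth checking is whether the stored column name is exactly "documents". This is only an assumption on my part, since nothing in the output above confirms it, but a name with leading or trailing whitespace or a non-printing character would look identical in show() and still fail a lookup by the plain string. The sketch below is plain Python with no Spark dependency. The helper name suspicious_column_names and the sample list are hypothetical; with Spark, the input would be df.columns.

```python
import unicodedata


def suspicious_column_names(columns):
    """Return (index, repr(name)) for names with padding or hidden characters."""
    suspicious = []
    for index, name in enumerate(columns):
        # Leading/trailing whitespace (including non-breaking spaces).
        padded = name != name.strip()
        # Unicode "C*" categories cover control and format characters,
        # e.g. a byte-order mark or a zero-width space.
        hidden = any(unicodedata.category(ch).startswith("C") for ch in name)
        if padded or hidden:
            suspicious.append((index, repr(name)))
    return suspicious


# Hypothetical sample data; with Spark this would be df.columns.
sample_columns = ["accountId", "documents ", "effectiveDate"]
for index, shown in suspicious_column_names(sample_columns):
    print(index, shown)
```

If a name does turn out to be padded, renaming it with df.withColumnRenamed(old_name, "documents") would presumably make the select by name work, but I have not confirmed that this is the cause here.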
Please help :)