I am getting a "Column does not exist" error when selecting an array-of-structs column from a DataFrame. The column is present in the DataFrame and contains data, and I can select it by its index. How can I select it by its name?

Data frame schema:

root
 |-- accountId: string (nullable = true)
 |-- documents: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- accountId: string (nullable = true)
 |    |    |-- agreementId: string (nullable = true)
 |    |    |-- createdBy: string (nullable = true)
 |    |    |-- createdDate: string (nullable = true)
 |    |    |-- documentType: string (nullable = true)
 |    |    |-- externalId: string (nullable = true)
 |    |    |-- externalSource: string (nullable = true)
 |    |    |-- id: string (nullable = true)
 |    |    |-- name: string (nullable = true)
 |    |    |-- obligations: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- accountId: string (nullable = true)
 |    |    |    |    |-- agreementId: string (nullable = true)
 |    |    |    |    |-- createdBy: string (nullable = true)
 |    |    |    |    |-- createdDate: string (nullable = true)
 |    |    |    |    |-- description: string (nullable = true)
 |    |    |    |    |-- documentId: string (nullable = true)
 |    |    |    |    |-- dueDate: string (nullable = true)
 |    |    |    |    |-- externalId: string (nullable = true)
 |    |    |    |    |-- id: string (nullable = true)
 |    |    |    |    |-- name: string (nullable = true)
 |    |    |    |    |-- partyId: string (nullable = true)
 |    |    |    |    |-- reminderPeriodUnit: long (nullable = true)
 |    |    |    |    |-- resourceVersion: long (nullable = true)
 |    |    |    |    |-- status: long (nullable = true)
 |    |    |    |    |-- updatedBy: string (nullable = true)
 |    |    |    |    |-- updatedDate: string (nullable = true)
 |    |    |-- resourceVersion: long (nullable = true)
 |    |    |-- updatedBy: string (nullable = true)
 |    |    |-- updatedDate: string (nullable = true)
 |-- effectiveDate: string (nullable = true)
 |-- parties: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- agreementContacts: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- contactId: string (nullable = true)
 |    |    |    |    |-- isPrimary: boolean (nullable = true)
 |    |    |    |    |-- role: string (nullable = true)
 |    |    |-- partyId: string (nullable = true)
 |    |    |-- role: string (nullable = true)
 |-- updatedBy: string (nullable = true)
 |-- updatedDate: string (nullable = true)

I can select the documents column by its index:

df.select(df.columns[2]).show(truncate=False)

Result:

+--------------------+
|           documents|
+--------------------+
|[{d73db7ba-5329-4...|
|[{d73db7ba-5329-4...|
|[{d73db7ba-5329-4...|
|[{d73db7ba-5329-4...|
|[{d73db7ba-5329-4...|
+--------------------+

When I select the documents column by its name:

df.select("documents").show(truncate=False)

Result:

AnalysisException: Column 'documents' does not exist.

Please help :)

bda
  • What exactly is `ef_json.columns[2]`? Try this: `ef_json.columns[2]=='documents'` – Steven Oct 18 '22 at 15:32
  • What is the difference between `df` and `ef_json`? – Steven Oct 18 '22 at 15:36
  • "what is the difference between df and ef_json" - no difference (just a typo), I will update my post to df. – bda Oct 18 '22 at 15:45
  • "what is exactly ef_json.columns[2]" : index position of "documents". Plain and simple, I need to select the documents column by its name. Do you know how to do it? Can you explain why my code above does not work? – bda Oct 18 '22 at 15:48
  • what is the output of `ef_json.columns[2]=='documents'`? True or False ? – Steven Oct 18 '22 at 15:59
  • I have solved this mystery. The documents column did not exist in some of the input data I was testing with, so in some iterations the data frame being queried did not have a documents column at all. Apologies for the confusion. – bda Oct 18 '22 at 18:41
  • then you can probably just close your question – Steven Oct 19 '22 at 12:34
  • yes, I probably can. – bda Oct 19 '22 at 14:04
  • There's a button at the bottom of the question called "close". Press that if you want to close the question as unsuitable for the site. However, as an author you can skip this and delete the question straight away. – Dharman Nov 02 '22 at 14:18
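
Since the cause turned out to be input data in which the documents column was sometimes absent, a guard like the one below may help in similar situations. This is only a minimal sketch: the helper names `select_if_present` and `report_missing` are hypothetical, not part of PySpark, and the code assumes only that each object exposes a `columns` list and a `select` method, as a PySpark DataFrame does. It checks top-level column names only, not fields nested inside structs.

```python
def select_if_present(df, name):
    """Return df.select(name) if the top-level column exists, otherwise None.

    Hypothetical helper: relies only on df.columns (a list of column names)
    and df.select, both of which a PySpark DataFrame provides.
    """
    if name in df.columns:
        return df.select(name)
    return None


def report_missing(frames, name):
    """Return the labels of the DataFrames that lack the given top-level column.

    `frames` is assumed to be a dict mapping a label (for example an input
    file name) to the DataFrame built from that input.
    """
    return [label for label, df in frames.items() if name not in df.columns]
```

With DataFrames built per input file, something like `report_missing({"file_a": df_a, "file_b": df_b}, "documents")` would list the inputs whose inferred schema has no documents column, which is what caused the AnalysisException here.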
