I am trying to get the first element of my address_line1
array. Here's my schema:
df.printSchema()
root
|-- address_line1: array (nullable = true)
| |-- element: string (containsNull = true)
|-- city: string (nullable = true)
And here's what my "address_line1"
column looks like in the DataFrame:
df.select("address_line1").show()
+--------------------+
|       address_line1|
+--------------------+
|   [atmosphere e 20]|
|    [tennesse row 3]|
|                null|
+--------------------+
Clearly, my address_line1
is an array, and I want to keep it that way. The reason is that
I would like to access elements of the array in the following manner:
address_line1[1]
address_line1[2]
address_line1[3]
even if they return nulls, because that will not be the case for the whole database.
What I have tried is:
def address_to_columns(df, col):
    return df.withColumn(
        "address_line1",
        df.selectExpr("address_line1[1]")
    )
But this doesn't work: it throws a "cannot resolve" error.
What am I doing wrong? Is there another way to get this done?