I am trying to get the first element of my address_line1 array. Here's my schema:

df.printSchema()

root
 |-- address_line1: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- city: string (nullable = true)

And here's what my address_line1 column looks like in the DataFrame:

df.select("address_line1").show()

+--------------------+
|       address_line1|
+--------------------+
|   [atmosphere e 20]|
|    [tennesse row 3]|
|                null|
+--------------------+

Clearly, address_line1 is an array, and I want to keep it that way, for the following reason.

I would like to access the elements of the array like this:

address_line1[1]
address_line1[2]
address_line1[3]

even if some of them return null, because that will not be the case for every row in the database.

What I have tried is:

def address_to_columns(df, col):
    return df.withColumn(
        "address_line1", 
        df.selectExpr("address_line1[1]")
    )

But this doesn't work; it throws a "cannot resolve" error.

What am I doing wrong? Is there another way to get this done?

Pankaj Kaundal
