
Hi all, I'm brand new to Spark/Scala. I want to rename a few nested JSON fields, because my lateral view fails when multiple JSON fields share the same name.

I want to rename the columns EffDate and ExpDate in EmployeeAddr and EmployeePhone.

I've tried the withColumnRenamed and withColumn functions, but neither is working for me.


Code to load the file into a DataFrame:
val Employee = spark.read.format(Employeefile_type).option("header", "true").option("inferSchema", "true").load(file_loction)



root
 |-- BirthDate: string (nullable = true)
 |-- EmployeeId: string (nullable = true)
 |-- EmployeeAddr: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- AddrTypeName: string (nullable = true)
 |    |    |-- City: string (nullable = true)
 |    |    |-- CtryCode: string (nullable = true)
 |    |    |-- EffDate: string (nullable = true)
 |    |    |-- ExpDate: string (nullable = true)
 |    |    |-- PostalCode: string (nullable = true)
 |    |    |-- Province: string (nullable = true)
 |    |    |-- Street1: string (nullable = true)
 |    |    |-- Street2: string (nullable = true)
 |-- EmployeeEmail: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- CrewEmailAddr: string (nullable = true)
 |    |    |-- EmailType: string (nullable = true)
 |-- EmployeeEmerContact: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- Addr: string (nullable = true)
 |    |    |-- FirstName: string (nullable = true)
 |    |    |-- LastName: string (nullable = true)
 |    |    |-- PrimaryPhone: string (nullable = true)
 |    |    |-- Relatnshp: string (nullable = true)
 |    |    |-- Title: string (nullable = true)
 |-- EmployeeEmplymntStatus: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- EmplymntStatusCode: string (nullable = true)
 |    |    |-- EmplymntStatusReason: string (nullable = true)
 |    |    |-- EndDate: string (nullable = true)
 |    |    |-- StartDate: string (nullable = true)
 |-- EmployeePhone: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- EmployeePhoneNumber: string (nullable = true)
 |    |    |-- EffDate: string (nullable = true)
 |    |    |-- ExpDate: string (nullable = true)
 |    |    |-- PhoneType: string (nullable = true)
Kamran

1 Answer


You can apply the solution described here:

How to rename fields in an DataFrame corresponding to nested JSON

It replaces the DataFrame's schema by re-creating the DataFrame with a new schema in which the nested fields are renamed.
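For context, withColumnRenamed only renames top-level columns, so it never reaches fields inside an array of structs. Below is a minimal sketch of the schema-replacement idea, not the linked answer's exact code. The helper renameInArray, the one-row sample JSON, and the new names (AddrEffDate, AddrExpDate, PhoneEffDate, PhoneExpDate) are all assumptions made for illustration; substitute your own Employee DataFrame and whatever names suit you.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

object RenameNestedFields {
  // Hypothetical helper: returns a copy of `schema` in which the struct fields
  // inside the array column `arrayCol` are renamed according to `renames`.
  def renameInArray(schema: StructType, arrayCol: String, renames: Map[String, String]): StructType =
    StructType(schema.fields.map {
      case StructField(name, ArrayType(st: StructType, containsNull), nullable, meta) if name == arrayCol =>
        val renamed = StructType(st.fields.map(f => f.copy(name = renames.getOrElse(f.name, f.name))))
        StructField(name, ArrayType(renamed, containsNull), nullable, meta)
      case other => other // every other column is left untouched
    })

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("rename-nested").getOrCreate()
    import spark.implicits._

    // Assumed sample data, a cut-down version of the schema in the question.
    val sample = Seq(
      """{"EmployeeId":"1","EmployeeAddr":[{"City":"X","EffDate":"2020-01-01","ExpDate":"2021-01-01"}],"EmployeePhone":[{"EmployeePhoneNumber":"555","EffDate":"2019-01-01","ExpDate":"2022-01-01"}]}"""
    )
    val employee = spark.read.json(sample.toDS())

    // Build the new schema; only field names change, not types or field order.
    val addrFixed = renameInArray(employee.schema, "EmployeeAddr",
      Map("EffDate" -> "AddrEffDate", "ExpDate" -> "AddrExpDate"))
    val newSchema = renameInArray(addrFixed, "EmployeePhone",
      Map("EffDate" -> "PhoneEffDate", "ExpDate" -> "PhoneExpDate"))

    // Re-create the DataFrame from the same rows with the new schema.
    val renamed = spark.createDataFrame(employee.rdd, newSchema)
    renamed.printSchema()

    spark.stop()
  }
}
```

This works because rows are matched to the schema by position, so changing only the field names leaves the data intact. Once the names are unique, the lateral view (or explode) on EmployeeAddr and EmployeePhone should no longer produce clashing column names.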

OBarros