I have a DataFrame with the following structure:
col1|col2|col3|col4
xxxx|yyyy|zzzz|[1111],[2222]
I want my output to look like this:
col1|col2|col3|col4|col5
xxxx|yyyy|zzzz|1111|2222
col4 is an array, and I want to split its elements into separate columns. How can I do this?
I saw many answers that use flatMap, but they add a new row for each array element. I want each element placed in its own column, in the same row.
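To make the difference concrete, here is a rough sketch in plain Python (not Spark code). The function names flat_map_style and in_row_style are made up for this illustration, and the row uses the placeholder values from the example above, assumed to be strings:

```python
# Plain-Python illustration only; this is not Spark code.
# The row mirrors the placeholder example: three fixed columns plus an array.
row = ("xxxx", "yyyy", "zzzz", ["1111", "2222"])


def flat_map_style(row):
    """Return one output row per array element (what the flatMap answers give)."""
    *fixed, values = row
    return [tuple(fixed) + (value,) for value in values]


def in_row_style(row):
    """Return a single row with each array element in its own column (what I want)."""
    *fixed, values = row
    return tuple(fixed) + tuple(values)


print(flat_map_style(row))  # two rows, one per array element
print(in_row_style(row))    # one row, five columns
```

The first function is the behaviour I am trying to avoid; the second is the shape I am after, where the number of rows stays the same.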
The following is my actual schema:
root
|-- PRIVATE_IP: string (nullable = true)
|-- PRIVATE_PORT: integer (nullable = true)
|-- DESTINATION_IP: string (nullable = true)
|-- DESTINATION_PORT: integer (nullable = true)
|-- collect_set(TIMESTAMP): array (nullable = true)
| |-- element: string (containsNull = true)
Also, could someone please explain how to do this with both DataFrames and RDDs?