I've got a DataFrame like this:
from pyspark.sql import SparkSession
from pyspark import Row

spark = SparkSession.builder \
    .appName('DataFrame') \
    .master('local[*]') \
    .getOrCreate()

df = spark.createDataFrame([Row(a=1, b='', c=['0', '1'], d='foo'),
                            Row(a=2, b='', c=['0', '1'], d='bar'),
                            Row(a=3, b='', c=['0', '1'], d='foo')])

+---+---+------+---+
|  a|  b|     c|  d|
+---+---+------+---+
|  1|   |[0, 1]|foo|
|  2|   |[0, 1]|bar|
|  3|   |[0, 1]|foo|
+---+---+------+---+

I would like to create a column "e" holding the first element of column "c" and a column "f" holding the second element of column "c", so that the result looks like this:

+---+---+------+---+---+---+
|a  |b  |c     |d  |e  |f  |
+---+---+------+---+---+---+
|1  |   |[0, 1]|foo|0  |1  |
|2  |   |[0, 1]|bar|0  |1  |
|3  |   |[0, 1]|foo|0  |1  |
+---+---+------+---+---+---+