Let's assume you have this dataset:
val df = spark.sparkContext.parallelize(Seq(
(202101, 5),
(202102, 7),
(202004, 8),
(202104, 6),
(202201, 0)
)).toDF("YearMonth", "value")
You can add to YearMonth
the value of 100
(since you are working with yearweeks) and rename your value
(I don't really know your case, but for the sake of clarity).
val df2 = df
.withColumn("YearMonth", col("YearMonth").plus(100))
.withColumnRenamed("value", "value2")
Then finally join back:
val df3 = df.join(df2, Seq("YearMonth"), "left")
Final result:
+---------+-----+------+
|YearMonth|value|value2|
+---------+-----+------+
|202004 |8 |null |
|202101 |5 |null |
|202102 |7 |null |
|202104 |6 |8 |
|202201 |0 |5 |
+---------+-----+------+
I hope this is useful, good luck!