
I have this DataFrame, for example:

+----+------+------+-------+
| id | val1 | val2 | val3  |
+----+------+------+-------+
|  1 |   a  |   b  |   c   |
|  2 |   d  |   e  |   f   |
|  3 |   g  |   h  |   i   |
+----+------+------+-------+

and I want the columns val1, val2 and val3 transposed into rows like this:

+----+------+------+
| id |  key | value|
+----+------+------+
|  1 | val1 |   a  |
|  1 | val2 |   b  |
|  1 | val3 |   c  |
|  2 | val1 |   d  |
|  2 | val2 |   e  |
|  2 | val3 |   f  |
|  3 | val1 |   g  |
|  3 | val2 |   h  |
|  3 | val3 |   i  |
+----+------+------+

How can I achieve that?

Vzzarr

1 Answer

This transformation can be achieved by exploding a Map structure: first combine the three columns into a Map (based on https://stackoverflow.com/a/41291156/4725074) and then explode it.

So given the following Spark DF:

import pyspark.sql.functions as F
from pyspark.sql import SparkSession
from itertools import chain


spark = SparkSession \
    .builder \
    .getOrCreate()

d = [
    {'id': 1, 'val1': 'a', 'val2': 'b', 'val3': 'c'},
    {'id': 2, 'val1': 'd', 'val2': 'e', 'val3': 'f'},
    {'id': 3, 'val1': 'g', 'val2': 'h', 'val3': 'i'},
     ]
df = spark.createDataFrame(d)

the solution would be:

m = F.create_map(list(chain(*(
        (F.lit(c), F.col(c)) for c in df.columns if c in ["val1", "val2", "val3"]))))

# drop the map column and the original value columns to keep only id, key and value
df.withColumn('map', m)\
  .select('*', F.explode('map'))\
  .drop('map', 'val1', 'val2', 'val3')\
  .show()

It's important to use a select() and not a withColumn(), because exploding a Map column generates two new columns, key and value.

The result will be the transposition of the selected columns into rows as reported in the example.
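With the sample DataFrame above, the final .show() should print something like (row order may differ, since Spark does not guarantee ordering here):

+---+----+-----+
| id| key|value|
+---+----+-----+
|  1|val1|    a|
|  1|val2|    b|
|  1|val3|    c|
|  2|val1|    d|
|  2|val2|    e|
|  2|val3|    f|
|  3|val1|    g|
|  3|val2|    h|
|  3|val3|    i|
+---+----+-----+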

Vzzarr
  • I am trying to use the above logic for my dataframe, which has about 300 columns, and I am passing the list of columns I want where ["val1", "val2", "val3"] is, but the above code does not transpose my columns to rows; it stays as is – anonymous13 Feb 03 '22 at 17:19
  • it might be that 300 columns is a big number and this is an expensive operation... try with just a small sample and check if that works – Vzzarr Feb 03 '22 at 17:27
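
Regarding the comment about passing a list of columns, here is a minimal sketch (the list name cols_to_transpose is just an illustrative placeholder; everything else follows the answer above):

cols_to_transpose = ["val1", "val2", "val3"]  # replace with the (long) list of columns to unpivot

m = F.create_map(list(chain(*(
    (F.lit(c), F.col(c)) for c in cols_to_transpose))))

# keep only the id plus the exploded key/value pair
df.withColumn('map', m)\
  .select('id', F.explode('map'))\
  .show()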