
This is the table I want to transpose


I created a list of the distinct values in DESC_INFO using this:

columnsToPivot = list(
    dict.fromkeys(
        df.filter(F.col("DESC_INFO") != '')
          .rdd.map(lambda x: (x.DESC_INFO, x.RAW_INFO))
          .collect()
    )
)

And then I tried to map the RAW_INFO values into the matching columns with this:

for key in columnsToPivot:
  if key[1] != '':
    df = df.withColumn(key[0], F.lit(key[1]))

The problem is that this just wrote the same value into every row, when what I want is to fill in the RAW_INFO values only on the rows that match the mapped values on the same 'PROCESS', 'SUBPROCESS' and 'LAYER'.

This is the map I expect at the end.


The blue lines mark the transpose I have already achieved. The red lines mark the data I need to fill in, matching the condition highlighted in yellow.

vimuth

2 Answers


Try this:

from pyspark.sql.functions import first

df_new = (
    df.filter(df['DESC_INFO'] != '')
      .groupBy('LAYER', 'PROCESS', 'SUBPROCESS')
      .pivot('DESC_INFO')
      .agg(first('RAW_INFO'))
)

(Note the column names must match the DataFrame exactly: 'Raw Info' with a space would refer to a non-existent column.)
Blue Robin
Ankit Tyagi
  • Your code works and it could be helpful, but it deletes data while doing the pivot. The table has some extra data that I want to transpose later as well; I'll insert a screenshot below. I want to transpose the rows for the columns ('COLUMN', 'COLUMN_DATA' and 'COLUMN_DATA_VALUE'), and when I execute the command above, they get deleted. – José Bastos Feb 28 '23 at 12:29

This shows the extra data that I had hidden in the first two screenshots. It is for these columns that I want the data from DESC_INFO and RAW_INFO repeated.

  • Can't these extra columns be added to the groupBy clause? groupBy('LAYER', 'PROCESS', 'SUBPROCESS', 'COLUMN', 'COLUMN_DATA', 'COLUMN_DATA_VALUE') – Ankit Tyagi Mar 10 '23 at 06:31