
This is the table I want to transpose


I created a list of the distinct values in DESC_INFO using this:

columnsToPivot = list(
    dict.fromkeys(
        df.filter(F.col("DESC_INFO") != '')
          .rdd.map(lambda x: (x.DESC_INFO, x.RAW_INFO))
          .collect()
    )
)

And then I tried to map the RAW_INFO values into the matching columns with this:

for key in columnsToPivot:
  if key[1] != '':
    df = df.withColumn(key[0], F.lit(key[1]))

The problem is that this just wrote the same value into every row, when what I want is to fill in the RAW_INFO values only on the rows that match the mapped values on the same 'PROCESS', 'SUBPROCESS' and 'LAYER'.

This is the map I expect at the end.


The blue lines mark the transpose I have already achieved. The red lines mark the data I need to fill in, matching the condition highlighted in yellow.

vimuth

2 Answers


Try this:

from pyspark.sql.functions import first

df_new = (
    df.filter(df['DESC_INFO'] != '')
      .groupBy('LAYER', 'PROCESS', 'SUBPROCESS')
      .pivot('DESC_INFO')
      .agg(first('RAW_INFO'))
)

(Note the column names must match the DataFrame exactly: 'Raw Info' with a space would refer to a non-existent column.)
Blue Robin
Ankit Tyagi
  • Your code works and it could be helpful, but it deletes data while doing the pivot. The table has some extra data that I want to transpose later as well; I'll insert a screenshot below. I want to transpose the rows for the columns ('COLUMN', 'COLUMN_DATA' and 'COLUMN_DATA_VALUE'), and when I execute the command above, they get deleted. – José Bastos Feb 28 '23 at 12:29

This shows the extra data that I had hidden in the first two screenshots. It is for these columns that I want the data from DESC_INFO and RAW_INFO repeated.

  • Can't these extra columns be added to the groupBy clause? groupBy('LAYER', 'PROCESS', 'SUBPROCESS', 'COLUMN', 'COLUMN_DATA', 'COLUMN_DATA_VALUE') – Ankit Tyagi Mar 10 '23 at 06:31