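Python's str.rfind may be useful: it returns the index of the last underscore, which is where the pivot value ends and the aggregation alias begins in each column name.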
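As a quick illustration in plain Python (no Spark needed), slicing around the index returned by rfind moves the part after the last underscore to the front. This is only a sketch: the helper name swap_parts is hypothetical and not part of any library.

```python
def swap_parts(name: str) -> str:
    """Move the text after the last underscore to the front of the name."""
    i = name.rfind('_')   # index of the last '_', or -1 if there is none
    if i == -1:
        return name       # nothing to split, so leave the name unchanged
    return f"{name[i+1:]} {name[:i]}"

print(swap_parts('some_type_1_Values'))    # Values some_type_1
print(swap_parts('some_type_2_Quantity'))  # Quantity some_type_2
```

Without the -1 check, a name with no underscore would be sliced incorrectly (rfind returns -1, so the second slice drops the last character), which is why columns such as 'data' and 'id' need to be left out of the renaming.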
Example dataframes:
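from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # reuses the active session if one exists

df = spark.createDataFrame(
    [('2022-01-01', 1234, 'some_type_1', 2),
     ('2022-01-01', 1234, 'some_type_2', 3)],
    ['data', 'id', 'type', 'value'])

# With several aggregations, pivot names each column <pivot value>_<alias>
df1 = (
    df.groupBy(['data', 'id'])
    .pivot('type')
    .agg(F.sum('value').alias("Values"), F.count('value').alias("Quantity"))
)
df1.show()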
# +----------+----+------------------+--------------------+------------------+--------------------+
# |      data|  id|some_type_1_Values|some_type_1_Quantity|some_type_2_Values|some_type_2_Quantity|
# +----------+----+------------------+--------------------+------------------+--------------------+
# |2022-01-01|1234|                 2|                   1|                 3|                   1|
# +----------+----+------------------+--------------------+------------------+--------------------+
Script for renaming:
df1 = df1.select(
    *['data', 'id'],
    *[F.col(c).alias(f"{c[c.rfind('_')+1:]} {c[:c.rfind('_')]}")
      for c in df1.columns if c not in ['data', 'id']]
)
df1.show()
# +----------+----+------------------+--------------------+------------------+--------------------+
# |      data|  id|Values some_type_1|Quantity some_type_1|Values some_type_2|Quantity some_type_2|
# +----------+----+------------------+--------------------+------------------+--------------------+
# |2022-01-01|1234|                 2|                   1|                 3|                   1|
# +----------+----+------------------+--------------------+------------------+--------------------+
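toDF is also possible and less verbose, but it is more error-prone: it assigns names by position, so the new names must be listed in exactly the same order and number as the existing columns. Use it on the pivoted df1 in place of the select above, not after it.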
df1 = df1.toDF(
    *['data', 'id'],
    *[f"{c[c.rfind('_')+1:]} {c[:c.rfind('_')]}"
      for c in df1.columns if c not in ['data', 'id']]
)
df1.show()
# +----------+----+------------------+--------------------+------------------+--------------------+
# |      data|  id|Values some_type_1|Quantity some_type_1|Values some_type_2|Quantity some_type_2|
# +----------+----+------------------+--------------------+------------------+--------------------+
# |2022-01-01|1234|                 2|                   1|                 3|                   1|
# +----------+----+------------------+--------------------+------------------+--------------------+
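Because toDF matches names by position, one way to reduce the risk is to build the new names by walking the existing columns in order, so every new name lands on the column it came from even if the grouping columns are not first. The sketch below is an assumption-laden variant, not the method above: the helper renamed_columns and its keep parameter are hypothetical, and it uses str.rpartition instead of rfind to split on the last underscore.

```python
def renamed_columns(columns, keep=('data', 'id')):
    """Return new names in the same order as `columns`, ready for toDF(*names)."""
    new_names = []
    for c in columns:
        # rpartition splits on the LAST '_'; sep is '' when no '_' is present
        head, sep, tail = c.rpartition('_')
        new_names.append(c if c in keep or not sep else f"{tail} {head}")
    # Duplicate names would make the resulting DataFrame ambiguous to query
    if len(set(new_names)) != len(new_names):
        raise ValueError(f"renaming would create duplicate columns: {new_names}")
    return new_names

cols = ['data', 'id', 'some_type_1_Values', 'some_type_1_Quantity']
print(renamed_columns(cols))
# ['data', 'id', 'Values some_type_1', 'Quantity some_type_1']
```

With a pivoted DataFrame this would be applied as df1.toDF(*renamed_columns(df1.columns)).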