0

I have a pyspark data frame that every column appends the table name ie: Table.col1, Table.col2...

I would like to replace 'Table.' with '' (nothing) in every column in my dataframe.

How do I do this? Everything I have found deals with doing this to the values in the columns and not the column names themselves.

user3486773
  • 1,174
  • 3
  • 25
  • 50

2 Answers2

1

One option is to use toDF with replace :

DataFrame.toDF(*cols)
Returns a new DataFrame that with new specified column names

out = df.toDF(*[c.replace("Table.", "") for c in df.columns])

Output :

out.show()
+----+----+
|col1|col2|
+----+----+
| foo|   1|
| bar|   2|
+----+----+

Input used :

+----------+----------+
|Table.col1|Table.col2|
+----------+----------+
|       foo|         1|
|       bar|         2|
+----------+----------+
Timeless
  • 22,580
  • 4
  • 12
  • 30
0

To do it in PySpark:

from pyspark.sql.functions import col
new_columns = [col(column_name).alias(column_name.replace('Table.', '')) for column_name in df.columns]
df_new = df.select(new_columns)

Also, if anyone want to do the same in Pandas:

df.columns = df.columns.str.replace('Table.', '')
Sadman Sakib
  • 557
  • 3
  • 10