How to move a specific column of a PySpark dataframe to the start of the dataframe

Question

I have a pyspark dataframe as follows (this is just a simplified example, my actual dataframe has hundreds of columns):

col1,col2,...,col_with_fix_header
1,2,...,3
4,5,...,6
2,3,...,4

and I want to move col_with_fix_header to the start, so that the output looks as follows:

col_with_fix_header,col1,col2,...
3,1,2,...
6,4,5,...
4,2,3,...

I don't want to list all the columns in the solution.

Possible duplicate of [Python/pyspark data frame rearrange columns](https://stackoverflow.com/questions/42912156/python-pyspark-data-frame-rearrange-columns) – cronoik, Nov 29 '19 at 14:16
But I don't want to list all the column names. In this example there are three columns, but in my actual case there are 100s of columns and I just want to take the last column, which has a fixed header, and move it to the start. – Harmeet, Nov 29 '19 at 22:02

cronoik · Accepted Answer · 2021-08-22T19:20:52.250

2

If you don't want to list all the columns of your dataframe, you can use the dataframe property `columns`. It returns a Python list of column names, which you can simply slice:

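from pyspark.sql import SparkSession

# Reuse the active session (e.g. in the pyspark shell) or create a new one
spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([
  ("a", "Alice", 34),
  ("b", "Bob", 36),
  ("c", "Charlie", 30),
  ("d", "David", 29),
  ("e", "Esther", 32),
  ("f", "Fanny", 36),
  ("g", "Gabby", 60)], ["id", "name", "age"])

# Put the last column first, followed by all other columns in their original order
df.select([df.columns[-1]] + df.columns[:-1]).show()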

Output:

+---+---+-------+
|age| id|   name|
+---+---+-------+
| 34|  a|  Alice|
| 36|  b|    Bob|
| 30|  c|Charlie|
| 29|  d|  David|
| 32|  e| Esther|
| 36|  f|  Fanny|
| 60|  g|  Gabby|
+---+---+-------+
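
If only the column's name is fixed and its position is not guaranteed (as with col_with_fix_header in the question), the same idea can be applied by name instead of by slicing. Here is a minimal sketch, assuming a hypothetical helper called move_to_front. It is not part of PySpark and only reorders a plain list of column names:

```python
def move_to_front(columns, name):
    """Return a new list with `name` first and all other names in their original order.

    `move_to_front` is a hypothetical helper for illustration, not a PySpark API.
    """
    if name not in columns:
        raise ValueError(f"column {name!r} not found")
    # Keep every other column in its original relative order
    return [name] + [c for c in columns if c != name]


# Example with the column names from the dataframe above
print(move_to_front(["id", "name", "age"], "age"))
```

With the dataframe above, df.select(move_to_front(df.columns, "age")) should produce the same column order as the slicing version.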

edited Aug 22 '21 at 19:20

answered Nov 29 '19 at 22:24



The link for `columns` is invalid, I think you meant this link: https://spark.apache.org/docs/3.1.1/api/python/reference/api/pyspark.sql.DataFrame.columns.html – wawawa Aug 17 '21 at 21:56

@Cecilia: Sadly, the databricks guys moved the pyspark documentation and destroyed all the links. Thanks for the hint! – cronoik Aug 22 '21 at 19:22
