I have a PySpark DataFrame like the following (this is just a simplified example; my actual DataFrame has hundreds of columns):

col1,col2,......,col_with_fix_header
1,2,.......,3
4,5,.......,6
2,3,........,4

I want to move col_with_fix_header to the start, so that the output looks like this:

col_with_fix_header,col1,col2,............
3,1,2,..........
6,4,5,....
4,2,3,.......

I don't want to list all the columns in the solution.

ZygD
Harmeet
    Possible duplicate of [Python/pyspark data frame rearrange columns](https://stackoverflow.com/questions/42912156/python-pyspark-data-frame-rearrange-columns) – cronoik Nov 29 '19 at 14:16
  • But I don't want to list all the column names. In this example there are three columns, but in my actual case there are hundreds of columns, and I just want to take the last column, which has a fixed header, and move it to the start. – Harmeet Nov 29 '19 at 22:02

1 Answer

If you don't want to list all the columns of your DataFrame, you can use the DataFrame property `columns`. It returns a plain Python list of column names, so you can simply slice it and pass the result to `select`:

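from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([
  ("a", "Alice", 34),
  ("b", "Bob", 36),
  ("c", "Charlie", 30),
  ("d", "David", 29),
  ("e", "Esther", 32),
  ("f", "Fanny", 36),
  ("g", "Gabby", 60)], ["id", "name", "age"])

# df.columns is a plain Python list: ["id", "name", "age"].
# Put its last element first, followed by all the others in their original order.
df.select([df.columns[-1]] + df.columns[:-1]).show()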

Output:

+---+---+-------+
|age| id|   name|
+---+---+-------+
| 34|  a|  Alice|
| 36|  b|    Bob|
| 30|  c|Charlie|
| 29|  d|  David|
| 32|  e| Esther|
| 36|  f|  Fanny|
| 60|  g|  Gabby|
+---+---+-------+
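The slice above assumes the fixed-header column is always the last one. If you would rather pick the column by name (in the question it is col_with_fix_header), you can build the new order from `df.columns` with a small helper. This is a minimal sketch: `move_to_front` is a hypothetical helper, not part of the PySpark API, and it only manipulates the list of names, so it can be checked without a Spark session.

```python
from typing import List, Sequence


def move_to_front(columns: Sequence[str], *names: str) -> List[str]:
    """Return a new list of column names with `names` placed first.

    `names` keep the order in which they are given; every other column
    keeps its original relative order. Raises ValueError for unknown names.
    (Hypothetical helper for illustration, not a PySpark function.)
    """
    missing = [n for n in names if n not in columns]
    if missing:
        raise ValueError(f"columns not found: {missing}")
    front = list(dict.fromkeys(names))  # drop accidental duplicates, keep order
    front_set = set(front)              # fast membership test for wide frames
    return front + [c for c in columns if c not in front_set]


# Illustrative column list standing in for df.columns; no Spark needed here.
cols = ["col1", "col2", "col3", "col_with_fix_header"]
print(move_to_front(cols, "col_with_fix_header"))
# ['col_with_fix_header', 'col1', 'col2', 'col3']
```

With a real DataFrame you would pass the result straight to `select`, e.g. `df.select(move_to_front(df.columns, "col_with_fix_header"))`, which works no matter where that column currently sits.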
cronoik
    The link for `columns` is invalid; I think you meant this link: https://spark.apache.org/docs/3.1.1/api/python/reference/api/pyspark.sql.DataFrame.columns.html – wawawa Aug 17 '21 at 21:56
    @Cecilia: Sadly, the databricks guys moved the pyspark documentation and destroyed all the links. Thanks for the hint! – cronoik Aug 22 '21 at 19:22