
I have a pyspark data frame like:

+--------+-------+-------+
| col1   | col2  | col3  |
+--------+-------+-------+
|  25    |  01   |     2 |
|  23    |  12   |     5 | 
|  11    |  22   |     8 |
+--------+-------+-------+

and I want to create new dataframe by adding a new column like this:

+------------+-------+-------+-------+
| new_column | col1  | col2  | col3  |
+------------+-------+-------+-------+
|  0         |  25   |  01   |  2    |
|  0         |  23   |  12   |  5    |
|  0         |  11   |  22   |  8    |
+------------+-------+-------+-------+

I know I can add a column with:

from pyspark.sql.functions import lit

df.withColumn("new_column", lit(0))

but that appends the new column at the end, like this:

+-------+-------+-------+------------+
| col1  | col2  | col3  | new_column |
+-------+-------+-------+------------+
|  25   |  01   |  2    |  0         |
|  23   |  12   |  5    |  0         |
|  11   |  22   |  8    |  0         |
+-------+-------+-------+------------+
pault

4 Answers


You can always reorder the columns in a Spark DataFrame using select, as shown in this post.

In this case, you can also achieve the desired output in one step using select and alias as follows:

df = df.select(lit(0).alias("new_column"), "*")

This is logically equivalent to the following SQL:

SELECT 0 AS new_column, * FROM df
pault

You can reorder columns using select. Note that the new column has to exist before you can select it:

df = df.withColumn('new_column', lit(0)).select('new_column', 'col1', 'col2', 'col3')
df.show()
Terry
Assuming new_column has already been added (e.g. with withColumn), select also accepts a list of names, and the result must be assigned back:

df = df.select(['new_column', 'col1', 'col2', 'col3'])
Kris

You can use the insert function, but note that insert is part of the pandas DataFrame API, not PySpark, so it will not work on a Spark DataFrame:

    df.insert(0, 'new_column', [data,,,])

thanks
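For reference, `insert` as mentioned above is a pandas DataFrame method, not a PySpark one; a minimal pandas sketch (data mirrors the question) would be:

```python
import pandas as pd

pdf = pd.DataFrame({"col1": [25, 23, 11], "col2": ["01", "12", "22"], "col3": [2, 5, 8]})

# insert mutates the frame in place: position 0, new column name,
# and a scalar value broadcast to every row.
pdf.insert(0, "new_column", 0)
print(pdf.columns.tolist())  # ['new_column', 'col1', 'col2', 'col3']
```

This only applies if you convert to pandas (e.g. via toPandas()); on a Spark DataFrame, use the select-based answers above.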