How to drop multiple column names given in a list from Spark DataFrame?

Question

I have a dynamic list that is built from the value of n.

n = 3
drop_lst = ['a' + str(i) for i in range(n)]
df.drop(drop_lst)

But the above is not working.

Note:

My use case requires a dynamic list.

If I pass the names directly, without a list, it works:

df.drop('a0','a1','a2')

How do I make the drop function work with a list?

Spark 2.2 doesn't seem to have this capability. Is there a way to make it work without using select()?

score 83 · Accepted Answer · edited Jan 06 '19 at 23:18

You can use the * operator to pass the contents of your list as arguments to drop():

df.drop(*drop_lst)
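
To see what the unpacking changes without needing a Spark session, here is a minimal sketch in plain Python. The `drop` function below is a hypothetical stand-in that only mimics the variadic `*cols` shape of `DataFrame.drop`; it is not the Spark implementation and simply returns whatever arguments it receives.

```python
def drop(*cols):
    # Hypothetical stand-in for DataFrame.drop(*cols): it just returns
    # the tuple of positional arguments it was called with.
    return cols


n = 3
drop_lst = ['a' + str(i) for i in range(n)]

# Without the star, the whole list arrives as ONE argument.
as_list = drop(drop_lst)

# With the star, each element arrives as its own argument,
# exactly as if the names had been typed out by hand.
unpacked = drop(*drop_lst)
by_hand = drop('a0', 'a1', 'a2')

print(as_list)
print(unpacked)
print(unpacked == by_hand)
```

So `df.drop(*drop_lst)` is the same call as `df.drop('a0', 'a1', 'a2')`, which is the form the question already reports as working.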

edited Jan 06 '19 at 23:18

zero323

answered Dec 15 '17 at 11:50

mtoto

3

Thanks! What does the * operator do? What's its significance? – GeorgeOfTheRF Dec 16 '17 at 04:23
4

The star unpacks the contents of an iterable when you place it to its left, i.e. it passes the individual elements of your list as separate arguments. – mtoto Dec 16 '17 at 08:07
1

This does not work for me, it gives: TypeError: drop() takes exactly 2 arguments (92 given). I might have an old version? – Thomas Mar 16 '18 at 12:56
4

To answer my own question: I just checked, and in my version (1.6.2), the list method described here does not work. – Thomas Mar 16 '18 at 13:03
Great solution!! Thanks – Vinod Sawant Apr 14 '20 at 04:11
1

The solution works in ***Python*** but not in ***Scala***; for Scala, see the answer by @fox ghost below. – Lucas Roberts Jun 30 '20 at 21:21

score 14 · Answer 2 · edited Oct 17 '18 at 18:20

You can pass the column names as comma-separated arguments, e.g.:

df.drop("col1","col11","col21")

edited Oct 17 '18 at 18:20

Reinstate Monica

answered Oct 17 '18 at 17:31

vaquar khan

score 4 · Answer 3 · answered Sep 04 '19 at 15:53

This is how to drop a range of consecutive columns in Scala:

val ll = dfwide.schema.names.slice(1,5)
dfwide.drop(ll:_*).show

slice takes two parameters: a start index (inclusive) and an end index (exclusive).

answered Sep 04 '19 at 15:53

Fox Fairy

score -1 · Answer 4 · answered Oct 21 '21 at 09:48

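Use a simple loop, reassigning df on each pass because drop returns a new DataFrame rather than modifying the original:

for c in drop_lst:
    df = df.drop(c)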
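
As a rough illustration of why the reassignment matters, here is a sketch that runs without Spark. `FakeFrame` is a hypothetical stand-in that tracks column names only; the one behaviour it copies from a real DataFrame is that `drop` returns a new object and leaves the original untouched. The `reduce` variant at the end is an equivalent way to write the same loop, not something taken from the answer.

```python
from functools import reduce


class FakeFrame:
    """Hypothetical stand-in for a Spark DataFrame; tracks column names only."""

    def __init__(self, columns):
        self.columns = list(columns)

    def drop(self, *cols):
        # Return a new frame without the named columns; self is not modified.
        return FakeFrame(c for c in self.columns if c not in cols)


df = FakeFrame(['id', 'a0', 'a1', 'a2', 'value'])
drop_lst = ['a' + str(i) for i in range(3)]

# Calling drop without keeping the result changes nothing.
df.drop('a0')

# The loop from the answer: keep the returned frame on every pass.
looped = df
for c in drop_lst:
    looped = looped.drop(c)

# The same fold written with functools.reduce.
folded = reduce(lambda frame, c: frame.drop(c), drop_lst, df)

print(df.columns)
print(looped.columns)
print(folded.columns)
```

With a real DataFrame, each pass through the loop is a separate `drop` call, so a single `df.drop(*drop_lst)` is usually the tidier option where the Spark version supports it.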

answered Oct 21 '21 at 09:48

Ray

2

[A code-only answer is not high quality](//meta.stackoverflow.com/questions/392712/explaining-entirely-code-based-answers). While this code may be useful, you can improve it by saying why it works, how it works, when it should be used, and what its limitations are. Please [edit] your answer to include explanation and link to relevant documentation. – Stephen Ostermiller Oct 22 '21 at 00:21

score -6 · Answer 5 · answered Dec 15 '17 at 15:49

You can use drop(*cols) in two ways.

Check the official documentation for DataFrame.drop.

answered Dec 15 '17 at 15:49

Indrajit Swain
