22

I have a dynamic list of column names that is built from the value of n.

n = 3
drop_lst = ['a' + str(i) for i in range(n)]
df.drop(drop_lst)

But the above is not working.

Note:

My use case requires a dynamic list.

If I pass the column names directly, without a list, it works:

df.drop('a0','a1','a2')

How do I make the drop function work with a list?

Spark 2.2 doesn't seem to have this capability. Is there a way to make it work without using select()?

ZygD
GeorgeOfTheRF

5 Answers

83

You can use the * operator to pass the contents of your list as arguments to drop():

df.drop(*drop_lst)
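
To see why the * matters, here is a minimal sketch in plain Python. The function drop_columns and the column names are hypothetical stand-ins for a varargs method like drop(*cols); this is not Spark's implementation, only an illustration of how argument unpacking changes what the callee receives.

```python
def drop_columns(columns, *cols):
    """Hypothetical stand-in for a varargs drop(*cols); not Spark's code."""
    for c in cols:
        if not isinstance(c, str):
            # A list passed without * arrives here as a single non-string argument.
            raise TypeError("expected a column name, got %s" % type(c).__name__)
    return [name for name in columns if name not in cols]


n = 3
drop_lst = ['a' + str(i) for i in range(n)]   # ['a0', 'a1', 'a2']
columns = ['id', 'a0', 'a1', 'a2', 'b']       # made-up column names

# Without *, the whole list is ONE positional argument and is rejected.
try:
    drop_columns(columns, drop_lst)
    list_accepted = True
except TypeError:
    list_accepted = False

# With *, the list is unpacked into three separate string arguments.
remaining = drop_columns(columns, *drop_lst)
```

With the real DataFrame, the same unpacking is what df.drop(*drop_lst) does: it is equivalent to writing df.drop('a0', 'a1', 'a2') by hand.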
zero323
mtoto
14

You can pass the column names as comma-separated arguments, e.g.:

df.drop("col1","col11","col21")
Reinstate Monica
vaquar khan
4

This is how to drop a specified number of consecutive columns in Scala:

val ll = dfwide.schema.names.slice(1,5)
dfwide.drop(ll:_*).show

slice takes two parameters: the start index (inclusive) and the end index (exclusive), so slice(1,5) selects the columns at positions 1 through 4.
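
For readers working in Python rather than Scala, list slicing follows the same start-inclusive, end-exclusive convention. The sketch below uses made-up column names in place of dfwide.schema.names; it only illustrates which names the slice picks out.

```python
# Made-up column names standing in for dfwide.schema.names (or df.columns in PySpark).
names = ['id', 'c1', 'c2', 'c3', 'c4', 'c5']

# names[1:5] mirrors Scala's names.slice(1, 5): index 1 is included, index 5 is not.
to_drop = names[1:5]

# The columns that would remain after dropping the sliced names.
kept = [c for c in names if c not in to_drop]
```

In PySpark the drop itself could then be written as df.drop(*df.columns[1:5]), combining the slice with the * unpacking from the accepted answer.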

Fox Fairy
-1

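Use a simple loop. drop() returns a new DataFrame rather than modifying df in place, so reassign the result on each iteration:

for c in drop_lst:
    df = df.drop(c)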
Ray
    [A code-only answer is not high quality](//meta.stackoverflow.com/questions/392712/explaining-entirely-code-based-answers). While this code may be useful, you can improve it by saying why it works, how it works, when it should be used, and what its limitations are. Please [edit] your answer to include explanation and link to relevant documentation. – Stephen Ostermiller Oct 22 '21 at 00:21
-6

You can call drop(*cols) in two ways:

  1. By column name: df.drop('age').collect()
  2. By Column object: df.drop(df.age).collect()

Check the official documentation for DataFrame.drop.

Indrajit Swain